1. Getting Started Guide for Linux#
1.1. Introduction#
This document contains instructions for installing, using, and contributing to rocWMMA. The quickest way to install is from prebuilt packages. Alternatively, there are instructions to build from source. The document also contains an API Reference Guide, Programmer’s Guide, and Contributor’s Guides.
1.1.1. Documentation Roadmap#
The following is a list of rocWMMA documents in the suggested reading order:
Getting Started Guide (this document): Describes how to install and configure the rocWMMA library; designed to get users up and running quickly with the library.
API Reference Guide : Provides detailed information about rocWMMA functions, data types and other programming constructs.
Programmer’s Guide: Describes the code organization, Design implementation detail, Optimizations used in the library and those that should be considered for new development and Testing & Benchmarking detail.
Contributor’s Guide : Describes coding guidelines for contributors.
1.2. Prerequisites#
- A ROCm enabled platform, more information here. 
1.3. Installing pre-built packages#
rocWMMA can be installed on Ubuntu or Debian using
sudo apt-get update
sudo apt-get install rocWMMA
rocWMMA can be installed on CentOS using
sudo yum update
sudo yum install rocWMMA
rocWMMA can be installed on SLES using
sudo dnf upgrade
sudo dnf install rocWMMA
Once installed, rocWMMA can be used just like any other library with a C++ API. The rocwmma.hpp header file will need to be included in the user code in order to make calls into rocWMMA.
Once installed, rocwmma.hpp can be found in the /opt/rocm/include directory. Only this installed file should be used when needed in user code. Other rocWMMA files can be found in /opt/rocm/include/internal, however these files should not be directly included.
1.4. Building and Installing rocWMMA#
For most users building from source is not necessary, as rocWMMA can be used after installing the pre-built packages as described above. If desired, the following instructions can be used to build rocWMMA from source.
1.4.1. System Requirements#
As a general rule, 8GB of system memory is required for a full rocWMMA build. This value can be lower if rocWMMA is built without tests. This value may also increase in the future as more functions are added to rocWMMA.
1.4.2. GPU Support#
AMD CDNA class GPU featuring matrix core support: gfx908, gfx90a as ‘gfx9’
Note: Double precision FP64 datatype support requires gfx90a
OR
AMD RDNA3 class GPU featuring AI acceleration support: gfx1100, gfx1101, gfx1102 as ‘gfx11’
1.4.3. Download rocWMMA#
The rocWMMA source code is available at the rocWMMA github page. rocWMMA has a minimum ROCm support version 5.4. Check the ROCm Version on your system. For Ubuntu use
apt show rocm-libs -a
For Centos use
yum info rocm-libs
The ROCm version has major, minor, and patch fields, possibly followed by a build specific identifier. For example the ROCm version could be 4.0.0.40000-23, this corresponds to major = 4, minor = 0, patch = 0, build identifier 40000-23. There are GitHub branches at the rocWMMA site with names rocm-major.minor.x where major and minor are the same as in the ROCm version. For ROCm version 4.0.0.40000-23 you need to use the following to download rocWMMA:
git clone -b release/rocm-rel-x.y https://github.com/ROCmSoftwarePlatform/rocWMMA.git
cd rocWMMA
Replace x.y in the above command with the version of ROCm installed on your machine. For example: if you have ROCm 5.0 installed, then replace release/rocm-rel-x.y with release/rocm-rel-5.0
The user can build either
- library 
- library + samples 
- library + tests 
- library + tests + assembly 
You only need (library) if you call rocWMMA from your code. The client contains the test samples and benchmark code.
Below are the project options available to build rocWMMA library with/without clients.
| Option | Description | Default Value | 
|---|---|---|
| AMDGPU_TARGETS | Build code for specific GPU target(s) | gfx908:xnack-;gfx90a:xnack-;gfx90a:xnack+;gfx1100;gfx1101;gfx1102 | 
| ROCWMMA_BUILD_TESTS | Build Tests | ON | 
| ROCWMMA_BUILD_SAMPLES | Build Samples | ON | 
| ROCWMMA_BUILD_ASSEMBLY | Generate assembly files | OFF | 
| ROCWMMA_BUILD_VALIDATION_TESTS | Build validation tests | ON (requires ROCWMMA_BUILD_TESTS=ON) | 
| ROCWMMA_BUILD_BENCHMARK_TESTS | Build benchmark tests | OFF (requires ROCWMMA_BUILD_TESTS=ON) | 
| ROCWMMA_BUILD_EXTENDED_TESTS | Build extended testing coverage | OFF (requires ROCWMMA_BUILD_TESTS=ON) | 
| ROCWMMA_VALIDATE_WITH_ROCBLAS | Use rocBLAS for validation tests | ON (requires ROCWMMA_BUILD_VALIDATION_TESTS=ON) | 
| ROCWMMA_BENCHMARK_WITH_ROCBLAS | Include rocBLAS benchmarking data | OFF (requires ROCWMMA_BUILD_BENCHMARK_TESTS=ON) | 
1.4.4. Build only library#
ROCm-cmake has a minimum version requirement 0.8.0 for ROCm 5.3.
Minimum ROCm version support is 5.4.
By default, the project is configured as Release mode.
To build only library, run the following comomand :
CC=hipcc CXX=hipcc cmake -B<build_dir> . -DROCWMMA_BUILD_TESTS=OFF -DROCWMMA_BUILD_SAMPLES=OFF
Here are some other example project configurations:
| Configuration | Command | 
|---|---|
| Basic | CC=hipcc CXX=hipcc cmake -B<build_dir> . | 
| Targeting gfx908 | CC=hipcc CXX=hipcc cmake -B<build_dir> . -DAMDGPU_TARGETS=gfx908:xnack- | 
| Debug build | CC=hipcc CXX=hipcc cmake -B<build_dir> . -DCMAKE_BUILD_TYPE=Debug | 
| Build without rocBLAS(default on) | CC=hipcc CXX=hipcc cmake -B<build_dir> . -DROCWMMA_VALIDATE_WITH_ROCBLAS=OFF -DROCWMMA_BENCHMARK_WITH_ROCBLAS=OFF | 
After configuration, build with
cmake –build <build_dir> – -j
1.4.5. Build library + samples#
To build library and samples, run the following comomand :
CC=hipcc CXX=hipcc cmake -B<build_dir> . -DROCWMMA_BUILD_TESTS=OFF -DROCWMMA_BUILD_SAMPLES=ON
After configuration, build with
cmake –build <build_dir> – -j
The samples folder in <build_dir> contains executables in the table below.
| executable name | description | 
|---|---|
| simple_sgemm | a simple GEMM operation [D = alpha * (A x B) + beta * C] using rocWMMA API for single-precision floating point types | 
| simple_dgemm | a simple GEMM operation [D = alpha * (A x B) + beta * C] using rocWMMA API for double-precision floating point types | 
| simple_hgemm | a simple GEMM operation [D = alpha * (A x B) + beta * C] using rocWMMA API for half-precision floating point types | 
| perf_sgemm | an optimized GEMM operation [D = alpha * (A x B) + beta * C] using rocWMMA API for single-precision floating point types | 
| perf_dgemm | an optimized GEMM operation [D = alpha * (A x B) + beta * C] using rocWMMA API for double-precision floating point types | 
| perf_hgemm | an optimized GEMM operation [D = alpha * (A x B) + beta * C] using rocWMMA API for half-precision floating point types | 
| simple_sgemv | a simple GEMV operation [y = alpha * (A) * x + beta * y] using rocWMMA API for single-precision fp32 inputs and output | 
| simple_dgemv | a simple GEMV operation [y = alpha * (A) * x + beta * y] using rocWMMA API for double-precision fp64 inputs and output | 
| simple-dlrm | a simple DLRM operation using rocWMMA API | 
| hipRTC_gemm | a simple GEMM operation [D = alpha * (A x B) + beta * C] demonstrating runtime compilation (hipRTC) compatibility | 
1.4.6. Build library + tests#
rocWMMA has several test suites that can be built:
- DLRM tests 
- GEMM tests 
- Unit tests 
DLRM tests cover the dot product interactions between embeddings used in DLRM.
GEMM tests cover block-wise Generalized Matrix Multiplication (GEMM) implemented with rocWMMA.
Unit tests cover various aspects of rocWMMA API and internal functionality.
rocWMMA can build both validation and benchmark tests. The library uses CPU or rocBLAS methods for validation (where available) and benchmark comparisons based on the provided project option. By default, the project is linked against rocBLAS for validating results. Minimum ROCBLAS library version requirement is 2.39.0 for ROCm 4.3.0
To build library and tests, run the following command :
CC=hipcc CXX=hipcc cmake -B<build_dir> .
After configuration, build with
cmake –build <build_dir> – -j
The tests in <build_dir> contains executables in the table below.
| executable name | description | 
|---|---|
| dlrm/dlrm_dot_test-* | a DLRM implementation using rocWMMA API | 
| dlrm/dlrm_dot_lds_test-* | a DLRM implementation using rocWMMA API with LDS shared memory | 
| gemm/mma_sync_test-* | a simple GEMM operation [D = alpha * (A x B) + beta * C] using rocWMMA API | 
| gemm/mma_sync_multi_test-* | a modified GEMM operation, each wave targets a sub-grid of output blocks using rocWMMA API | 
| gemm/mma_sync_multi_ad_hoc_test-* | an adhoc version of mma_sync_multi_test-* | 
| gemm/mma_sync_multi_lds_test-* | a modified GEMM operation, each wave targets a sub-grid of output blocks using LDS memory and rocWMMA API and wave-level collaboration | 
| gemm/mma_sync_multi_lds_ad_hoc_test-* | an adhoc version of mma_sync_multi_lds_test-* | 
| gemm/mma_sync_coop_wg_test-* | a modified GEMM operation, each wave targets a sub-grid of output blocks using LDS memory and rocWMMA API and workgroup-level collaboration | 
| gemm/mma_sync_coop_wg_ad_hoc_test-* | an adhoc version of mma_sync_coop_wg_test-* | 
| gemm/barrier_test-* | a simple GEMM operation with wave synchronization | 
| unit/contamination_test | tests against contamination of pristine data for loads and stores | 
| unit/cross_lane_ops_test | tests cross-lane vector operations | 
| unit/fill_fragment_test | tests fill_fragment API function | 
| unit/io_shape_test | tests input/output shape meta data | 
| unit/io_traits_test | tests input/output logistical meta data | 
| unit/layout_test | tests accuracy of internal matrix layout patterns | 
| unit/load_store_matrix_sync_test | tests load_matrix_sync and store_matrix_sync API functions | 
| unit/load_store_matrix_coop_sync_test | tests load_matrix_coop_sync and store_matrix_coop_sync API functions | 
| unit/map_util_test | tests mapping utilities used in rocWMMA implementations | 
| unit/vector_iterator_test | tests internal vector storage iteration implementation | 
| unit/vector_test | tests internal vector storage implementation | 
*= validate: executables that compare outputs for correctness against reference sources such as CPU or rocBLAS calculations.
*= bench: executables that measure kernel execution speeds and may compare against those of rocBLAS references.
1.4.7. Build library + Tests + Assembly#
To build library and tests with assembly code generation, run the following command :
CC=hipcc CXX=hipcc cmake -B<build_dir> . -DROCWMMA_BUILD_ASSEMBLY=ON
After configuration, build with
cmake –build <build_dir> – -j
The assembly folder in <build_dir> contains assembly generation of test executables in the format [test_executable_name.s]