Programmer’s Guide

3. Programmer’s Guide

3.1. Library Source Code Organization

The rocWMMA code is split into four major parts:

The library directory contains all source code for the library.
The samples directory contains real-world use-cases of the rocWMMA API.
The test directory contains all validation, performance and unit tests of the rocWMMA API.
The infrastructure is used to build, document, test and format the code.

3.1.1. The library directory

3.1.1.1. library/include/rocwmma/

Contains C++ include files for the rocWMMA API. These files also contain Doxygen comments that document the API.

3.1.1.2. library/include/internal

Internal include files for:

Type support
Input / output configuration, shapes and traits
Layout
Mapping Utility
Cross-lane operation utility
Vector blend utility
Packing and unpacking
Conversion and broadcasting
Load and store
Matrix multiply-accumulate
Cooperative load and store
Threadblock synchronization
Utility code

3.1.2. The samples directory

3.1.2.1. samples/hipRTC_gemm.cpp

Sample code that demonstrates a simple GEMM algorithm, with no LDS memory usage and no transpose, from within the hipRTC environment.

3.1.2.2. samples/simple_sgemv.cpp

Sample code that demonstrates a simple matrix multiply-accumulate with a vector, with no LDS memory usage and no transpose, for single-precision floating-point types.

3.1.2.3. samples/simple_dgemv.cpp

Sample code that demonstrates a simple matrix multiply-accumulate with a vector, with no LDS memory usage and no transpose, for double-precision floating-point types.
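
For orientation, the operation behind the two GEMV samples above can be sketched on the host in plain C++. This is a minimal reference sketch, not the rocWMMA API and not the code in the samples: it assumes the conventional BLAS form y = alpha * A * x + beta * y with a row-major, non-transposed A, and the name gemvReference is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for y = alpha * A * x + beta * y.
// A is m x n, row-major, used as stored (no transpose).
// gemvReference is a hypothetical helper, not part of rocWMMA.
template <typename T>
void gemvReference(std::size_t           m,
                   std::size_t           n,
                   T                     alpha,
                   std::vector<T> const& a,
                   std::vector<T> const& x,
                   T                     beta,
                   std::vector<T>&       y)
{
    assert(a.size() == m * n);
    assert(x.size() == n);
    assert(y.size() == m);

    for(std::size_t i = 0; i < m; ++i)
    {
        // Accumulate the dot product of row i of A with x.
        T acc = T(0);
        for(std::size_t j = 0; j < n; ++j)
        {
            acc += a[i * n + j] * x[j];
        }
        y[i] = alpha * acc + beta * y[i];
    }
}
```

Instantiating the template with float or double mirrors the single-precision and double-precision variants of the samples. A host reference of this kind is also a common way to check device results.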

3.1.2.4. samples/simple_sgemm.cpp

Sample code that demonstrates a simple GEMM algorithm, with no LDS memory usage and no transpose, for single-precision floating-point types.

3.1.2.5. samples/simple_dgemm.cpp

Sample code that demonstrates a simple GEMM algorithm, with no LDS memory usage and no transpose, for double-precision floating-point types.

3.1.2.6. samples/simple_hgemm.cpp

Sample code that demonstrates a simple GEMM algorithm, with no LDS memory usage and no transpose, for half-precision floating-point types.
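
The GEMM operation that the simple samples above demonstrate can be illustrated with a host-side reference in plain C++. The sketch below is not the rocWMMA API: it assumes the usual form D = alpha * (A * B) + beta * C with row-major, non-transposed operands, and the name gemmReference is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for D = alpha * (A * B) + beta * C.
// A is m x k, B is k x n, C and D are m x n; all row-major, no transpose.
// gemmReference is a hypothetical helper, not part of rocWMMA.
template <typename T>
std::vector<T> gemmReference(std::size_t           m,
                             std::size_t           n,
                             std::size_t           k,
                             T                     alpha,
                             std::vector<T> const& a,
                             std::vector<T> const& b,
                             T                     beta,
                             std::vector<T> const& c)
{
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    assert(c.size() == m * n);

    std::vector<T> d(m * n);
    for(std::size_t i = 0; i < m; ++i)
    {
        for(std::size_t j = 0; j < n; ++j)
        {
            // Multiply-accumulate along the shared dimension k.
            T acc = T(0);
            for(std::size_t p = 0; p < k; ++p)
            {
                acc += a[i * k + p] * b[p * n + j];
            }
            d[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
    return d;
}
```

On the GPU, the inner multiply-accumulate is the part that a block-wise matrix multiply-accumulate would replace. The host version reads every operand directly from main memory, which loosely corresponds to the "no LDS" property of the simple samples.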

3.1.2.7. samples/perf_sgemm.cpp

Sample code that demonstrates the best-performing multi-block GEMM algorithm, with LDS memory, macro tile collaboration, data reuse and an optimized pipeline, for single-precision floating-point types.

3.1.2.8. samples/perf_dgemm.cpp

Sample code that demonstrates the best-performing multi-block GEMM algorithm, with LDS memory, macro tile collaboration, data reuse and an optimized pipeline, for double-precision floating-point types.

3.1.2.9. samples/perf_hgemm.cpp

Sample code that demonstrates the best-performing multi-block GEMM algorithm, with LDS memory, macro tile collaboration, data reuse and an optimized pipeline, for half-precision floating-point types.
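
The idea of tiling with data reuse, which the perf samples above build on, can be loosely illustrated on the host. The sketch below is only a CPU analogy under stated assumptions: the staging buffers stand in for LDS, the square tile size is an arbitrary choice, and the names tiledGemm and tile are hypothetical. It does not reproduce the samples' wave collaboration or pipelining, and it computes only D = A * B.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side analogy of macro-tile blocking for D = A * B (all row-major).
// tiledGemm and tile are hypothetical names, not part of rocWMMA.
std::vector<float> tiledGemm(std::size_t               m,
                             std::size_t               n,
                             std::size_t               k,
                             std::vector<float> const& a,
                             std::vector<float> const& b,
                             std::size_t               tile)
{
    assert(tile > 0);
    assert(a.size() == m * k);
    assert(b.size() == k * n);

    std::vector<float> d(m * n, 0.0f);
    // Staging buffers: a stand-in for the role LDS plays on the GPU.
    std::vector<float> aTile(tile * tile);
    std::vector<float> bTile(tile * tile);

    for(std::size_t i0 = 0; i0 < m; i0 += tile)
    {
        std::size_t const iEnd = std::min(i0 + tile, m);
        for(std::size_t j0 = 0; j0 < n; j0 += tile)
        {
            std::size_t const jEnd = std::min(j0 + tile, n);
            for(std::size_t p0 = 0; p0 < k; p0 += tile)
            {
                std::size_t const pEnd = std::min(p0 + tile, k);

                // Stage one tile of A and one tile of B.
                for(std::size_t i = i0; i < iEnd; ++i)
                    for(std::size_t p = p0; p < pEnd; ++p)
                        aTile[(i - i0) * tile + (p - p0)] = a[i * k + p];
                for(std::size_t p = p0; p < pEnd; ++p)
                    for(std::size_t j = j0; j < jEnd; ++j)
                        bTile[(p - p0) * tile + (j - j0)] = b[p * n + j];

                // Each staged value is reused for a whole row or column
                // of the output tile before the next tiles are loaded.
                for(std::size_t i = i0; i < iEnd; ++i)
                {
                    for(std::size_t j = j0; j < jEnd; ++j)
                    {
                        float acc = d[i * n + j];
                        for(std::size_t p = p0; p < pEnd; ++p)
                        {
                            acc += aTile[(i - i0) * tile + (p - p0)]
                                   * bTile[(p - p0) * tile + (j - j0)];
                        }
                        d[i * n + j] = acc;
                    }
                }
            }
        }
    }
    return d;
}
```

The result matches an untiled triple loop; only the order of memory accesses changes. That reordering is what makes staging data in fast local memory worthwhile.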

3.1.2.10. samples/simple_dlrm.cpp

Sample code that demonstrates a simple Deep Learning Recommendation Model (DLRM) for machine learning.
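
DLRM is a natural fit for matrix multiply hardware because of its feature interaction step, in which the feature vectors of one input are combined through pairwise dot products. The host-side sketch below illustrates that step only. It assumes the sample targets this dot-product interaction, which the text above does not state, and the names dotInteraction, numFeatures and dim are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side sketch of a DLRM-style pairwise dot-product interaction.
// features holds numFeatures vectors of length dim, row-major.
// Returns the strictly lower triangle of F * F^T, row by row.
// dotInteraction is a hypothetical helper, not part of rocWMMA.
std::vector<float> dotInteraction(std::size_t               numFeatures,
                                  std::size_t               dim,
                                  std::vector<float> const& features)
{
    assert(features.size() == numFeatures * dim);

    std::vector<float> out;
    for(std::size_t i = 1; i < numFeatures; ++i)
    {
        for(std::size_t j = 0; j < i; ++j)
        {
            // Dot product of feature vector i with feature vector j.
            float acc = 0.0f;
            for(std::size_t p = 0; p < dim; ++p)
            {
                acc += features[i * dim + p] * features[j * dim + p];
            }
            out.push_back(acc);
        }
    }
    return out;
}
```

Because the full set of dot products equals F * F^T, the step can be expressed as a matrix multiply followed by selecting the entries below the diagonal.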

3.1.2.11. samples/common.hpp

Common code used by all of the rocWMMA sample files above.

3.1.3. The test directory

3.1.3.1. test/bin

Script that generates benchmark plots from the gtest output dumps of the rocWMMA benchmark tests.

3.1.3.2. test/dlrm

Test code for various strategies of the DLRM application. These tests validate DLRM functions that use the rocWMMA API.

3.1.3.3. test/gemm

Test code for various strategies of the GEMM application. These tests validate and benchmark GEMM functions that use the rocWMMA API.

3.1.3.4. test/unit

Test code for the basic functional units of the rocWMMA library.

3.1.4. Infrastructure

CMake is used to build and package rocWMMA. There are CMakeLists.txt files throughout the code.
Doxygen, Breathe, Sphinx and ReadTheDocs are used to produce the documentation. Content for the documentation comes from:
- Doxygen comments in the include files in the directory library/include
- Files in the directory docs/source
Jenkins is used to automate Continuous Integration testing.
clang-format is used to format C++ code.