3. Programmer’s Guide
3.1. Library Source Code Organization
The rocWMMA code is split into four major parts:
The library directory contains all source code for the library.
The samples directory contains real-world use cases of the rocWMMA API.
The test directory contains all validation, performance, and unit tests of the rocWMMA API.
The infrastructure: build, documentation, continuous integration, and code-formatting tooling.
3.1.1. The library directory
3.1.1.1. library/include/rocwmma/
Contains C++ include files for the rocWMMA API. These files also contain Doxygen comments that document the API.
3.1.1.2. library/include/internal
Internal include files for:
Type support
Input/output configuration, shapes, and traits
Layout
Mapping utility
Cross-lane operation utility
Vector blend utility
Packing and unpacking
Conversion and broadcasting
Load and store
Matrix multiply-accumulate
Cooperative load and store
Threadblock synchronization
Utility code
3.1.2. The samples directory
3.1.2.1. samples/hipRTC_gemm.cpp
Sample code that demonstrates a simple GEMM algorithm without LDS memory usage and without transposes, run from within the hipRTC (HIP runtime compilation) environment.
3.1.2.2. samples/simple_sgemv.cpp
Sample code that demonstrates a simple matrix-vector multiply-accumulate (GEMV) without LDS memory usage and without transposes, for single-precision floating-point types.
3.1.2.3. samples/simple_dgemv.cpp
Sample code that demonstrates a simple matrix-vector multiply-accumulate (GEMV) without LDS memory usage and without transposes, for double-precision floating-point types.
3.1.2.4. samples/simple_sgemm.cpp
Sample code that demonstrates a simple GEMM algorithm without LDS memory usage and without transposes, for single-precision floating-point types.
3.1.2.5. samples/simple_dgemm.cpp
Sample code that demonstrates a simple GEMM algorithm without LDS memory usage and without transposes, for double-precision floating-point types.
3.1.2.6. samples/simple_hgemm.cpp
Sample code that demonstrates a simple GEMM algorithm without LDS memory usage and without transposes, for half-precision floating-point types.
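To make the structure of the simple GEMM samples more concrete, here is a CPU-only sketch of the tiled computation they describe. It is not code taken from the samples. It assumes the GEMM form D = alpha * (A x B) + beta * C, row-major storage, and matrix dimensions that are multiples of the tile sizes. The function name tiledGemmReference and the tile sizes kTileM, kTileN, and kTileK are hypothetical. Each output tile stands in for the work that one wavefront would own. The accumulator array plays the role of a register-resident fragment, and A and B are read directly from the source matrices with no shared staging buffer, which mirrors the "without LDS" design. The comments loosely relate each step to the rocWMMA calls fill_fragment, load_matrix_sync, mma_sync, and store_matrix_sync.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes chosen for illustration only; this sketch assumes
// M, N and K are exact multiples of these values.
constexpr std::size_t kTileM = 16;
constexpr std::size_t kTileN = 16;
constexpr std::size_t kTileK = 16;

// CPU sketch of the tiled computation D = alpha * (A x B) + beta * C.
// All matrices are row-major: A is M x K, B is K x N, C and D are M x N.
void tiledGemmReference(std::size_t M, std::size_t N, std::size_t K,
                        float alpha, const std::vector<float>& A,
                        const std::vector<float>& B, float beta,
                        const std::vector<float>& C, std::vector<float>& D)
{
    // Each (tileRow, tileCol) pair stands in for the output tile that a
    // single wavefront would compute in a GPU version.
    for(std::size_t tileRow = 0; tileRow < M; tileRow += kTileM)
    {
        for(std::size_t tileCol = 0; tileCol < N; tileCol += kTileN)
        {
            // Zero-initialized accumulator tile (loosely: fill_fragment).
            std::array<float, kTileM * kTileN> acc{};

            // Walk along K one tile at a time, reading A and B tiles straight
            // from the source matrices (loosely: load_matrix_sync + mma_sync).
            for(std::size_t kBase = 0; kBase < K; kBase += kTileK)
            {
                for(std::size_t i = 0; i < kTileM; ++i)
                {
                    for(std::size_t j = 0; j < kTileN; ++j)
                    {
                        for(std::size_t k = 0; k < kTileK; ++k)
                        {
                            acc[i * kTileN + j] += A[(tileRow + i) * K + kBase + k]
                                                   * B[(kBase + k) * N + tileCol + j];
                        }
                    }
                }
            }

            // Epilogue: scale the product, blend with C and write the tile of D
            // (loosely: load C, then store_matrix_sync).
            for(std::size_t i = 0; i < kTileM; ++i)
            {
                for(std::size_t j = 0; j < kTileN; ++j)
                {
                    const std::size_t idx = (tileRow + i) * N + tileCol + j;
                    D[idx] = alpha * acc[i * kTileN + j] + beta * C[idx];
                }
            }
        }
    }
}
```

A CPU version like this can also serve as a host-side reference when checking GPU results. Per their descriptions, the perf_* samples depart from this scheme by staging tiles in LDS and sharing them across the cooperating waves of a macro tile.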
3.1.2.7. samples/perf_sgemm.cpp
Sample code that demonstrates the best-performing multi-block GEMM algorithm, which uses LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for single-precision floating-point types.
3.1.2.8. samples/perf_dgemm.cpp
Sample code that demonstrates the best-performing multi-block GEMM algorithm, which uses LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for double-precision floating-point types.
3.1.2.9. samples/perf_hgemm.cpp
Sample code that demonstrates the best-performing multi-block GEMM algorithm, which uses LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for half-precision floating-point types.
3.1.2.10. samples/simple_dlrm.cpp
Sample code that demonstrates a simple Deep Learning Recommendation Model (DLRM) for machine learning.
3.1.2.11. samples/common.hpp
Common code used by all of the rocWMMA sample files above.
3.1.3. The test directory
3.1.3.1. test/bin
Script that generates benchmark plots from the gtest output dumps of the rocWMMA benchmark tests.
3.1.3.2. test/dlrm
Test code for various strategies of the DLRM application. These tests validate DLRM functions that use the rocWMMA API.
3.1.3.3. test/gemm
Test code for various strategies of the GEMM application. These tests validate and benchmark GEMM functions that use the rocWMMA API.
3.1.3.4. test/unit
Test code for the basic functional units of the rocWMMA library.
3.1.4. Infrastructure
CMake is used to build and package rocWMMA. There are CMakeLists.txt files throughout the code.
Doxygen, Breathe, Sphinx, and ReadTheDocs are used to produce the documentation. The documentation content comes from:
Doxygen comments in the include files in the library/include directory
Files in the docs/source directory
Jenkins is used to automate Continuous Integration testing.
clang-format is used to format C++ code.