Programmer’s guide#
This document provides insight into the library source code organization, design implementation details, helpful information for new development, and testing and benchmarking details.
Library source code organization#
The rocWMMA code is split into four major parts:
- The library directory contains the library source code.
- The samples directory contains real-world use cases of the rocWMMA API.
- The test directory contains validation tests for the rocWMMA API.
- Infrastructure 
library directory#
The library directory contains the following include files:
- library/include/rocwmma/: C++ include files for the rocWMMA API. These files also contain Doxygen comments that document the API.
- library/include/internal: Internal include files for:
  - Type support
  - Input and output configuration, shapes and traits
  - Layout
  - Mapping utility
  - Cross-lane operation utility
  - Vector blend utility
  - Packing and unpacking
  - Conversion and broadcasting
  - Load and store
  - Matrix multiply-accumulate
  - Cooperative load and store
  - Threadblock synchronization
  - Utility code
 
samples directory#
The samples directory contains the sample codes for the following use cases:
- samples/hipRTC_gemm.cpp: Demonstrates a simple General Matrix Multiply (GEMM) algorithm, without LDS memory usage or transpose, called from within the hipRTC environment
- samples/simple_sgemv.cpp: Demonstrates a simple matrix multiply-accumulate with a vector, without LDS memory usage or transpose, for single-precision floating-point types
- samples/simple_dgemv.cpp: Demonstrates a simple matrix multiply-accumulate with a vector, without LDS memory usage or transpose, for double-precision floating-point types
- samples/simple_sgemm.cpp: Demonstrates a simple GEMM algorithm, without LDS memory usage or transpose, for single-precision floating-point types
- samples/simple_dgemm.cpp: Demonstrates a simple GEMM algorithm, without LDS memory usage or transpose, for double-precision floating-point types
- samples/simple_hgemm.cpp: Demonstrates a simple GEMM algorithm, without LDS memory usage or transpose, for half-precision floating-point types
- samples/perf_sgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm, with LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for single-precision floating-point types
- samples/perf_dgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm, with LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for double-precision floating-point types
- samples/perf_hgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm, with LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for half-precision floating-point types
- samples/simple_dlrm.cpp: Demonstrates a simple Deep Learning Recommendation Model (DLRM) for machine learning
- samples/common.hpp: Common code used by all of the rocWMMA sample files above
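For orientation, the operation that a GEMM sample computes can be written as D = alpha * (A x B) + beta * C. The following is a minimal host-side sketch of that arithmetic only, not code from the samples. The function name referenceGemm is hypothetical, and row-major storage for every matrix is an assumption made for simplicity; the actual samples run on the GPU through the rocWMMA API and may use different layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rocWMMA): CPU reference for
// D = alpha * (A x B) + beta * C.
// Assumption: all matrices are stored row-major.
// A is m x k, B is k x n, C and D are m x n.
std::vector<float> referenceGemm(std::size_t               m,
                                 std::size_t               n,
                                 std::size_t               k,
                                 float                     alpha,
                                 const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 float                     beta,
                                 const std::vector<float>& c)
{
    // Check that each buffer matches its declared shape.
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    assert(c.size() == m * n);

    std::vector<float> d(m * n, 0.0f);
    for(std::size_t row = 0; row < m; ++row)
    {
        for(std::size_t col = 0; col < n; ++col)
        {
            // Dot product of one row of A with one column of B.
            float acc = 0.0f;
            for(std::size_t i = 0; i < k; ++i)
            {
                acc += a[row * k + i] * b[i * n + col];
            }
            d[row * n + col] = alpha * acc + beta * c[row * n + col];
        }
    }
    return d;
}
```

A host reference of this kind is mainly useful for checking small problem sizes by hand before reading the GPU samples.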
test directory#
The test directory contains the test code for the following:
- test/bin: Generates benchmark plots from the gtest output dumps of rocWMMA’s benchmark tests.
- test/dlrm: Covers various strategies of the DLRM application. This test is used to validate DLRM functions using the rocWMMA API.
- test/gemm: Covers various strategies of the GEMM application. This test is used to validate and benchmark GEMM functions using the rocWMMA API.
- test/unit: Tests the basic functional units of the rocWMMA library.
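Validating a numerical result generally means comparing it against a trusted reference within a tolerance. The sketch below shows one common way to do that on the host. It is an assumption for illustration and does not reproduce the comparison logic in the rocWMMA tests; the names maxRelativeError and matchesReference are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rocWMMA): largest element-wise error
// between a computed result and a reference, scaled by the reference
// magnitude. The scale is clamped to at least 1.0 so that reference
// values near zero do not inflate the error.
double maxRelativeError(const std::vector<float>& result,
                        const std::vector<float>& reference)
{
    assert(result.size() == reference.size());

    double maxErr = 0.0;
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        const double ref   = static_cast<double>(reference[i]);
        const double diff  = std::fabs(static_cast<double>(result[i]) - ref);
        const double scale = std::max(std::fabs(ref), 1.0);
        maxErr             = std::max(maxErr, diff / scale);
    }
    return maxErr;
}

// Hypothetical helper: true when every element is within the tolerance.
bool matchesReference(const std::vector<float>& result,
                      const std::vector<float>& reference,
                      double                    tolerance)
{
    return maxRelativeError(result, reference) <= tolerance;
}
```

The tolerance would normally depend on the data type under test, since half-precision results accumulate more rounding error than double-precision ones.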
Infrastructure#
- CMake is used to build and package rocWMMA. There are CMakeLists.txt files throughout the code.
- Doxygen/Breathe/Sphinx/ReadTheDocs are used to produce documentation. The API documentation is generated using:
  - Doxygen comments in include files in the directory library/include
  - Files in the directory docs/source
- Jenkins is used to automate Continuous Integration (CI) testing.
- clang-format is used to format C++ code.