Programmer’s guide
This document describes how the rocWMMA library source code is organized, explains design and implementation details, offers guidance for new development, and covers testing and benchmarking.
Library source code organization
The rocWMMA code is split into four major parts:
The library directory contains the library source code.
The samples directory contains real-world use cases of the rocWMMA API.
The test directory contains validation and benchmark tests for the rocWMMA API.
The infrastructure consists of the tools used to build, document, test, and format the code.
library directory
The library directory contains the following include files:
library/include/rocwmma/: C++ include files for the rocWMMA API. These files also contain Doxygen comments that document the API.
library/include/internal: Internal include files for:
Type support
Input and output configuration, shapes and traits
Layout
Mapping utility
Cross-lane operation utility
Vector blend utility
Packing and unpacking
Conversion and broadcasting
Load and store
Matrix multiply-accumulate
Cooperative load and store
Threadblock synchronization
Utility code
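Several of the internal headers (Layout, Mapping utility, Load and store) deal with how a logical matrix element maps to an address in memory. The following host-only C++ sketch is not rocWMMA code; it illustrates the conventional row-major and column-major mappings with an assumed leading dimension `ld`, where element (row, col) lives at `row * ld + col` in row-major storage and at `col * ld + row` in column-major storage. The names `offsetRowMajor`, `offsetColMajor`, and `loadTile` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers: map a logical (row, col) element to a linear offset.
// `ld` is the leading dimension: the row length for row-major storage,
// the column length for column-major storage.
constexpr std::size_t offsetRowMajor(std::size_t row, std::size_t col, std::size_t ld)
{
    return row * ld + col;
}

constexpr std::size_t offsetColMajor(std::size_t row, std::size_t col, std::size_t ld)
{
    return col * ld + row;
}

// Copy a tileM x tileN block starting at (baseRow, baseCol) into a dense,
// row-major tile, whatever the layout of the source matrix.
template <bool RowMajor>
std::vector<float> loadTile(const std::vector<float>& src, std::size_t ld,
                            std::size_t baseRow, std::size_t baseCol,
                            std::size_t tileM, std::size_t tileN)
{
    std::vector<float> tile(tileM * tileN);
    for(std::size_t i = 0; i < tileM; ++i)
    {
        for(std::size_t j = 0; j < tileN; ++j)
        {
            const std::size_t r   = baseRow + i;
            const std::size_t c   = baseCol + j;
            const std::size_t off = RowMajor ? offsetRowMajor(r, c, ld)
                                             : offsetColMajor(r, c, ld);
            assert(off < src.size()); // the tile must lie inside the source matrix
            tile[i * tileN + j] = src[off];
        }
    }
    return tile;
}
```

Reading a tile through either mapping yields the same logical values, which is the property a layout-aware load has to preserve regardless of how the source matrix is stored.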
samples directory
The samples directory contains sample code for the following use cases:
samples/hipRTC_gemm.cpp: Demonstrates a simple General Matrix Multiply (GEMM) algorithm called from within the hipRTC environment, without LDS (local data share) memory and without transposes.
samples/simple_sgemv.cpp: Demonstrates a simple matrix-vector multiply-accumulate for single-precision floating-point types, without LDS memory and without transposes.
samples/simple_dgemv.cpp: Demonstrates a simple matrix-vector multiply-accumulate for double-precision floating-point types, without LDS memory and without transposes.
samples/simple_sgemm.cpp: Demonstrates a simple GEMM algorithm for single-precision floating-point types, without LDS memory and without transposes.
samples/simple_dgemm.cpp: Demonstrates a simple GEMM algorithm for double-precision floating-point types, without LDS memory and without transposes.
samples/simple_hgemm.cpp: Demonstrates a simple GEMM algorithm for half-precision floating-point types, without LDS memory and without transposes.
samples/perf_sgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm for single-precision floating-point types, using LDS memory, macro tile collaboration, data reuse, and an optimized pipeline.
samples/perf_dgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm for double-precision floating-point types, using LDS memory, macro tile collaboration, data reuse, and an optimized pipeline.
samples/perf_hgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm for half-precision floating-point types, using LDS memory, macro tile collaboration, data reuse, and an optimized pipeline.
samples/simple_dlrm.cpp: Demonstrates a simple Deep Learning Recommendation Model (DLRM) for machine learning.
samples/common.hpp: Common code shared by all of the above rocWMMA samples.
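The simple GEMM samples compute an output matrix D from A, B, and C, with each block tile of D produced independently and A and B read directly from global memory rather than staged in LDS. The host-only C++ sketch below mimics that decomposition so the arithmetic can be checked without a GPU; it is not the samples' source. It assumes the common GEMM form D = alpha * (A × B) + beta * C, row-major storage for every matrix, and hypothetical 16 × 16 × 16 tile sizes; the names `gemmReference` and `gemmTiled` are likewise illustrative. On the device, the per-tile K loop roughly corresponds to repeated fragment loads followed by `rocwmma::mma_sync`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes; M, N, and K must be multiples of these.
constexpr std::size_t kTileM = 16;
constexpr std::size_t kTileN = 16;
constexpr std::size_t kTileK = 16;

// Naive reference: D = alpha * (A x B) + beta * C, all matrices row-major.
// A is M x K, B is K x N, C and D are M x N.
void gemmReference(std::size_t M, std::size_t N, std::size_t K, float alpha,
                   const std::vector<float>& A, const std::vector<float>& B, float beta,
                   const std::vector<float>& C, std::vector<float>& D)
{
    for(std::size_t i = 0; i < M; ++i)
        for(std::size_t j = 0; j < N; ++j)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            D[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
}

// Tiled variant: each (tileRow, tileCol) pair plays the role of one block
// tile of D. The accumulator is built up one kTileK-wide chunk of K at a
// time, and alpha/beta are applied once after the K loop finishes.
void gemmTiled(std::size_t M, std::size_t N, std::size_t K, float alpha,
               const std::vector<float>& A, const std::vector<float>& B, float beta,
               const std::vector<float>& C, std::vector<float>& D)
{
    assert(M % kTileM == 0 && N % kTileN == 0 && K % kTileK == 0);
    std::vector<float> acc(kTileM * kTileN);
    for(std::size_t tileRow = 0; tileRow < M; tileRow += kTileM)
        for(std::size_t tileCol = 0; tileCol < N; tileCol += kTileN)
        {
            std::fill(acc.begin(), acc.end(), 0.0f);
            // K loop: one multiply-accumulate step per kTileK chunk.
            for(std::size_t k0 = 0; k0 < K; k0 += kTileK)
                for(std::size_t i = 0; i < kTileM; ++i)
                    for(std::size_t j = 0; j < kTileN; ++j)
                        for(std::size_t k = 0; k < kTileK; ++k)
                            acc[i * kTileN + j] += A[(tileRow + i) * K + (k0 + k)]
                                                   * B[(k0 + k) * N + (tileCol + j)];
            // Epilogue: scale the accumulator and blend in C.
            for(std::size_t i = 0; i < kTileM; ++i)
                for(std::size_t j = 0; j < kTileN; ++j)
                {
                    const std::size_t idx = (tileRow + i) * N + (tileCol + j);
                    D[idx] = alpha * acc[i * kTileN + j] + beta * C[idx];
                }
        }
}
```

Deferring the alpha and beta scaling until the whole K range has been accumulated keeps the inner loop a pure multiply-accumulate, matching the multiply-accumulate step that rocWMMA exposes.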
test directory
The test directory contains test code for the following functionality:
test/bin: Generates benchmark plots from the gtest output dumps of rocWMMA’s benchmark tests.
test/dlrm: Tests various strategies for the DLRM application, validating DLRM functions that use the rocWMMA API.
test/gemm: Tests various strategies for the GEMM application, validating and benchmarking GEMM functions that use the rocWMMA API.
test/unit: Tests the basic functional units of the rocWMMA library.
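Validating and benchmarking a GEMM largely comes down to two calculations: comparing device output against a host reference within a tolerance, and converting a measured kernel time into throughput. The C++ sketch below shows both under stated assumptions; it is not rocWMMA's test harness, and the names `maxRelativeError` and `gemmTflops`, as well as the choice of a maximum-relative-error metric, are hypothetical. The flop count uses the standard convention that a GEMM of size M × N × K performs 2·M·N·K floating-point operations (one multiply and one add per term).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: largest relative error between a result and a reference.
// Dividing by max(|reference|, 1) compares values near zero by absolute error.
double maxRelativeError(const std::vector<float>& result, const std::vector<float>& reference)
{
    assert(result.size() == reference.size());
    double maxErr = 0.0;
    for(std::size_t i = 0; i < result.size(); ++i)
    {
        const double diff  = std::abs(static_cast<double>(result[i]) - reference[i]);
        const double scale = std::max(std::abs(static_cast<double>(reference[i])), 1.0);
        maxErr             = std::max(maxErr, diff / scale);
    }
    return maxErr;
}

// GEMM throughput in TFLOPS: 2 * M * N * K operations over the elapsed time.
double gemmTflops(std::size_t M, std::size_t N, std::size_t K, double elapsedMs)
{
    const double flops = 2.0 * static_cast<double>(M) * N * K;
    return flops / (elapsedMs * 1.0e-3) / 1.0e12;
}
```

For example, a 1024 × 1024 × 1024 GEMM that completes in 1 ms corresponds to roughly 2.15 TFLOPS.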
Infrastructure
CMake is used to build and package rocWMMA. There are CMakeLists.txt files throughout the code.
Doxygen, Breathe, Sphinx, and ReadTheDocs are used to produce documentation. The API documentation is generated from:
Doxygen comments in the include files in the library/include directory
Files in the docs/source directory
Jenkins is used to automate Continuous Integration (CI) testing.
clang-format is used to format C++ code.
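Doxygen extracts the API documentation from comments in the include files, and Breathe passes Doxygen's XML output to Sphinx. The snippet below sketches that comment style in C++. The function `scaleTile` is hypothetical and not part of the rocWMMA API, and the exact Doxygen commands used in the rocWMMA headers may differ.

```cpp
#include <cassert>
#include <cstddef>

/**
 * \brief Scales every element of a contiguous tile in place.
 *
 * Hypothetical function, shown only to illustrate Doxygen comment style.
 *
 * \tparam T      Element type of the tile.
 * \param data    Pointer to the first element of the tile.
 * \param count   Number of elements in the tile.
 * \param factor  Multiplier applied to each element.
 * \pre           data is non-null unless count is zero.
 * \return        The number of elements that were scaled.
 */
template <typename T>
std::size_t scaleTile(T* data, std::size_t count, T factor)
{
    assert(data != nullptr || count == 0);
    for(std::size_t i = 0; i < count; ++i)
    {
        data[i] *= factor;
    }
    return count;
}
```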