Programmer’s guide#
This document provides insight into the library source code organization, design implementation details, helpful information for new development, and testing and benchmarking details.
Library source code organization#
The rocWMMA code is split into four major parts:
The library directory contains the library source code.
The samples directory contains real-world use cases of the rocWMMA API.
The test directory contains validation tests for the rocWMMA API.
Infrastructure
library directory#
The library directory contains the following include files:
library/include/rocwmma/: C++ include files for the rocWMMA API. These files also contain Doxygen comments that document the API.
library/include/internal: Internal include files for:
  Type support
  Input and output configuration, shapes and traits
  Layout
  Mapping utility
  Cross-lane operation utility
  Vector blend utility
  Packing and unpacking
  Conversion and broadcasting
  Load and store
  Matrix multiply-accumulate
  Cooperative load and store
  Threadblock synchronization
  Utility code
samples directory#
The samples directory contains sample code for the following use cases:
samples/hipRTC_gemm.cpp: Demonstrates a simple General Matrix Multiply (GEMM) algorithm, without LDS (Local Data Share) memory usage or transpose, called from within the hipRTC environment.
samples/simple_sgemv.cpp: Demonstrates a simple matrix multiply-accumulate with a vector, without LDS memory usage or transpose, for single-precision floating-point types.
samples/simple_dgemv.cpp: Demonstrates a simple matrix multiply-accumulate with a vector, without LDS memory usage or transpose, for double-precision floating-point types.
samples/simple_sgemm.cpp: Demonstrates a simple GEMM algorithm, without LDS memory usage or transpose, for single-precision floating-point types.
samples/simple_dgemm.cpp: Demonstrates a simple GEMM algorithm, without LDS memory usage or transpose, for double-precision floating-point types.
samples/simple_hgemm.cpp: Demonstrates a simple GEMM algorithm, without LDS memory usage or transpose, for half-precision floating-point types.
samples/perf_sgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm, using LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for single-precision floating-point types.
samples/perf_dgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm, using LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for double-precision floating-point types.
samples/perf_hgemm.cpp: Demonstrates the best-performing multi-block GEMM algorithm, using LDS memory, macro tile collaboration, data reuse, and an optimized pipeline, for half-precision floating-point types.
samples/simple_dlrm.cpp: Demonstrates a simple Deep Learning Recommendation Model (DLRM) for machine learning.
samples/common.hpp: Common code used by all of the rocWMMA sample files above.
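As a point of reference for what the GEMM samples compute, the following is a minimal CPU sketch in C++ of the conventional GEMM form D = alpha * (A * B) + beta * C, with row-major data and no transpose. It does not use the rocWMMA API and is not taken from the samples; the function name referenceGemm, its parameter order, and the row-major layout are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rocWMMA): CPU reference GEMM.
// Computes D = alpha * (A * B) + beta * C with no transpose.
// Assumed layout: all matrices are row-major.
//   A is m x k, B is k x n, C and D are m x n.
std::vector<float> referenceGemm(std::size_t m, std::size_t n, std::size_t k,
                                 float alpha,
                                 const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 float beta,
                                 const std::vector<float>& c)
{
    // Check that the buffers match the stated dimensions.
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    assert(c.size() == m * n);

    std::vector<float> d(m * n, 0.0f);
    for(std::size_t i = 0; i < m; ++i)
    {
        for(std::size_t j = 0; j < n; ++j)
        {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for(std::size_t p = 0; p < k; ++p)
            {
                acc += a[i * k + p] * b[p * n + j];
            }
            d[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
    return d;
}
```

A reference of this kind is typically used to check the output of a GPU kernel element by element. For example, calling referenceGemm(2, 2, 2, 2.0f, a, b, 3.0f, c) with a = {1, 2, 3, 4}, b = {5, 6, 7, 8}, and c = {1, 1, 1, 1} yields {41, 47, 89, 103}.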
test directory#
The test directory contains test code for the following:
test/bin: Generates benchmark plots from the gtest output dumps of rocWMMA’s benchmark tests.
test/dlrm: Covers various strategies of the DLRM application. This test validates DLRM functions that use the rocWMMA API.
test/gemm: Covers various strategies of the GEMM application. This test validates and benchmarks GEMM functions that use the rocWMMA API.
test/unit: Tests the basic functional units of the rocWMMA library.
Infrastructure#
CMake is used to build and package rocWMMA. There are CMakeLists.txt files throughout the code.
Doxygen, Breathe, Sphinx, and ReadTheDocs are used to produce documentation. The API documentation is generated using:
  Doxygen comments in the include files in the library/include directory
  Files in the docs/source directory
Jenkins is used to automate Continuous Integration (CI) testing.
clang-format is used to format C++ code.