Supported ROCm and NVIDIA CUDA functions
Here is a list of the ROCm and NVIDIA CUDA functions supported by hipSPARSELt:
ROCm
- AMD sparse MFMA matrix core support
- Mixed-precision computation support:
  - FP16 input/output, FP32 matrix core accumulate
  - BFLOAT16 input/output, FP32 matrix core accumulate
  - INT8 input/output, INT32 matrix core accumulate
  - INT8 input, FP16 output, INT32 matrix core accumulate
  - FP8 input, FP32 output, FP32 matrix core accumulate
  - BF8 input, FP32 output, FP32 matrix core accumulate
- Matrix pruning and compression functionalities
- Auto-tuning functionality (see hipsparseLtMatmulSearch()); a usage sketch follows at the end of this list
- Batched sparse GEMM support:
  - Single sparse matrix/multiple dense matrices (broadcast)
  - Multiple sparse and dense matrices
  - Batched bias vector
- Fused activation function support in the SpMM kernel (see the second sketch at the end of this list):
  - ReLU
  - Clipped ReLU (ReLU with configurable upper bound and threshold)
  - GeLU
  - GeLU scaling (implies GeLU is enabled)
  - Abs
  - LeakyReLU
  - Sigmoid
  - Tanh
- Ongoing feature development:
  - Add support for mixed-precision computation:
    - FP8 input/output, FP32 matrix core accumulate
    - BF8 input/output, FP32 matrix core accumulate
  - Add a kernel selector and generator to provide the appropriate solution for a specific problem
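
The sketch below shows, under stated assumptions, how several of the features above fit together for the FP16 input/output, FP32 matrix core accumulate case: the structured operand is pruned to 2:4 sparsity and compressed, hipsparseLtMatmulSearch() auto-tunes the matmul plan, and the tuned kernel is then launched. It is illustrative only and not taken from the hipSPARSELt samples; the problem sizes are arbitrary, error checking and deallocation are omitted, and exact type and enum names (for example HIP_R_16F versus the older HIPSPARSELT_R_16F) vary between releases, so verify them against your installed hipsparselt.h.

```cpp
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#include <hipsparselt/hipsparselt.h>

int main()
{
    // Illustrative problem size; column-major storage throughout.
    int64_t m = 1024, n = 1024, k = 1024;
    float   alpha = 1.0f, beta = 0.0f;

    __half *dA, *dB, *dC;
    hipMalloc(&dA, m * k * sizeof(__half));
    hipMalloc(&dB, k * n * sizeof(__half));
    hipMalloc(&dC, m * n * sizeof(__half));
    // ... fill dA and dB with data, dC with zeros ...

    hipsparseLtHandle_t handle;
    hipsparseLtInit(&handle);

    // FP16 input/output; A is the structured (2:4 sparse) operand.
    hipsparseLtMatDescriptor_t matA, matB, matC;
    hipsparseLtStructuredDescriptorInit(&handle, &matA, m, k, m, 16, HIP_R_16F,
                                        HIPSPARSE_ORDER_COL,
                                        HIPSPARSELT_SPARSITY_50_PERCENT);
    hipsparseLtDenseDescriptorInit(&handle, &matB, k, n, k, 16, HIP_R_16F,
                                   HIPSPARSE_ORDER_COL);
    hipsparseLtDenseDescriptorInit(&handle, &matC, m, n, m, 16, HIP_R_16F,
                                   HIPSPARSE_ORDER_COL);

    // FP32 matrix core accumulation is requested through the compute type.
    hipsparseLtMatmulDescriptor_t matmul;
    hipsparseLtMatmulDescriptorInit(&handle, &matmul,
                                    HIPSPARSE_OPERATION_NON_TRANSPOSE,
                                    HIPSPARSE_OPERATION_NON_TRANSPOSE,
                                    &matA, &matB, &matC, &matC,
                                    HIPSPARSELT_COMPUTE_32F);

    hipsparseLtMatmulAlgSelection_t algSel;
    hipsparseLtMatmulAlgSelectionInit(&handle, &algSel, &matmul,
                                      HIPSPARSELT_MATMUL_ALG_DEFAULT);

    hipsparseLtMatmulPlan_t plan;
    hipsparseLtMatmulPlanInit(&handle, &plan, &matmul, &algSel);

    // Prune A in place to 2:4 structured sparsity, then compress it.
    hipsparseLtSpMMAPrune(&handle, &matmul, dA, dA,
                          HIPSPARSELT_PRUNE_SPMMA_STRIP, /*stream=*/nullptr);

    size_t compressedSize = 0, compressBufferSize = 0;
    hipsparseLtSpMMACompressedSize(&handle, &plan, &compressedSize,
                                   &compressBufferSize);
    void *dACompressed = nullptr, *dCompressBuffer = nullptr;
    hipMalloc(&dACompressed, compressedSize);
    hipMalloc(&dCompressBuffer, compressBufferSize);
    hipsparseLtSpMMACompress(&handle, &plan, dA, dACompressed,
                             dCompressBuffer, /*stream=*/nullptr);

    // Workspace for the selected algorithm.
    size_t workspaceSize = 0;
    hipsparseLtMatmulGetWorkspace(&handle, &plan, &workspaceSize);
    void* dWorkspace = nullptr;
    if (workspaceSize > 0) hipMalloc(&dWorkspace, workspaceSize);

    // Auto-tune: benchmark the candidate kernels and keep the fastest in the plan.
    hipsparseLtMatmulSearch(&handle, &plan, &alpha, dACompressed, dB,
                            &beta, dC, dC, dWorkspace,
                            /*streams=*/nullptr, /*numStreams=*/0);

    // D = alpha * op(A) * op(B) + beta * C, using the tuned kernel.
    hipsparseLtMatmul(&handle, &plan, &alpha, dACompressed, dB,
                      &beta, dC, dC, dWorkspace,
                      /*streams=*/nullptr, /*numStreams=*/0);

    hipsparseLtMatmulPlanDestroy(&plan);
    hipsparseLtDestroy(&handle);
    return 0;
}
```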
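
Fused activations and the batched bias vector are configured on the matmul descriptor before the plan is created. The helper below is a hypothetical sketch: the attribute names mirror the cuSPARSELt-style naming (HIPSPARSELT_MATMUL_ACTIVATION_RELU and related) and are assumptions to check against your hipSPARSELt headers.

```cpp
#include <hipsparselt/hipsparselt.h>

// Hypothetical helper: fuse a clipped ReLU (ReLU with threshold and upper
// bound) plus a bias vector into the SpMM kernel by setting attributes on an
// already-initialized matmul descriptor. Call it before
// hipsparseLtMatmulPlanInit(); error checking is omitted.
void enableFusedClippedReluAndBias(const hipsparseLtHandle_t*     handle,
                                   hipsparseLtMatmulDescriptor_t* matmul,
                                   void*                          dBias)
{
    int   reluOn         = 1;     // enable ReLU
    float reluUpperBound = 6.0f;  // clip outputs above this bound
    float reluThreshold  = 0.0f;  // outputs below this threshold become 0

    hipsparseLtMatmulDescSetAttribute(handle, matmul,
                                      HIPSPARSELT_MATMUL_ACTIVATION_RELU,
                                      &reluOn, sizeof(reluOn));
    hipsparseLtMatmulDescSetAttribute(handle, matmul,
                                      HIPSPARSELT_MATMUL_ACTIVATION_RELU_UPPERBOUND,
                                      &reluUpperBound, sizeof(reluUpperBound));
    hipsparseLtMatmulDescSetAttribute(handle, matmul,
                                      HIPSPARSELT_MATMUL_ACTIVATION_RELU_THRESHOLD,
                                      &reluThreshold, sizeof(reluThreshold));

    // Bias: a device pointer to the vector added to every output column.
    hipsparseLtMatmulDescSetAttribute(handle, matmul,
                                      HIPSPARSELT_MATMUL_BIAS_POINTER,
                                      &dBias, sizeof(dBias));
}
```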
CUDA
- Support for NVIDIA CUDA cuSPARSELt v0.6.3