Supported ROCm and NVIDIA CUDA functions

hipSPARSELt supports the following ROCm and NVIDIA CUDA functions:

ROCm
- AMD sparse MFMA matrix core support
- Mixed-precision computation support:
  - FP16 input/output, FP32 matrix core accumulate
  - BFLOAT16 input/output, FP32 matrix core accumulate
  - INT8 input/output, INT32 matrix core accumulate
  - INT8 input, FP16 output, INT32 matrix core accumulate
  - FP8 input, FP32 output, FP32 matrix core accumulate
  - BF8 input, FP32 output, FP32 matrix core accumulate
- Matrix pruning and compression functionalities
- Auto-tuning functionality (see hipsparseLtMatmulSearch())
- Batched sparse GEMM support:
  - Single sparse matrix/multiple dense matrices (broadcast)
  - Multiple sparse and dense matrices
  - Batched bias vector
- Activation function fusion in the SpMM kernel:
  - ReLU
  - ClippedReLU (ReLU with a configurable upper bound and threshold)
  - GeLU
  - GeLU scaling (implicitly enables GeLU)
  - Abs
  - LeakyReLU
  - Sigmoid
  - Tanh
- Ongoing feature development:
  - Add support for mixed-precision computation:
    - FP8 input/output, FP32 matrix core accumulate
    - BF8 input/output, FP32 matrix core accumulate
  - Add kernel selection and a kernel generator to provide the appropriate solution for a specific problem
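
The mixed-precision entries above pair a narrow input type with a wider accumulator. The following is a minimal CPU-side sketch of why that matters, not hipSPARSELt code: it emulates FP16 and FP32 rounding with Python's `struct` module and compares a dot product accumulated in FP16 against one accumulated in FP32. The helper names (`to_fp16`, `to_fp32`, `dot`) are hypothetical and exist only for this illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product of FP16 inputs; the running sum is rounded by round_acc.

    This models only the rounding of the accumulator. It is an
    illustration, not a model of any specific matrix core instruction.
    """
    acc = 0.0
    for x, y in zip(a, b):
        # Inputs are rounded to FP16; their product is exact in a Python float.
        product = to_fp16(x) * to_fp16(y)
        acc = round_acc(acc + product)
    return acc


a = [1.0] * 4096
b = [1.0] * 4096

fp16_acc = dot(a, b, to_fp16)  # accumulator kept in FP16
fp32_acc = dot(a, b, to_fp32)  # accumulator kept in FP32

print(fp16_acc, fp32_acc)
```

In this sketch the FP16 accumulator stops growing at 2048, because 2049 is not representable in binary16 and the sum rounds back to 2048, while the FP32 accumulator reaches the exact result of 4096.
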

CUDA
- Support for CUDA cuSPARSELt v0.6.3
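
To make the pruning and compression features listed above more concrete, here is a minimal pure-Python sketch. It assumes a 2:4 structured sparsity pattern (at most two nonzeros in every group of four consecutive values), which this page does not state explicitly. The functions `prune_2_4`, `compress_2_4`, and `sparse_dot` are hypothetical illustrations, not part of the hipSPARSELt or cuSPARSELt API, and the storage layout shown is not the layout either library uses internally.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest magnitudes; ties go to the lower index.
        order = sorted(range(4), key=lambda i: (-abs(group[i]), i))
        keep = set(order[:2])
        pruned.extend(v if i in keep else 0 for i, v in enumerate(group))
    return pruned


def compress_2_4(pruned_row):
    """Pack a pruned row into kept values plus their positions (0..3)."""
    if len(pruned_row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(pruned_row), 4):
        group = pruned_row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group has more than two nonzeros; prune first")
        # Pad with zero-valued positions so every group stores two entries.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        values.extend(group[i] for i in kept)
        positions.extend(kept)
    return values, positions


def sparse_dot(values, positions, dense_col):
    """Dot product of a compressed row with a dense column."""
    total = 0
    for k, (v, p) in enumerate(zip(values, positions)):
        group_start = (k // 2) * 4  # two stored entries per group of four
        total += v * dense_col[group_start + p]
    return total


row = [1, -5, 2, 3, 0, 0, 7, -1]
pruned = prune_2_4(row)
values, positions = compress_2_4(pruned)
result = sparse_dot(values, positions, [1, 2, 3, 4, 5, 6, 7, 8])
print(pruned, values, positions, result)
```

The compressed form stores half as many values as the dense row, plus a small position index per stored value. Under this assumption, that is the property a sparse matrix core can exploit to skip the zeroed elements.
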