ROCm & CUDA supported functions
- ROCm
  - AMD sparse MFMA matrix core support
  - Mixed-precision computation support:
    - FP16 input/output, FP32 Matrix Core accumulate
    - BFLOAT16 input/output, FP32 Matrix Core accumulate
    - INT8 input/output, INT32 Matrix Core accumulate
    - INT8 input, FP16 output, INT32 Matrix Core accumulate
  - Matrix pruning and compression functionalities
  - Auto-tuning functionality (see hipsparseLtMatmulSearch())
  - Batched sparse GEMM support:
    - Single sparse matrix with multiple dense matrices (broadcast)
    - Multiple sparse and dense matrices
    - Batched bias vector
  - Activation function fusion in the SpMM kernel:
    - ReLU
    - ClippedReLU (ReLU with upper bound and threshold setting)
    - GeLU
    - GeLU Scaling (implicitly enables GeLU)
    - Abs
    - LeakyReLU
    - Sigmoid
    - Tanh
  - Ongoing feature development
    - Add support for mixed-precision computation:
      - FP8 input/output, FP32 Matrix Core accumulate
      - BF8 input/output, FP32 Matrix Core accumulate
    - Add kernel selection and a kernel generator to provide the appropriate solution for a specific problem.
- CUDA
  - Support for cuSPARSELt v0.4
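
To make the pruning, compression, and fused-activation items above more concrete, the following is a minimal pure-Python sketch of the idea, not the library's implementation. It assumes a 2:4 structured sparsity pattern (at most two non-zero values in every group of four consecutive elements), which this section does not state explicitly. The function names `prune_2_4`, `compress_2_4`, `sparse_row_times_dense`, and `relu` are hypothetical and are not part of the hipSPARSELt or cuSPARSELt API; the compressed layout used by the real libraries is hardware-specific and may differ.

```python
from typing import List, Tuple


def prune_2_4(row: List[float]) -> List[float]:
    """Keep the two largest-magnitude values in each group of four (assumed 2:4 pattern)."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Stable sort by descending magnitude; ties keep the lower index.
        keep = sorted(range(4), key=lambda i: -abs(group[i]))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


def compress_2_4(row: List[float]) -> Tuple[List[float], List[int]]:
    """Store two values plus their in-group positions for each group of four."""
    values: List[float] = []
    positions: List[int] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group violates the 2:4 pattern")
        # Pad with zero-valued slots so every group stores exactly two entries.
        kept += [i for i in range(4) if i not in kept][: 2 - len(kept)]
        kept.sort()
        values += [group[i] for i in kept]
        positions += kept
    return values, positions


def sparse_row_times_dense(values: List[float], positions: List[int],
                           dense: List[List[float]]) -> List[float]:
    """Multiply one compressed row by a dense K x N matrix."""
    n_cols = len(dense[0])
    out = [0.0] * n_cols
    for slot, (v, p) in enumerate(zip(values, positions)):
        k = (slot // 2) * 4 + p  # two stored slots per group of four columns
        for j in range(n_cols):
            out[j] += v * dense[k][j]
    return out


def relu(xs: List[float]) -> List[float]:
    """Activation applied to the product, standing in for a fused epilogue."""
    return [max(0.0, x) for x in xs]


if __name__ == "__main__":
    row = [0.5, -3.0, 0.1, 2.0, 1.0, 0.0, 0.0, -4.0]
    dense = [[float(k), -1.0] for k in range(8)]  # 8 x 2 dense operand
    values, positions = compress_2_4(prune_2_4(row))
    print(relu(sparse_row_times_dense(values, positions, dense)))
```

The compressed form stores half as many values as the dense row plus a small position index per value, which is the kind of storage and compute saving that sparse matrix cores are designed to exploit. In the sketch the activation runs as a separate step; a fused kernel would apply it before writing the result back to memory.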