ROCm & CUDA supported functions

  • ROCm

    • AMD sparse MFMA matrix core support

    • Mixed-precision computation support:

      • FP16 input/output, FP32 Matrix Core accumulate

      • BFLOAT16 input/output, FP32 Matrix Core accumulate

      • INT8 input/output, INT32 Matrix Core accumulate

      • INT8 input, FP16 output, INT32 Matrix Core accumulate

    • Matrix pruning and compression functionalities

    • Auto-tuning functionality (see hipsparseLtMatmulSearch())

    • Batched sparse GEMM support:

      • Single sparse matrix/Multiple dense matrices (Broadcast)

      • Multiple sparse and dense matrices

      • Batched bias vector

    • Activation function fusion in the SpMM kernel:

      • ReLU

      • ClippedReLU (ReLU with upper bound and threshold setting)

      • GeLU

      • GeLU scaling (implicitly enables GeLU)

      • Abs

      • LeakyReLU

      • Sigmoid

      • Tanh

    • Ongoing feature development:

      • Add support for additional mixed-precision computation:

        • FP8 input/output, FP32 Matrix Core accumulate

        • BF8 input/output, FP32 Matrix Core accumulate

      • Add kernel selection and a kernel generator to provide the appropriate solution for a specific problem

  • CUDA

    • Supports cuSPARSELt v0.4
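
To make the "matrix pruning and compression" item in the ROCm list more concrete, the following is a minimal CPU-side sketch in plain C++. It assumes the 2:4 structured-sparsity pattern (at most two nonzero values in every group of four consecutive elements of a row), which this page does not state explicitly. The names `prune_2_4`, `compress_2_4`, and `Compressed` are hypothetical helpers for illustration only and are not part of the hipSPARSELt or cuSPARSELt API; the storage layout shown is likewise an assumption, not the library's internal format.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a library function): keep the two
// largest-magnitude values in each group of four consecutive
// elements of a row-major matrix and zero the other two.
// Columns beyond the last full group of four are left untouched.
std::vector<float> prune_2_4(const std::vector<float>& a,
                             std::size_t rows, std::size_t cols)
{
    std::vector<float> out(a);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c + 4 <= cols; c += 4) {
            float* g = &out[r * cols + c];
            // Index of the largest magnitude in the group.
            int first = 0;
            for (int i = 1; i < 4; ++i)
                if (std::fabs(g[i]) > std::fabs(g[first])) first = i;
            // Index of the second-largest magnitude.
            int second = (first == 0) ? 1 : 0;
            for (int i = 0; i < 4; ++i)
                if (i != first && std::fabs(g[i]) > std::fabs(g[second])) second = i;
            // Zero everything else.
            for (int i = 0; i < 4; ++i)
                if (i != first && i != second) g[i] = 0.0f;
        }
    }
    return out;
}

// Assumed illustrative layout: two kept values per group plus the
// position (0..3) each value occupied inside its group.
struct Compressed {
    std::vector<float> values;
    std::vector<std::uint8_t> index;
};

// Hypothetical helper: compress an already-pruned matrix to half
// its width. Groups with fewer than two nonzeros are padded with
// explicit zeros taken from unused positions.
Compressed compress_2_4(const std::vector<float>& pruned,
                        std::size_t rows, std::size_t cols)
{
    Compressed out;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c + 4 <= cols; c += 4) {
            const float* g = &pruned[r * cols + c];
            bool used[4] = {false, false, false, false};
            int kept = 0;
            for (int i = 0; i < 4 && kept < 2; ++i) {
                if (g[i] != 0.0f) {
                    out.values.push_back(g[i]);
                    out.index.push_back(static_cast<std::uint8_t>(i));
                    used[i] = true;
                    ++kept;
                }
            }
            for (int i = 0; i < 4 && kept < 2; ++i) {
                if (!used[i]) {
                    out.values.push_back(g[i]);
                    out.index.push_back(static_cast<std::uint8_t>(i));
                    used[i] = true;
                    ++kept;
                }
            }
        }
    }
    return out;
}
```

Under these assumptions, a 1 x 8 row is pruned to two nonzeros per group and then stored as four values plus four small position indices. The halved storage is what lets sparse matrix hardware skip the zeroed entries.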
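
The mixed-precision and fused-activation items in the ROCm list can be illustrated in the same way. The sketch below shows an INT8 dot product accumulated in INT32, followed by a ClippedReLU applied before the result is converted back to INT8. The ClippedReLU semantics used here (values at or below the threshold become zero, remaining values are capped at the upper bound), the `alpha` scaling factor, and the saturating conversion are assumptions made for illustration. The function names `dot_i8`, `clipped_relu`, and `epilogue_i8` are hypothetical and do not belong to the library API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// INT8 inputs with an INT32 accumulator: each product is widened
// before it is added, so the running sum does not overflow 8 bits.
std::int32_t dot_i8(const std::vector<std::int8_t>& a,
                    const std::vector<std::int8_t>& b)
{
    std::int32_t acc = 0;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i)
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    return acc;
}

// Assumed ClippedReLU semantics: inputs at or below `threshold`
// map to zero; all other inputs are capped at `upper_bound`.
float clipped_relu(float x, float threshold, float upper_bound)
{
    return x > threshold ? std::min(x, upper_bound) : 0.0f;
}

// Hypothetical fused epilogue: scale the INT32 accumulator, apply
// the activation, then saturate and round to an INT8 output.
std::int8_t epilogue_i8(std::int32_t acc, float alpha,
                        float threshold, float upper_bound)
{
    float y = clipped_relu(alpha * static_cast<float>(acc), threshold, upper_bound);
    y = std::min(127.0f, std::max(-128.0f, y));
    return static_cast<std::int8_t>(std::lround(y));
}
```

Applying the activation inside the epilogue, rather than as a separate pass over the output, is the idea behind fusing it into the SpMM kernel: the result is written to memory once, already activated.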