hipBLASLtExt operation API reference

hipBLASLt provides the following extension operation APIs, which are independent of GEMM operations:

  1. hipblasltExtSoftmax

    Softmax for a 2D tensor. Currently, it performs softmax on the second dimension of the input tensor and assumes the input is contiguous along that dimension. For sample code, refer to client_extop_softmax.cpp.

  2. hipblasltExtLayerNorm

    Applies LayerNorm to a 2D tensor to generate a new, normalized 2D tensor. It is a standalone function: call it and read the result. For sample code, refer to sample_hipblaslt_ext_op_layernorm.cpp.

  3. hipblasltExtAMax

    Computes the absolute maximum value of a 2D tensor. It is a standalone function: call it and read the result. For sample code, refer to sample_hipblaslt_ext_op_amax.cpp.

  4. hipblasltExtAMaxWithScale

    Computes the absolute maximum value of a 2D tensor and a scaled copy of that tensor. It is a standalone function: call it and read the result. For sample code, refer to sample_hipblaslt_ext_op_amax_with_scale.cpp.

These APIs are explained in detail below.

hipblasltExtSoftmax()

hipblasStatus_t hipblasltExtSoftmax(hipDataType datatype, uint32_t m, uint32_t n, uint32_t dim, void *output, void *input, hipStream_t stream)

Performs softmax on the given tensor.

This function computes softmax on the given 2D tensor along the specified dimension.

Parameters:
  • datatype[in] Data type of the input/output tensor. Currently only HIP_R_32F is supported.

  • m[in] The first dimension of the input/output tensor.

  • n[in] The second dimension of the input/output tensor. Currently only values less than or equal to 256 are supported.

  • dim[in] The dimension to perform softmax on. Currently 1 is the only valid value.

  • input[in] Input tensor buffer.

  • stream[in] The HIP stream where all the GPU work will be submitted.

  • output[out] Output tensor buffer.

Return values:
  • HIPBLAS_STATUS_SUCCESS – If it runs successfully.

  • HIPBLAS_STATUS_INVALID_VALUE – If n is greater than 256.

  • HIPBLAS_STATUS_NOT_SUPPORTED – If dim is not 1 or datatype is not HIP_R_32F.
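
To make the computation concrete, the following is a minimal CPU reference sketch of softmax along the second dimension of an m x n tensor. It does not call hipBLASLt. The function name referenceSoftmax and the row-major layout are assumptions made for illustration (the page only states that the input is contiguous on the second dimension), and subtracting the row maximum is a common numerical-stability choice, not a documented detail of the GPU kernel. A helper like this can be used to check results copied back from the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not part of hipBLASLt): softmax over each row
// of a row-major m x n float tensor, i.e. along the second dimension (dim == 1).
std::vector<float> referenceSoftmax(const std::vector<float>& input, uint32_t m, uint32_t n)
{
    assert(n > 0);
    assert(input.size() == static_cast<std::size_t>(m) * n);

    std::vector<float> output(input.size());
    for (uint32_t row = 0; row < m; ++row) {
        const float* in  = input.data() + static_cast<std::size_t>(row) * n;
        float*       out = output.data() + static_cast<std::size_t>(row) * n;

        // Subtract the row maximum before exponentiating to avoid overflow.
        const float rowMax = *std::max_element(in, in + n);

        float sum = 0.0f;
        for (uint32_t col = 0; col < n; ++col) {
            out[col] = std::exp(in[col] - rowMax);
            sum += out[col];
        }
        // Normalize so that each row sums to 1.
        for (uint32_t col = 0; col < n; ++col) {
            out[col] /= sum;
        }
    }
    return output;
}
```

Comparing this result against the buffer written by hipblasltExtSoftmax() with a small tolerance is usually sufficient for HIP_R_32F data.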

hipblasltExtLayerNorm()

hipblasStatus_t hipblasltExtLayerNorm(hipDataType datatype, void *output, void *mean, void *invvar, void *input, uint32_t m, uint32_t n, float eps, void *gamma, void *beta, hipStream_t stream)

Performs 2-D LayerNorm on a source input tensor and writes the result to an output tensor.

This function computes LayerNorm on the given 2D tensor.

Parameters:
  • datatype[in] Data type of the input/output tensor. Currently only HIP_R_32F is supported.

  • output[out] Output tensor buffer. Cannot be nullptr.

  • mean[out] Mean tensor buffer. Cannot be nullptr.

  • invvar[out] Tensor buffer holding 1 / sqrt(std). Cannot be nullptr.

  • input[in] Input tensor buffer. Cannot be nullptr.

  • m[in] The first dimension of the input/output tensor.

  • n[in] The second dimension of the input/output tensor.

  • eps[in] Small value used in the square root to avoid an inf result.

  • gamma[in] Tensor buffer. If nullptr, gamma is not used in the calculation.

  • beta[in] Tensor buffer. If nullptr, beta is not used in the calculation.

  • stream[in] The HIP stream where all the GPU work will be submitted.

Return values:
  • HIPBLAS_STATUS_SUCCESS – If it runs successfully.

  • HIPBLAS_STATUS_INVALID_VALUE – If m is greater than 4096.

  • HIPBLAS_STATUS_NOT_SUPPORTED – If datatype is not HIP_R_32F.
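
As an illustration of how the outputs relate to each other, here is a minimal CPU reference sketch of row-wise LayerNorm. It does not call hipBLASLt. It assumes the conventional formulation, which this page does not spell out: each row of a row-major m x n tensor is normalized with its own mean and with invvar = 1 / sqrt(variance + eps), and gamma and beta are optional per-column vectors of length n. The names referenceLayerNorm and LayerNormResult are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result holder mirroring the three output buffers of the API.
struct LayerNormResult {
    std::vector<float> output; // m x n normalized tensor
    std::vector<float> mean;   // one mean per row (length m)
    std::vector<float> invvar; // one 1 / sqrt(variance + eps) per row (length m)
};

// Hypothetical CPU reference (not part of hipBLASLt), assuming the standard
// LayerNorm definition applied to each row of a row-major m x n tensor.
LayerNormResult referenceLayerNorm(const std::vector<float>& input, uint32_t m, uint32_t n,
                                   float eps, const float* gamma = nullptr,
                                   const float* beta = nullptr)
{
    assert(n > 0);
    assert(input.size() == static_cast<std::size_t>(m) * n);

    LayerNormResult r{std::vector<float>(input.size()), std::vector<float>(m),
                      std::vector<float>(m)};
    for (uint32_t row = 0; row < m; ++row) {
        const float* in  = input.data() + static_cast<std::size_t>(row) * n;
        float*       out = r.output.data() + static_cast<std::size_t>(row) * n;

        float sum = 0.0f;
        for (uint32_t col = 0; col < n; ++col) sum += in[col];
        const float mean = sum / static_cast<float>(n);

        float sqDiff = 0.0f;
        for (uint32_t col = 0; col < n; ++col) {
            const float d = in[col] - mean;
            sqDiff += d * d;
        }
        // eps keeps the denominator away from zero for constant rows.
        const float invvar = 1.0f / std::sqrt(sqDiff / static_cast<float>(n) + eps);

        for (uint32_t col = 0; col < n; ++col) {
            float v = (in[col] - mean) * invvar;
            if (gamma != nullptr) v *= gamma[col]; // optional scale
            if (beta != nullptr) v += beta[col];   // optional shift
            out[col] = v;
        }
        r.mean[row]   = mean;
        r.invvar[row] = invvar;
    }
    return r;
}
```

Passing nullptr for gamma or beta mirrors the documented behavior that the corresponding term is left out of the calculation.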

hipblasltExtAMax()

hipblasStatus_t hipblasltExtAMax(const hipDataType datatype, const hipDataType outDatatype, void *output, void *input, uint32_t m, uint32_t n, hipStream_t stream)

Performs absmax on the given 2-D tensor and outputs a single absmax(tensor) value.

This function computes amax on the given 2D tensor.

Parameters:
  • datatype[in] Data type of the input tensor. Currently only HIP_R_32F and HIP_R_16F are supported.

  • outDatatype[in] Data type of the output tensor. Currently only HIP_R_32F and HIP_R_16F are supported.

  • output[out] Amax tensor buffer. Cannot be nullptr.

  • input[in] 2-D tensor buffer. Cannot be nullptr.

  • m[in] The first dimension of input/output tensor.

  • n[in] The second dimension of input/output tensor.

  • stream[in] The HIP stream where all the GPU work will be submitted.

Return values:
  • HIPBLAS_STATUS_SUCCESS – If it runs successfully.

  • HIPBLAS_STATUS_INVALID_VALUE – If m or n is 0, or input or output is nullptr.

  • HIPBLAS_STATUS_NOT_SUPPORTED – If datatype is neither HIP_R_32F nor HIP_R_16F.
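
The reduction itself is small enough to sketch on the CPU. The following reference, which does not call hipBLASLt, computes the largest absolute value over all m x n elements of a float tensor. The name referenceAMax is hypothetical, and the sketch covers only the 32-bit float case.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not part of hipBLASLt): maximum of |x| over
// every element of an m x n float tensor. The layout does not matter for
// a full reduction, so the buffer is scanned linearly.
float referenceAMax(const std::vector<float>& input, uint32_t m, uint32_t n)
{
    assert(m > 0 && n > 0);
    assert(input.size() == static_cast<std::size_t>(m) * n);

    float amax = 0.0f;
    for (float v : input) {
        amax = std::max(amax, std::fabs(v));
    }
    return amax;
}
```

The single value returned here corresponds to the one-element output buffer described above.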

hipblasltExtAMaxWithScale()

hipblasStatus_t hipblasltExtAMaxWithScale(const hipDataType datatype, const hipDataType outDatatype, const hipDataType scaleDatatype, void *output, void *outputD, void *input, void *inputScale, uint32_t m, uint32_t n, hipStream_t stream)

Performs absmax and scaling on the given 2-D tensor. Generates one absmax value and a scaled 2-D tensor output.

This function computes amax and scaling on the given 2D tensor.

Parameters:
  • datatype[in] Data type of the input tensor. Currently only HIP_R_32F is supported.

  • outDatatype[in] Data type of the output tensor. Currently only HIP_R_32F and HIP_R_16F are supported.

  • scaleDatatype[in] Data type of the outputD tensor. Currently only HIP_R_8F_E4M3_FNUZ and HIP_R_8F_E5M2_FNUZ are supported.

  • output[out] Amax tensor buffer. Cannot be nullptr.

  • outputD[out] Scaled 2-D tensor buffer. Cannot be nullptr.

  • input[in] 2-D tensor buffer. Cannot be nullptr.

  • inputScale[in] 1-D tensor buffer. Cannot be nullptr. Only float is supported.

  • m[in] The first dimension of input/output tensor.

  • n[in] The second dimension of input/output tensor.

  • stream[in] The HIP stream where all the GPU work will be submitted.

Return values:
  • HIPBLAS_STATUS_SUCCESS – If it runs successfully.

  • HIPBLAS_STATUS_INVALID_VALUE – If m or n is 0, or input, inputScale, output, or outputD is nullptr.

  • HIPBLAS_STATUS_NOT_SUPPORTED – If datatype is not HIP_R_32F, or scaleDatatype is neither HIP_R_8F_E4M3_FNUZ nor HIP_R_8F_E5M2_FNUZ.
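
A rough CPU sketch of the two outputs follows. It does not call hipBLASLt and rests on several assumptions that this page does not confirm: amax is taken over the unscaled input, the scaled tensor is the element-wise product of the input with a single scale value (the first element of inputScale), and the final conversion to an 8-bit float type is omitted so that the scaled values stay in float. The names referenceAMaxWithScale and AMaxWithScaleResult are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result holder: one amax value plus the scaled m x n tensor.
struct AMaxWithScaleResult {
    float              amax;   // max |x| over the unscaled input (assumption)
    std::vector<float> scaled; // input * scale, kept in float (no FP8 conversion)
};

// Hypothetical CPU reference (not part of hipBLASLt). Assumes a single scale
// factor applied element-wise; the real kernel additionally converts the
// scaled values to the 8-bit float type selected by scaleDatatype.
AMaxWithScaleResult referenceAMaxWithScale(const std::vector<float>& input, float scale,
                                           uint32_t m, uint32_t n)
{
    assert(m > 0 && n > 0);
    assert(input.size() == static_cast<std::size_t>(m) * n);

    AMaxWithScaleResult r{0.0f, std::vector<float>(input.size())};
    for (std::size_t i = 0; i < input.size(); ++i) {
        r.amax      = std::max(r.amax, std::fabs(input[i]));
        r.scaled[i] = input[i] * scale;
    }
    return r;
}
```

When validating against the GPU result, the scaled values would first need to be rounded to the chosen 8-bit float format, which this sketch leaves out.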