hipBLASLtExt operation API reference
hipBLASLt provides the following extension operation APIs, which are independent of GEMM operations:
- hipblasltExtSoftmax: Softmax for a 2D tensor. It performs softmax on the second dimension of the input tensor and assumes the input is contiguous on that dimension. For sample code, see Softmax for a 2D tensor.
- hipblasltExtLayerNorm: Converts a 2D tensor using LayerNorm to generate a new normalized 2D tensor. This is an independent function that you call to get the result. For sample code, see Converting a 2D tensor using LayerNorm.
- hipblasltExtAMax: Determines the absolute maximum value of a 2D tensor. This is an independent function that you call to get the result. For sample code, see Absolute maximum value of a 2D Tensor.
These APIs are explained in detail below.
hipblasltExtSoftmax()
hipblasStatus_t hipblasltExtSoftmax(hipDataType datatype, uint32_t m, uint32_t n, uint32_t dim, void *output, void *input, hipStream_t stream)
Perform softmax on a given tensor.
This function computes softmax on a given 2D tensor along a specified dimension.
- Parameters:
datatype – [in] Data type of the input and output tensors. Only supports HIP_R_32F.
m – [in] The first dimension of the input and output tensors.
n – [in] The second dimension of the input and output tensors. Only supports values less than or equal to 256.
dim – [in] The dimension to perform softmax on. Currently 1 is the only valid value.
output – [out] Output tensor buffer.
input – [in] Input tensor buffer.
stream – [in] The HIP stream where all the GPU work will be submitted.
- Return values:
HIPBLAS_STATUS_SUCCESS – If it runs successfully.
HIPBLAS_STATUS_INVALID_VALUE – If n is greater than 256.
HIPBLAS_STATUS_NOT_SUPPORTED – If dim is not 1 or datatype is not HIP_R_32F.
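The following is a minimal sketch of a softmax call. The header path hipblaslt/hipblaslt-ext-op.h and the buffer names are assumptions for illustration, and error checking on the HIP calls is omitted for brevity.

```cpp
#include <hip/hip_runtime.h>
#include <hipblaslt/hipblaslt-ext-op.h>  // assumed header for the extension ops
#include <vector>

int main() {
    const uint32_t m = 4, n = 128;  // n must be <= 256 for this API
    std::vector<float> hostIn(m * n, 1.0f);

    float *dIn = nullptr, *dOut = nullptr;
    hipMalloc(&dIn, m * n * sizeof(float));
    hipMalloc(&dOut, m * n * sizeof(float));
    hipMemcpy(dIn, hostIn.data(), m * n * sizeof(float), hipMemcpyHostToDevice);

    hipStream_t stream;
    hipStreamCreate(&stream);

    // Softmax along the second dimension: dim must be 1 and datatype HIP_R_32F.
    hipblasStatus_t status =
        hipblasltExtSoftmax(HIP_R_32F, m, n, 1 /* dim */, dOut, dIn, stream);
    hipStreamSynchronize(stream);

    hipStreamDestroy(stream);
    hipFree(dIn);
    hipFree(dOut);
    return status == HIPBLAS_STATUS_SUCCESS ? 0 : 1;
}
```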
hipblasltExtLayerNorm()
hipblasStatus_t hipblasltExtLayerNorm(hipDataType datatype, void *output, void *mean, void *invvar, void *input, uint32_t m, uint32_t n, float eps, void *gamma, void *beta, hipStream_t stream)
Perform 2-D layernorm on a source input tensor, with the result placed in the output tensor.
This function computes LayerNorm on a given 2D tensor.
- Parameters:
datatype – [in] Data type of the input and output tensors. Only supports HIP_R_32F.
output – [out] Output tensor buffer. Can’t be a nullptr.
mean – [out] Tensor buffer. Can’t be a nullptr.
invvar – [out] Tensor buffer that receives the inverse standard deviation (1 / sqrt(variance + eps)). Can’t be a nullptr.
input – [in] Tensor buffer. Can’t be a nullptr.
m – [in] The first dimension of the input and output tensors.
n – [in] The second dimension of the input and output tensors.
eps – [in] Epsilon added to the variance before the square root to avoid infinite values.
gamma – [in] Tensor buffer. nullptr means the calculation doesn’t involve gamma.
beta – [in] Tensor buffer. nullptr means the calculation doesn’t involve beta.
stream – [in] The HIP stream where all the GPU work will be submitted.
- Return values:
HIPBLAS_STATUS_SUCCESS – If it runs successfully.
HIPBLAS_STATUS_INVALID_VALUE – If m is greater than 4096.
HIPBLAS_STATUS_NOT_SUPPORTED – If datatype is not HIP_R_32F.
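The sketch below illustrates one way to call hipblasltExtLayerNorm. The header path and the per-row sizing of mean and invvar (size m, assuming normalization runs along the second dimension) are assumptions; gamma and beta may instead be passed as nullptr to skip the scale and shift.

```cpp
#include <hip/hip_runtime.h>
#include <hipblaslt/hipblaslt-ext-op.h>  // assumed header for the extension ops
#include <vector>

int main() {
    const uint32_t m = 16, n = 1024;  // m must be <= 4096 for this API
    std::vector<float> hostIn(m * n, 1.0f);
    std::vector<float> hostGamma(n, 1.0f), hostBeta(n, 0.0f);

    float *dIn, *dOut, *dMean, *dInvvar, *dGamma, *dBeta;
    hipMalloc(&dIn, m * n * sizeof(float));
    hipMalloc(&dOut, m * n * sizeof(float));
    hipMalloc(&dMean, m * sizeof(float));    // assumed: one mean per row
    hipMalloc(&dInvvar, m * sizeof(float));  // assumed: one inverse std-dev per row
    hipMalloc(&dGamma, n * sizeof(float));
    hipMalloc(&dBeta, n * sizeof(float));
    hipMemcpy(dIn, hostIn.data(), m * n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dGamma, hostGamma.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(dBeta, hostBeta.data(), n * sizeof(float), hipMemcpyHostToDevice);

    hipStream_t stream;
    hipStreamCreate(&stream);

    // Normalize each row of the m x n tensor; eps guards the sqrt.
    hipblasStatus_t status = hipblasltExtLayerNorm(
        HIP_R_32F, dOut, dMean, dInvvar, dIn, m, n, 1e-5f, dGamma, dBeta, stream);
    hipStreamSynchronize(stream);

    hipStreamDestroy(stream);
    hipFree(dIn); hipFree(dOut); hipFree(dMean);
    hipFree(dInvvar); hipFree(dGamma); hipFree(dBeta);
    return status == HIPBLAS_STATUS_SUCCESS ? 0 : 1;
}
```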
hipblasltExtAMax()
hipblasStatus_t hipblasltExtAMax(const hipDataType datatype, const hipDataType outDatatype, void *output, void *input, uint32_t m, uint32_t n, hipStream_t stream)
Perform an absolute-maximum (amax) reduction on a given 2-D tensor, producing a single output value.
This function computes amax on a given 2D tensor.
- Parameters:
datatype – [in] Data type of the input tensor. Only supports HIP_R_32F and HIP_R_16F.
outDatatype – [in] Data type of the output tensor. Only supports HIP_R_32F and HIP_R_16F.
output – [out] Output buffer that receives the amax value. Can’t be a nullptr.
input – [in] 2-D tensor buffer. Can’t be a nullptr.
m – [in] The first dimension of the input and output tensors.
n – [in] The second dimension of the input and output tensors.
stream – [in] The HIP stream where all the GPU work will be submitted.
- Return values:
HIPBLAS_STATUS_SUCCESS – If it runs successfully.
HIPBLAS_STATUS_INVALID_VALUE – If m or n is 0, or input or output is a nullptr.
HIPBLAS_STATUS_NOT_SUPPORTED – If datatype is not HIP_R_32F or HIP_R_16F.
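A minimal sketch of an amax call follows, assuming the same extension header and that output holds a single scalar of outDatatype.

```cpp
#include <hip/hip_runtime.h>
#include <hipblaslt/hipblaslt-ext-op.h>  // assumed header for the extension ops
#include <cstdio>
#include <vector>

int main() {
    const uint32_t m = 32, n = 64;
    std::vector<float> hostIn(m * n, -2.5f);

    float *dIn = nullptr, *dAmax = nullptr;
    hipMalloc(&dIn, m * n * sizeof(float));
    hipMalloc(&dAmax, sizeof(float));  // assumed: a single scalar of outDatatype
    hipMemcpy(dIn, hostIn.data(), m * n * sizeof(float), hipMemcpyHostToDevice);

    hipStream_t stream;
    hipStreamCreate(&stream);

    // Reduce the whole m x n tensor to one absolute-maximum value.
    hipblasStatus_t status =
        hipblasltExtAMax(HIP_R_32F, HIP_R_32F, dAmax, dIn, m, n, stream);
    hipStreamSynchronize(stream);

    float amax = 0.0f;
    hipMemcpy(&amax, dAmax, sizeof(float), hipMemcpyDeviceToHost);
    printf("amax = %f\n", amax);  // expect 2.5 for this input

    hipStreamDestroy(stream);
    hipFree(dIn);
    hipFree(dAmax);
    return status == HIPBLAS_STATUS_SUCCESS ? 0 : 1;
}
```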