rocblas_gemm_ex Interface Reference
hipfort_rocblas::rocblas_gemm_ex Interface Reference
BLAS EX API.
Public Member Functions
integer(kind(rocblas_status_success)) function rocblas_gemm_ex_ (handle, transA, transB, m, n, k, alpha, a, a_type, lda, b, b_type, ldb, beta, c, c_type, ldc, d, d_type, ldd, compute_type, algo, solution_index, flags)
Detailed Description
BLAS EX API.
gemm_ex performs one of the matrix-matrix operations
D = alpha*op( A )*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X**T or op( X ) = X**H,
alpha and beta are scalars, and A, B, C, and D are matrices, where op( A ) is an m by k matrix, op( B ) is a k by n matrix, and C and D are m by n matrices.
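To make the operation concrete, here is a minimal, unoptimized C sketch of what gemm_ex computes in the all-float case. It assumes column-major storage and real data (so op( X ) = X**H reduces to X**T). The names op_t, OP_N, OP_T, op_elem, and ref_gemm are hypothetical and not part of rocBLAS; this is a reference for checking results, not how the library computes them.

```c
#include <stddef.h>

/* Hypothetical stand-ins for rocblas_operation_none / _transpose.
   For real data types, op( X ) = X**H is the same as X**T. */
typedef enum { OP_N, OP_T } op_t;

/* Element (i, j) of op( X ), where X is stored column-major with
   leading dimension ldx. */
static float op_elem(const float *x, int ldx, op_t t, int i, int j)
{
    return t == OP_N ? x[i + (size_t)j * ldx] : x[j + (size_t)i * ldx];
}

/* Naive reference for D = alpha*op( A )*op( B ) + beta*C, with op( A )
   m x k, op( B ) k x n, and C, D m x n (all column-major). */
void ref_gemm(op_t ta, op_t tb, int m, int n, int k,
              float alpha, const float *a, int lda,
              const float *b, int ldb,
              float beta, const float *c, int ldc,
              float *d, int ldd)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += op_elem(a, lda, ta, i, p) * op_elem(b, ldb, tb, p, j);
            d[i + (size_t)j * ldd] = alpha * acc + beta * c[i + (size_t)j * ldc];
        }
    }
}
```

For example, with A the 2 by 2 identity, C all ones, alpha = 2, and beta = 1, D works out to 2*op( B ) + 1 element-wise.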
Supported types are as follows:
- rocblas_datatype_f64_r = a_type = b_type = c_type = d_type = compute_type
- rocblas_datatype_f32_r = a_type = b_type = c_type = d_type = compute_type
- rocblas_datatype_f16_r = a_type = b_type = c_type = d_type = compute_type
- rocblas_datatype_f16_r = a_type = b_type = c_type = d_type; rocblas_datatype_f32_r = compute_type
- rocblas_datatype_f16_r = a_type = b_type; rocblas_datatype_f32_r = c_type = d_type = compute_type
- rocblas_datatype_bf16_r = a_type = b_type = c_type = d_type; rocblas_datatype_f32_r = compute_type
- rocblas_datatype_bf16_r = a_type = b_type; rocblas_datatype_f32_r = c_type = d_type = compute_type
- rocblas_datatype_i8_r = a_type = b_type; rocblas_datatype_i32_r = c_type = d_type = compute_type
- rocblas_datatype_f32_c = a_type = b_type = c_type = d_type = compute_type
- rocblas_datatype_f64_c = a_type = b_type = c_type = d_type = compute_type
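One way to use this table is as a lookup that rejects any other combination before calling gemm_ex. The C sketch below assumes a hypothetical local enum (dtype, F16_R, and so on) that mirrors the rocblas_datatype names above; a real program would use the rocblas_datatype values from the rocBLAS headers. It covers only the combinations listed on this page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local mirror of the rocblas_datatype names listed above. */
typedef enum { F16_R, F32_R, F64_R, BF16_R, I8_R, I32_R, F32_C, F64_C } dtype;

typedef struct { dtype a, b, c, d, compute; } type_combo;

/* The supported (a_type, b_type, c_type, d_type, compute_type) rows. */
static const type_combo supported[] = {
    {F64_R,  F64_R,  F64_R,  F64_R,  F64_R},
    {F32_R,  F32_R,  F32_R,  F32_R,  F32_R},
    {F16_R,  F16_R,  F16_R,  F16_R,  F16_R},
    {F16_R,  F16_R,  F16_R,  F16_R,  F32_R},
    {F16_R,  F16_R,  F32_R,  F32_R,  F32_R},
    {BF16_R, BF16_R, BF16_R, BF16_R, F32_R},
    {BF16_R, BF16_R, F32_R,  F32_R,  F32_R},
    {I8_R,   I8_R,   I32_R,  I32_R,  I32_R},
    {F32_C,  F32_C,  F32_C,  F32_C,  F32_C},
    {F64_C,  F64_C,  F64_C,  F64_C,  F64_C},
};

/* Returns true if the combination appears in the table above. */
bool gemm_ex_types_supported(dtype a, dtype b, dtype c, dtype d, dtype compute)
{
    for (size_t i = 0; i < sizeof supported / sizeof supported[0]; i++) {
        const type_combo *t = &supported[i];
        if (t->a == a && t->b == b && t->c == c && t->d == d &&
            t->compute == compute)
            return true;
    }
    return false;
}
```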
ROCm 4.2 supports two versions of the int8 case, with a = b = i8_r (input) and c = d = i32_r (output):
- Both versions use rocblas_datatype_i8_r = a_type = b_type; rocblas_datatype_i32_r = c_type = d_type = compute_type. They differ only in the last parameter, flags, which indicates whether the input is packed.
- If the last parameter 'flags' is not set (default = none), the unpacked version is used. It is supported on gfx908 or later GPUs only. Inputs a and b are not packed into int8x4, so the size restrictions and packing pseudo-code below are not necessary.
- If 'flags' |= rocblas_gemm_flags_pack_int8x4 is set, inputs a and b must be packed into int8x4, which imposes some size restrictions on A or B (see below). GPUs before gfx908 support only the packed-int8 version, so this flag and the packing are required there, while gfx908 GPUs support both versions.
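The choice of flags for the int8 case can be summarized in a small helper. This is a sketch: GEMM_FLAGS_NONE and GEMM_FLAGS_PACK_INT8X4 are hypothetical stand-ins for rocblas_gemm_flags_none and rocblas_gemm_flags_pack_int8x4, the values 0x0 and 0x1 are assumed here (take the real ones from the rocBLAS headers), and how the caller determines gfx908_or_later is outside its scope.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values mirroring rocblas_gemm_flags_none and
   rocblas_gemm_flags_pack_int8x4; use the rocBLAS headers in real code. */
#define GEMM_FLAGS_NONE        0x0u
#define GEMM_FLAGS_PACK_INT8X4 0x1u

/* Returns the flags to pass for an i8_r-in / i32_r-out gemm_ex call.
   flags:           any other flags the caller already wants to set.
   gfx908_or_later: whether the target GPU is gfx908 or newer.
   prefer_packed:   whether the caller wants the packed int8x4 version. */
uint32_t int8_gemm_flags(uint32_t flags, bool gfx908_or_later, bool prefer_packed)
{
    /* Before gfx908 only the packed version exists, so the flag is mandatory. */
    if (!gfx908_or_later || prefer_packed)
        return flags | GEMM_FLAGS_PACK_INT8X4;
    /* gfx908 or later: plain int8 inputs, no packing restrictions. */
    return flags & ~GEMM_FLAGS_PACK_INT8X4;
}
```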
Below are restrictions for rocblas_datatype_i8_r = a_type = b_type; rocblas_datatype_i32_r = c_type = d_type = compute_type; flags |= rocblas_gemm_flags_pack_int8x4:
- k must be a multiple of 4
- lda must be a multiple of 4 if transA == rocblas_operation_transpose
- ldb must be a multiple of 4 if transB == rocblas_operation_none
- if transA == rocblas_operation_none, matrix A must have each group of 4 consecutive values in the k dimension packed together, and the same applies to matrix B if transB == rocblas_operation_transpose. This packing can be achieved with the following pseudo-code, which assumes the original matrices are in A and B and writes the packed matrices to A_packed and B_packed. A_packed has the same size as A, and B_packed has the same size as B.
if(transA == rocblas_operation_none)
{
    int nb = 4;
    for(int i_m = 0; i_m < m; i_m++)
    {
        for(int i_k = 0; i_k < k; i_k++)
        {
            // Place each group of 4 consecutive k-values of row i_m side by side.
            a_packed[i_k % nb + (i_m + (i_k / nb) * lda) * nb] = a[i_m + i_k * lda];
        }
    }
}
else
{
    a_packed = a; // no packing needed
}
if(transB == rocblas_operation_transpose)
{
    int nb = 4;
    // B is stored n x k in this case, so the row loop runs over n.
    for(int i_n = 0; i_n < n; i_n++)
    {
        for(int i_k = 0; i_k < k; i_k++)
        {
            b_packed[i_k % nb + (i_n + (i_k / nb) * ldb) * nb] = b[i_n + i_k * ldb];
        }
    }
}
else
{
    b_packed = b; // no packing needed
}
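The restriction checks and the packing loop can be written as self-contained C. The row count is passed in, so the same function serves A (m rows, when transA == rocblas_operation_none) and B (n rows, stored n by k when transB == rocblas_operation_transpose). This is a sketch: op_t, int8x4_sizes_ok, and pack_k_int8x4 are hypothetical names, and the data are assumed to be column-major int8_t arrays whose destination buffer is the same size as the source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for rocblas_operation_none / _transpose. */
typedef enum { OP_N, OP_T } op_t;

/* Size restrictions for the packed int8x4 version, as listed above. */
bool int8x4_sizes_ok(op_t transA, op_t transB, int k, int lda, int ldb)
{
    if (k % 4 != 0)
        return false;   /* k must be a multiple of 4 */
    if (transA == OP_T && lda % 4 != 0)
        return false;   /* lda must be a multiple of 4 when A is transposed */
    if (transB == OP_N && ldb % 4 != 0)
        return false;   /* ldb must be a multiple of 4 when B is not transposed */
    return true;
}

/* Packs each group of 4 consecutive k-values of a column-major matrix
   with `rows` rows (m for A, n for a transposed B) and k columns.
   src and dst both hold ld * k elements; k must be a multiple of 4.
   Entries of dst outside the packed groups (padding when ld > rows)
   are left untouched. */
void pack_k_int8x4(int rows, int k, const int8_t *src, int ld, int8_t *dst)
{
    const int nb = 4;
    for (int i_r = 0; i_r < rows; i_r++)
        for (int i_k = 0; i_k < k; i_k++)
            dst[i_k % nb + (i_r + (i_k / nb) * ld) * nb] = src[i_r + i_k * ld];
}
```

For instance, with rows = 2, k = 4, ld = 2, and A(i, j) = 10*i + j, the packed buffer holds 0, 1, 2, 3, 10, 11, 12, 13: the four k-values of row 0 followed by those of row 1.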
Parameters
- [in] handle [rocblas_handle] handle to the rocblas library context queue.
- [in] transA [rocblas_operation] specifies the form of op( A ).
- [in] transB [rocblas_operation] specifies the form of op( B ).
- [in] m [rocblas_int] matrix dimension m.
- [in] n [rocblas_int] matrix dimension n.
- [in] k [rocblas_int] matrix dimension k.
- [in] alpha [void *] device pointer or host pointer specifying the scalar alpha. Same datatype as compute_type.
- [in] a [void *] device pointer storing matrix A.
- [in] a_type [rocblas_datatype] specifies the datatype of matrix A.
- [in] lda [rocblas_int] specifies the leading dimension of A.
- [in] b [void *] device pointer storing matrix B.
- [in] b_type [rocblas_datatype] specifies the datatype of matrix B.
- [in] ldb [rocblas_int] specifies the leading dimension of B.
- [in] beta [void *] device pointer or host pointer specifying the scalar beta. Same datatype as compute_type.
- [in] c [void *] device pointer storing matrix C.
- [in] c_type [rocblas_datatype] specifies the datatype of matrix C.
- [in] ldc [rocblas_int] specifies the leading dimension of C.
- [out] d [void *] device pointer storing matrix D.
- [in] d_type [rocblas_datatype] specifies the datatype of matrix D.
- [in] ldd [rocblas_int] specifies the leading dimension of D.
- [in] compute_type [rocblas_datatype] specifies the datatype of computation.
- [in] algo [rocblas_gemm_algo] enumerant specifying the algorithm type.
- [in] solution_index [int32_t] reserved for future use.
- [in] flags [uint32_t] optional gemm flags.
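The parameter list does not state the valid ranges for lda, ldb, ldc, and ldd. The sketch below assumes the usual column-major BLAS convention: A is stored m by k when transA is rocblas_operation_none and k by m otherwise, B is stored k by n or n by k in the same way, and C and D are stored m by n. The names op_t and gemm_ex_ld_ok are hypothetical, and rocBLAS's own argument checks may differ in detail.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for rocblas_operation_none / _transpose. */
typedef enum { OP_N, OP_T } op_t;

static int max1(int x) { return x > 1 ? x : 1; }

/* Checks leading dimensions against the number of stored rows of each
   matrix, assuming column-major storage. */
bool gemm_ex_ld_ok(op_t transA, op_t transB, int m, int n, int k,
                   int lda, int ldb, int ldc, int ldd)
{
    int min_lda = max1(transA == OP_N ? m : k); /* A is m x k or k x m */
    int min_ldb = max1(transB == OP_N ? k : n); /* B is k x n or n x k */
    return lda >= min_lda && ldb >= min_ldb &&
           ldc >= max1(m) && ldd >= max1(m);    /* C and D are m x n */
}
```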
Member Function/Subroutine Documentation
rocblas_gemm_ex_()
integer(kind(rocblas_status_success)) function hipfort_rocblas::rocblas_gemm_ex::rocblas_gemm_ex_ (
    type(c_ptr), value                                  handle,
    integer(kind(rocblas_operation_none)), value        transA,
    integer(kind(rocblas_operation_none)), value        transB,
    integer(c_int), value                               m,
    integer(c_int), value                               n,
    integer(c_int), value                               k,
    type(c_ptr), value                                  alpha,
    type(c_ptr), value                                  a,
    integer(kind(rocblas_datatype_f16_r)), value        a_type,
    integer(c_int), value                               lda,
    type(c_ptr), value                                  b,
    integer(kind(rocblas_datatype_f16_r)), value        b_type,
    integer(c_int), value                               ldb,
    type(c_ptr), value                                  beta,
    type(c_ptr), value                                  c,
    integer(kind(rocblas_datatype_f16_r)), value        c_type,
    integer(c_int), value                               ldc,
    type(c_ptr), value                                  d,
    integer(kind(rocblas_datatype_f16_r)), value        d_type,
    integer(c_int), value                               ldd,
    integer(kind(rocblas_datatype_f16_r)), value        compute_type,
    integer(kind(rocblas_gemm_algo_standard)), value    algo,
    integer(c_int32_t), value                           solution_index,
    integer(c_int), value                               flags
)
The documentation for this interface was generated from the following file: