rocBLAS beta features

rocBLAS beta features#

To allow for future growth and changes, the features in this section are not subject to the same level of backwards compatibility and support as the normal rocBLAS API. These features are subject to change and removal in future release of rocBLAS.

To use the following beta API features, define ROCBLAS_BETA_FEATURES_API before including rocblas.h.

rocblas_gemm_ex_get_solutions + batched, strided_batched#

rocblas_status rocblas_gemm_ex_get_solutions(rocblas_handle handle, rocblas_operation transA, rocblas_operation transB, rocblas_int m, rocblas_int n, rocblas_int k, const void *alpha, const void *a, rocblas_datatype a_type, rocblas_int lda, const void *b, rocblas_datatype b_type, rocblas_int ldb, const void *beta, const void *c, rocblas_datatype c_type, rocblas_int ldc, void *d, rocblas_datatype d_type, rocblas_int ldd, rocblas_datatype compute_type, rocblas_gemm_algo algo, uint32_t flags, rocblas_int *list_array, rocblas_int *list_size)#

BLAS BETA API

gemm_ex_get_solutions gets the indices for all the solutions that can solve a corresponding call to gemm_ex. Which solution is used by gemm_ex is controlled by the solution_index parameter.

All parameters correspond to gemm_ex except for list_array and list_size, which are used as input and output for getting the solution indices. If list_array is NULL, list_size is an output and will be filled with the number of solutions that can solve the GEMM. If list_array is not NULL, then it must be pointing to an array with at least list_size elements and will be filled with the solution indices that can solve the GEMM: the number of elements filled is min(list_size, # of solutions).

Parameters:

handle – [in] [rocblas_handle] handle to the rocblas library context queue.
transA – [in] [rocblas_operation] specifies the form of op( A ).
transB – [in] [rocblas_operation] specifies the form of op( B ).
m – [in] [rocblas_int] matrix dimension m.
n – [in] [rocblas_int] matrix dimension n.
k – [in] [rocblas_int] matrix dimension k.
alpha – [in] [const void *] device pointer or host pointer specifying the scalar alpha. Same datatype as compute_type.
a – [in] [void *] device pointer storing matrix A.
a_type – [in] [rocblas_datatype] specifies the datatype of matrix A.
lda – [in] [rocblas_int] specifies the leading dimension of A.
b – [in] [void *] device pointer storing matrix B.
b_type – [in] [rocblas_datatype] specifies the datatype of matrix B.
ldb – [in] [rocblas_int] specifies the leading dimension of B.
beta – [in] [const void *] device pointer or host pointer specifying the scalar beta. Same datatype as compute_type.
c – [in] [void *] device pointer storing matrix C.
c_type – [in] [rocblas_datatype] specifies the datatype of matrix C.
ldc – [in] [rocblas_int] specifies the leading dimension of C.
d – [out] [void *] device pointer storing matrix D. If d and c pointers are to the same matrix then d_type must equal c_type and ldd must equal ldc or the respective invalid status will be returned.
d_type – [in] [rocblas_datatype] specifies the datatype of matrix D.
ldd – [in] [rocblas_int] specifies the leading dimension of D.
compute_type – [in] [rocblas_datatype] specifies the datatype of computation.
algo – [in] [rocblas_gemm_algo] enumerant specifying the algorithm type.
flags – [in] [uint32_t] optional gemm flags.
list_array – [out] [rocblas_int *] output array for solution indices or NULL if getting number of solutions
list_size – [inout] [rocblas_int *] size of list_array if getting solution indices or output with number of solutions if list_array is NULL

rocblas_status rocblas_gemm_ex_get_solutions_by_type(rocblas_handle handle, rocblas_datatype input_type, rocblas_datatype output_type, rocblas_datatype compute_type, uint32_t flags, rocblas_int *list_array, rocblas_int *list_size)#

BLAS BETA API

rocblas_gemm_ex_get_solutions_by_type gets the indices for all the solutions that match the given types for gemm_ex. Which solution is used by gemm_ex is controlled by the solution_index parameter.

If list_array is NULL, list_size is an output and will be filled with the number of solutions that can solve the GEMM. If list_array is not NULL, then it must be pointing to an array with at least list_size elements and will be filled with the solution indices that can solve the GEMM: the number of elements filled is min(list_size, # of solutions).

Parameters:

handle – [in] [rocblas_handle] handle to the rocblas library context queue.
input_type – [in] [rocblas_datatype] specifies the datatype of matrix A.
output_type – [in] [rocblas_datatype] specifies the datatype of matrix D.
compute_type – [in] [rocblas_datatype] specifies the datatype of computation.
flags – [in] [uint32_t] optional gemm flags.
list_array – [out] [rocblas_int *] output array for solution indices or NULL if getting number of solutions
list_size – [inout] [rocblas_int *] size of list_array if getting solution indices or output with number of solutions if list_array is NULL

rocblas_status rocblas_gemm_batched_ex_get_solutions(rocblas_handle handle, rocblas_operation transA, rocblas_operation transB, rocblas_int m, rocblas_int n, rocblas_int k, const void *alpha, const void *a, rocblas_datatype a_type, rocblas_int lda, const void *b, rocblas_datatype b_type, rocblas_int ldb, const void *beta, const void *c, rocblas_datatype c_type, rocblas_int ldc, void *d, rocblas_datatype d_type, rocblas_int ldd, rocblas_int batch_count, rocblas_datatype compute_type, rocblas_gemm_algo algo, uint32_t flags, rocblas_int *list_array, rocblas_int *list_size)#

BLAS BETA API

rocblas_gemm_batched_ex_get_solutions gets the indices for all the solutions that can solve a corresponding call to gemm_batched_ex. Which solution is used by gemm_batched_ex is controlled by the solution_index parameter.

All parameters correspond to gemm_batched_ex except for list_array and list_size, which are used as input and output for getting the solution indices. If list_array is NULL, list_size is an output and will be filled with the number of solutions that can solve the GEMM. If list_array is not NULL, then it must be pointing to an array with at least list_size elements and will be filled with the solution indices that can solve the GEMM: the number of elements filled is min(list_size, # of solutions).

Parameters:

handle – [in] [rocblas_handle] handle to the rocblas library context queue.
transA – [in] [rocblas_operation] specifies the form of op( A ).
transB – [in] [rocblas_operation] specifies the form of op( B ).
m – [in] [rocblas_int] matrix dimension m.
n – [in] [rocblas_int] matrix dimension n.
k – [in] [rocblas_int] matrix dimension k.
alpha – [in] [const void *] device pointer or host pointer specifying the scalar alpha. Same datatype as compute_type.
a – [in] [void *] device pointer storing array of pointers to each matrix A_i.
a_type – [in] [rocblas_datatype] specifies the datatype of each matrix A_i.
lda – [in] [rocblas_int] specifies the leading dimension of each A_i.
b – [in] [void *] device pointer storing array of pointers to each matrix B_i.
b_type – [in] [rocblas_datatype] specifies the datatype of each matrix B_i.
ldb – [in] [rocblas_int] specifies the leading dimension of each B_i.
beta – [in] [const void *] device pointer or host pointer specifying the scalar beta. Same datatype as compute_type.
c – [in] [void *] device array of device pointers to each matrix C_i.
c_type – [in] [rocblas_datatype] specifies the datatype of each matrix C_i.
ldc – [in] [rocblas_int] specifies the leading dimension of each C_i.
d – [out] [void *] device array of device pointers to each matrix D_i. If d and c are the same array of matrix pointers then d_type must equal c_type and ldd must equal ldc or the respective invalid status will be returned.
d_type – [in] [rocblas_datatype] specifies the datatype of each matrix D_i.
ldd – [in] [rocblas_int] specifies the leading dimension of each D_i.
batch_count – [in] [rocblas_int] number of gemm operations in the batch.
compute_type – [in] [rocblas_datatype] specifies the datatype of computation.
algo – [in] [rocblas_gemm_algo] enumerant specifying the algorithm type.
flags – [in] [uint32_t] optional gemm flags.
list_array – [out] [rocblas_int *] output array for solution indices or NULL if getting number of solutions
list_size – [inout] [rocblas_int *] size of list_array if getting solution indices or output with number of solutions if list_array is NULL

rocblas_status rocblas_gemm_batched_ex_get_solutions_by_type(rocblas_handle handle, rocblas_datatype input_type, rocblas_datatype output_type, rocblas_datatype compute_type, uint32_t flags, rocblas_int *list_array, rocblas_int *list_size)#

BLAS BETA API

rocblas_gemm_batched_ex_get_solutions_by_type gets the indices for all the solutions that match the given types for gemm_batched_ex. Which solution is used by gemm_ex is controlled by the solution_index parameter.

If list_array is NULL, list_size is an output and will be filled with the number of solutions that can solve the GEMM. If list_array is not NULL, then it must be pointing to an array with at least list_size elements and will be filled with the solution indices that can solve the GEMM: the number of elements filled is min(list_size, # of solutions).

Parameters:

handle – [in] [rocblas_handle] handle to the rocblas library context queue.
input_type – [in] [rocblas_datatype] specifies the datatype of matrix A.
output_type – [in] [rocblas_datatype] specifies the datatype of matrix D.
compute_type – [in] [rocblas_datatype] specifies the datatype of computation.
flags – [in] [uint32_t] optional gemm flags.
list_array – [out] [rocblas_int *] output array for solution indices or NULL if getting number of solutions
list_size – [inout] [rocblas_int *] size of list_array if getting solution indices or output with number of solutions if list_array is NULL

rocblas_status rocblas_gemm_strided_batched_ex_get_solutions(rocblas_handle handle, rocblas_operation transA, rocblas_operation transB, rocblas_int m, rocblas_int n, rocblas_int k, const void *alpha, const void *a, rocblas_datatype a_type, rocblas_int lda, rocblas_stride stride_a, const void *b, rocblas_datatype b_type, rocblas_int ldb, rocblas_stride stride_b, const void *beta, const void *c, rocblas_datatype c_type, rocblas_int ldc, rocblas_stride stride_c, void *d, rocblas_datatype d_type, rocblas_int ldd, rocblas_stride stride_d, rocblas_int batch_count, rocblas_datatype compute_type, rocblas_gemm_algo algo, uint32_t flags, rocblas_int *list_array, rocblas_int *list_size)#

BLAS BETA API

gemm_strided_batched_ex_get_solutions gets the indices for all the solutions that can solve a corresponding call to gemm_strided_batched_ex. Which solution is used by gemm_strided_batched_ex is controlled by the solution_index parameter.

All parameters correspond to gemm_strided_batched_ex except for list_array and list_size, which are used as input and output for getting the solution indices. If list_array is NULL, list_size is an output and will be filled with the number of solutions that can solve the GEMM. If list_array is not NULL, then it must be pointing to an array with at least list_size elements and will be filled with the solution indices that can solve the GEMM: the number of elements filled is min(list_size, # of solutions).

Parameters:

handle – [in] [rocblas_handle] handle to the rocblas library context queue.
transA – [in] [rocblas_operation] specifies the form of op( A ).
transB – [in] [rocblas_operation] specifies the form of op( B ).
m – [in] [rocblas_int] matrix dimension m.
n – [in] [rocblas_int] matrix dimension n.
k – [in] [rocblas_int] matrix dimension k.
alpha – [in] [const void *] device pointer or host pointer specifying the scalar alpha. Same datatype as compute_type.
a – [in] [void *] device pointer pointing to first matrix A_1.
a_type – [in] [rocblas_datatype] specifies the datatype of each matrix A_i.
lda – [in] [rocblas_int] specifies the leading dimension of each A_i.
stride_a – [in] [rocblas_stride] specifies stride from start of one A_i matrix to the next A_(i + 1).
b – [in] [void *] device pointer pointing to first matrix B_1.
b_type – [in] [rocblas_datatype] specifies the datatype of each matrix B_i.
ldb – [in] [rocblas_int] specifies the leading dimension of each B_i.
stride_b – [in] [rocblas_stride] specifies stride from start of one B_i matrix to the next B_(i + 1).
beta – [in] [const void *] device pointer or host pointer specifying the scalar beta. Same datatype as compute_type.
c – [in] [void *] device pointer pointing to first matrix C_1.
c_type – [in] [rocblas_datatype] specifies the datatype of each matrix C_i.
ldc – [in] [rocblas_int] specifies the leading dimension of each C_i.
stride_c – [in] [rocblas_stride] specifies stride from start of one C_i matrix to the next C_(i + 1).
d – [out] [void *] device pointer storing each matrix D_i. If d and c pointers are to the same matrix then d_type must equal c_type and ldd must equal ldc and stride_d must equal stride_c or the respective invalid status will be returned.
d_type – [in] [rocblas_datatype] specifies the datatype of each matrix D_i.
ldd – [in] [rocblas_int] specifies the leading dimension of each D_i.
stride_d – [in] [rocblas_stride] specifies stride from start of one D_i matrix to the next D_(i + 1).
batch_count – [in] [rocblas_int] number of gemm operations in the batch.
compute_type – [in] [rocblas_datatype] specifies the datatype of computation.
algo – [in] [rocblas_gemm_algo] enumerant specifying the algorithm type.
flags – [in] [uint32_t] optional gemm flags.
list_array – [out] [rocblas_int *] output array for solution indices or NULL if getting number of solutions
list_size – [inout] [rocblas_int *] size of list_array if getting solution indices or output with number of solutions if list_array is NULL

Graph support for rocBLAS#

Most of the rocBLAS functions can be captured into a graph node via Graph Management HIP APIs, except those listed in Functions unsupported with Graph Capture. For a list of graph related HIP APIs, see Graph Management HIP API.

The following code creates a graph with rocblas_function() as graph node.

CHECK_HIP_ERROR((hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal));
rocblas_<function>(<arguments>);
CHECK_HIP_ERROR(hipStreamEndCapture(stream, &graph));

The captured graph can be launched as shown below:

CHECK_HIP_ERROR(hipGraphInstantiate(&instance, graph, NULL, NULL, 0));
CHECK_HIP_ERROR(hipGraphLaunch(instance, stream));

Graph support requires asynchronous HIP APIs, so users must enable stream-order memory allocation. For more details, see Stream-ordered memory allocation.

During stream capture, rocBLAS stores the allocated host and device memory in the handle. The allocated memory is freed when the handle is destroyed.

Functions unsupported with Graph Capture#

The following Level-1 functions place results into host buffers (in pointer mode host) which enforces synchronization.

dot
asum
nrm2
imax
imin

BLAS Level-3 and BLAS-EX functions in pointer mode device do not support HIP Graph. Support will be added in a future release.

HIP Graph known issues in rocBLAS#

On the Windows platform, batched functions (Level-1, Level-2, and Level-3) produce incorrect results.

rocBLAS beta features

Contents

rocBLAS beta features#

rocblas_gemm_ex_get_solutions + batched, strided_batched#

Graph support for rocBLAS#

Functions unsupported with Graph Capture#

HIP Graph known issues in rocBLAS#