rocBLAS helper functions
Auxiliary functions
-
rocblas_status rocblas_create_handle(rocblas_handle *handle)
Create handle.
-
rocblas_status rocblas_destroy_handle(rocblas_handle handle)
Destroy handle.
-
rocblas_status rocblas_set_stream(rocblas_handle handle, hipStream_t stream)
Set stream for handle.
-
rocblas_status rocblas_get_stream(rocblas_handle handle, hipStream_t *stream)
Get the stream from handle.
-
rocblas_status rocblas_set_pointer_mode(rocblas_handle handle, rocblas_pointer_mode pointer_mode)
Set rocblas_pointer_mode.
-
rocblas_status rocblas_get_pointer_mode(rocblas_handle handle, rocblas_pointer_mode *pointer_mode)
Get rocblas_pointer_mode.
-
rocblas_status rocblas_set_atomics_mode(rocblas_handle handle, rocblas_atomics_mode atomics_mode)
Set rocblas_atomics_mode.
Some rocBLAS functions have implementations that use atomic operations to increase performance. When atomic operations are used, results are not guaranteed to be identical between runs. Results are accurate with or without atomic operations. Atomic operations in rocBLAS are turned off by default. They can be turned on or off on a per-handle basis by calling rocblas_set_atomics_mode.
-
rocblas_status rocblas_get_atomics_mode(rocblas_handle handle, rocblas_atomics_mode *atomics_mode)
Get rocblas_atomics_mode.
-
rocblas_pointer_mode rocblas_pointer_to_mode(void *ptr)
Indicates whether the pointer is on the host or device.
-
void rocblas_initialize(void)
Initialize rocBLAS on the current HIP device to avoid costly startup time for the first call on that device.
Calling rocblas_initialize() allows upfront initialization, including device-specific kernel setup. Otherwise, this function is automatically called on the first function call that requires these initializations (mainly GEMM).
-
const char *rocblas_status_to_string(rocblas_status status)
Returns a string representing the rocblas_status value.
- Parameters:
status – [in] [rocblas_status] rocBLAS status to convert to string
-
rocblas_status rocblas_set_vector(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy)
Copy vector from host to device.
- Parameters:
n – [in] [rocblas_int] number of elements in the vector
elem_size – [in] [rocblas_int] number of bytes per element in the vector
x – [in] pointer to vector on the host
incx – [in] [rocblas_int] specifies the increment for the elements of the vector x
y – [out] pointer to vector on the device
incy – [in] [rocblas_int] specifies the increment for the elements of the vector y
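The incx and incy parameters can be read as element strides. As a rough illustration only, the following host-only C++ sketch models the copy pattern that the parameter list describes. model_strided_copy is a hypothetical helper, not a rocBLAS function; it performs no host-to-device transfer and assumes positive increments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-only model of a strided vector copy (hypothetical helper, not part of
// rocBLAS). Copies n elements of elem_size bytes each, reading every incx-th
// element of x and writing every incy-th element of y.
// Assumption: n >= 0, elem_size > 0, incx > 0, incy > 0.
inline void model_strided_copy(int n, int elem_size,
                               const void* x, int incx,
                               void* y, int incy)
{
    assert(n >= 0 && elem_size > 0 && incx > 0 && incy > 0);

    const unsigned char* src   = static_cast<const unsigned char*>(x);
    unsigned char*       dst   = static_cast<unsigned char*>(y);
    const std::size_t    bytes = static_cast<std::size_t>(elem_size);

    for(int i = 0; i < n; ++i)
    {
        // Element i lives at byte offset i * inc * elem_size in each buffer.
        const std::size_t src_off = static_cast<std::size_t>(i) * incx * bytes;
        const std::size_t dst_off = static_cast<std::size_t>(i) * incy * bytes;
        std::memcpy(dst + dst_off, src + src_off, bytes);
    }
}
```

With n = 3, incx = 2 and incy = 1, this model gathers elements 0, 2 and 4 of the source into three contiguous destination elements, so the source buffer must hold at least 1 + (n - 1) * incx elements.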
-
rocblas_status rocblas_get_vector(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy)
Copy vector from device to host.
- Parameters:
n – [in] [rocblas_int] number of elements in the vector
elem_size – [in] [rocblas_int] number of bytes per element in the vector
x – [in] pointer to vector on the device
incx – [in] [rocblas_int] specifies the increment for the elements of the vector x
y – [out] pointer to vector on the host
incy – [in] [rocblas_int] specifies the increment for the elements of the vector y
-
rocblas_status rocblas_set_vector_async(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy, hipStream_t stream)
Asynchronously copy vector from host to device.
rocblas_set_vector_async copies a vector from pinned host memory to device memory asynchronously. Memory on the host must be allocated with hipHostMalloc or the transfer will be synchronous.
- Parameters:
n – [in] [rocblas_int] number of elements in the vector
elem_size – [in] [rocblas_int] number of bytes per element in the vector
x – [in] pointer to vector on the host
incx – [in] [rocblas_int] specifies the increment for the elements of the vector x
y – [out] pointer to vector on the device
incy – [in] [rocblas_int] specifies the increment for the elements of the vector y
stream – [in] specifies the stream into which this transfer request is queued
-
rocblas_status rocblas_get_vector_async(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy, hipStream_t stream)
Asynchronously copy vector from device to host.
rocblas_get_vector_async copies a vector from device memory to pinned host memory asynchronously. Memory on the host must be allocated with hipHostMalloc or the transfer will be synchronous.
- Parameters:
n – [in] [rocblas_int] number of elements in the vector
elem_size – [in] [rocblas_int] number of bytes per element in the vector
x – [in] pointer to vector on the device
incx – [in] [rocblas_int] specifies the increment for the elements of the vector x
y – [out] pointer to vector on the host
incy – [in] [rocblas_int] specifies the increment for the elements of the vector y
stream – [in] specifies the stream into which this transfer request is queued
-
rocblas_status rocblas_set_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb)
Copy matrix from host to device.
- Parameters:
rows – [in] [rocblas_int] number of rows in matrices
cols – [in] [rocblas_int] number of columns in matrices
elem_size – [in] [rocblas_int] number of bytes per element in the matrix
a – [in] pointer to matrix on the host
lda – [in] [rocblas_int] specifies the leading dimension of A, lda >= rows
b – [out] pointer to matrix on the GPU
ldb – [in] [rocblas_int] specifies the leading dimension of B, ldb >= rows
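The leading dimensions lda and ldb allow a rows x cols block to be copied between buffers with different column pitches. The following host-only C++ sketch is a rough model of that addressing, assuming the column-major layout that the lda >= rows constraint implies. model_matrix_copy is a hypothetical helper, not a rocBLAS function, and it performs no host-to-device transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-only model of a column-major matrix copy with leading dimensions
// (hypothetical helper, not part of rocBLAS). Column j of the source starts
// at element j * lda, and column j of the destination at element j * ldb.
// Assumption: rows >= 0, cols >= 0, elem_size > 0, lda >= rows, ldb >= rows.
inline void model_matrix_copy(int rows, int cols, int elem_size,
                              const void* a, int lda,
                              void* b, int ldb)
{
    assert(rows >= 0 && cols >= 0 && elem_size > 0);
    assert(lda >= rows && ldb >= rows);

    const unsigned char* src       = static_cast<const unsigned char*>(a);
    unsigned char*       dst       = static_cast<unsigned char*>(b);
    const std::size_t    bytes     = static_cast<std::size_t>(elem_size);
    const std::size_t    col_bytes = static_cast<std::size_t>(rows) * bytes;

    for(int j = 0; j < cols; ++j)
    {
        // Only the first `rows` elements of each column are copied; any
        // padding between rows and the leading dimension is left untouched.
        const std::size_t src_off = static_cast<std::size_t>(j) * lda * bytes;
        const std::size_t dst_off = static_cast<std::size_t>(j) * ldb * bytes;
        std::memcpy(dst + dst_off, src + src_off, col_bytes);
    }
}
```

In this model, copying a 2 x 2 block out of a buffer with lda = 3 into a buffer with ldb = 2 skips the third element of each source column and packs the result tightly.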
-
rocblas_status rocblas_get_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb)
Copy matrix from device to host.
- Parameters:
rows – [in] [rocblas_int] number of rows in matrices
cols – [in] [rocblas_int] number of columns in matrices
elem_size – [in] [rocblas_int] number of bytes per element in the matrix
a – [in] pointer to matrix on the GPU
lda – [in] [rocblas_int] specifies the leading dimension of A, lda >= rows
b – [out] pointer to matrix on the host
ldb – [in] [rocblas_int] specifies the leading dimension of B, ldb >= rows
-
rocblas_status rocblas_set_matrix_async(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb, hipStream_t stream)
Asynchronously copy matrix from host to device.
rocblas_set_matrix_async copies a matrix from pinned host memory to device memory asynchronously. Memory on the host must be allocated with hipHostMalloc or the transfer will be synchronous.
- Parameters:
rows – [in] [rocblas_int] number of rows in matrices
cols – [in] [rocblas_int] number of columns in matrices
elem_size – [in] [rocblas_int] number of bytes per element in the matrix
a – [in] pointer to matrix on the host
lda – [in] [rocblas_int] specifies the leading dimension of A, lda >= rows
b – [out] pointer to matrix on the GPU
ldb – [in] [rocblas_int] specifies the leading dimension of B, ldb >= rows
stream – [in] specifies the stream into which this transfer request is queued
-
rocblas_status rocblas_get_matrix_async(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb, hipStream_t stream)
Asynchronously copy matrix from device to host.
rocblas_get_matrix_async copies a matrix from device memory to pinned host memory asynchronously. Memory on the host must be allocated with hipHostMalloc or the transfer will be synchronous.
- Parameters:
rows – [in] [rocblas_int] number of rows in matrices
cols – [in] [rocblas_int] number of columns in matrices
elem_size – [in] [rocblas_int] number of bytes per element in the matrix
a – [in] pointer to matrix on the GPU
lda – [in] [rocblas_int] specifies the leading dimension of A, lda >= rows
b – [out] pointer to matrix on the host
ldb – [in] [rocblas_int] specifies the leading dimension of B, ldb >= rows
stream – [in] specifies the stream into which this transfer request is queued
The set/get_vector and set/get_matrix functions, including their async forms, support the _64 interface. See the ILP64 interface section.
Device memory allocation functions
-
rocblas_status rocblas_start_device_memory_size_query(rocblas_handle handle)
Indicates that subsequent rocBLAS kernel calls should start collecting the optimal device memory size in bytes for their given kernel arguments and keep track of the maximum. Because each kernel call can reuse temporary device memory on the same stream, only the maximum is collected. Returns rocblas_status_size_query_mismatch if another size query is already in progress. Returns rocblas_status_success otherwise.
- Parameters:
handle – [in] rocblas handle
-
rocblas_status rocblas_stop_device_memory_size_query(rocblas_handle handle, size_t *size)
Stops collecting the optimal device memory size information. Returns rocblas_status_size_query_mismatch if a collection is not underway, rocblas_status_invalid_handle if handle is nullptr, and rocblas_status_invalid_pointer if size is nullptr. Returns rocblas_status_success otherwise.
- Parameters:
handle – [in] rocblas handle
size – [out] maximum of the optimal sizes collected
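The start/stop pair behaves like a small state machine that records a running maximum. The following host-only C++ sketch is a toy model of that protocol, written only to illustrate the documented return conditions. model_size_query, model_status and record are hypothetical names, not rocBLAS symbols, and the order in which the error conditions are checked is an assumption.

```cpp
#include <algorithm>
#include <cstddef>

// Toy status codes mirroring the conditions described in the text
// (hypothetical enum, not the rocblas_status type).
enum class model_status
{
    success,
    size_query_mismatch,
    invalid_pointer
};

// Toy model of a size query session (hypothetical, not part of rocBLAS).
struct model_size_query
{
    bool        in_progress = false;
    std::size_t max_size    = 0;

    // Starting while a query is already in progress is reported as a mismatch.
    model_status start()
    {
        if(in_progress)
            return model_status::size_query_mismatch;
        in_progress = true;
        max_size    = 0;
        return model_status::success;
    }

    // Stands in for a kernel call reporting its optimal workspace size.
    // Temporary memory can be reused between calls, so only the maximum
    // is kept rather than the sum.
    void record(std::size_t optimal_size)
    {
        if(in_progress)
            max_size = std::max(max_size, optimal_size);
    }

    // Stopping without a query underway is reported as a mismatch.
    // Assumption: the null-pointer check comes first.
    model_status stop(std::size_t* size)
    {
        if(size == nullptr)
            return model_status::invalid_pointer;
        if(!in_progress)
            return model_status::size_query_mismatch;
        *size       = max_size;
        in_progress = false;
        return model_status::success;
    }
};
```

In this model, three simulated calls that report 256, 1024 and 512 bytes between start and stop yield a collected size of 1024 bytes.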
-
rocblas_status rocblas_get_device_memory_size(rocblas_handle handle, size_t *size)
Gets the current device memory size for the handle. Returns rocblas_status_invalid_handle if handle is nullptr, rocblas_status_invalid_pointer if size is nullptr, and rocblas_status_success otherwise.
- Parameters:
handle – [in] rocblas handle
size – [out] current device memory size for the handle
-
rocblas_status rocblas_set_workspace(rocblas_handle handle, void *addr, size_t size)
Sets the device workspace for the handle to use.
Any previously allocated device memory managed by the handle is freed.
Returns rocblas_status_invalid_handle if handle is nullptr and rocblas_status_success otherwise.
- Parameters:
handle – [in] rocblas handle
addr – [in] address of workspace memory
size – [in] size of workspace memory
-
bool rocblas_is_managing_device_memory(rocblas_handle handle)
Returns true when the device memory in handle is managed by rocBLAS.
- Parameters:
handle – [in] rocblas handle
For more detailed information, see the Device memory allocation in rocBLAS and Device memory allocation sections.
Build information functions
-
rocblas_status rocblas_get_version_string_size(size_t *len)
Queries the minimum buffer size for a successful call to rocblas_get_version_string.
- Parameters:
len – [out] pointer to size_t for storing the length
-
rocblas_status rocblas_get_version_string(char *buf, size_t len)
Loads char* buf with the rocBLAS library version. size_t len is the maximum length of char* buf.
- Parameters:
buf – [inout] pointer to buffer for version string
len – [in] length of buf