rocBLAS Helper Functions

Auxiliary Functions

rocblas_status rocblas_create_handle(rocblas_handle *handle)

Create handle.

rocblas_status rocblas_destroy_handle(rocblas_handle handle)

Destroy handle.

rocblas_status rocblas_set_stream(rocblas_handle handle, hipStream_t stream)

Set stream for handle.

rocblas_status rocblas_get_stream(rocblas_handle handle, hipStream_t *stream)

Get the stream associated with the handle.

rocblas_status rocblas_set_pointer_mode(rocblas_handle handle, rocblas_pointer_mode pointer_mode)

Set rocblas_pointer_mode.

rocblas_status rocblas_get_pointer_mode(rocblas_handle handle, rocblas_pointer_mode *pointer_mode)

Get rocblas_pointer_mode.

rocblas_status rocblas_set_atomics_mode(rocblas_handle handle, rocblas_atomics_mode atomics_mode)

Set rocblas_atomics_mode.

Some rocBLAS functions may have implementations that use atomic operations to increase performance. When atomic operations are used, results are not guaranteed to be identical between runs. Results are accurate with or without atomic operations, but if bit-wise reproducible results are required, atomic operations should not be used.

Atomic operations can be turned on or off for a handle by calling rocblas_set_atomics_mode. By default, this is set to rocblas_atomics_allowed.
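One common reason atomics affect reproducibility is that atomic updates can complete in a different order on each run, and floating-point addition is not associative. The sketch below is a host-only illustration in plain C, not rocBLAS code. The helper sum_in_order is hypothetical, and the input values are deliberately extreme so that the effect is visible; in practice the differences are normally at the level of rounding error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of rocBLAS: sums the values of v in the
 * order given by `order` (a permutation of 0..n-1). It stands in for a
 * reduction whose accumulation order can change from run to run. */
double sum_in_order(const double *v, const size_t *order, size_t n)
{
    assert(v != NULL && order != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += v[order[i]];
    return acc;
}
```

With v = {1e20, -1e20, 1.0}, summing in the order 0, 1, 2 gives 1.0, while the order 0, 2, 1 gives 0.0, because 1.0 is lost when added to 1e20 in double precision. The same inputs therefore produce two different results depending only on accumulation order.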

rocblas_status rocblas_get_atomics_mode(rocblas_handle handle, rocblas_atomics_mode *atomics_mode)

Get rocblas_atomics_mode.

rocblas_pointer_mode rocblas_pointer_to_mode(void *ptr)

Indicates whether the pointer is on the host or device.

void rocblas_initialize(void)

Initialize rocBLAS on the current HIP device, to avoid costly startup time at the first call on that device.

Calling rocblas_initialize() allows upfront initialization including device specific kernel setup. Otherwise this function is automatically called on the first function call that requires these initializations (mainly GEMM).

const char *rocblas_status_to_string(rocblas_status status)

Returns a string representing the rocblas_status value.

Parameters:
  • status[in] [rocblas_status] rocBLAS status to convert to a string

rocblas_status rocblas_set_vector(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy)

Copy vector from host to device.

Parameters:
  • n[in] [rocblas_int] number of elements in the vector

  • elem_size[in] [rocblas_int] number of bytes per element in the vector

  • x[in] pointer to vector on the host

  • incx[in] [rocblas_int] specifies the increment for the elements of the vector

  • y[out] pointer to vector on the device

  • incy[in] [rocblas_int] specifies the increment for the elements of the vector
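The roles of n, elem_size, incx and incy can be sketched with a host-only model of the element layout. This is an illustration in plain C, not the rocBLAS implementation: strided_copy_model is a hypothetical helper, it copies between two host buffers rather than between host and device, and it assumes positive increments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model, not the rocBLAS implementation:
 * copies n elements of elem_size bytes each, reading every incx-th
 * element of x and writing every incy-th element of y. */
void strided_copy_model(size_t n, size_t elem_size,
                        const void *x, size_t incx,
                        void *y, size_t incy)
{
    assert(x != NULL && y != NULL);
    assert(elem_size > 0 && incx > 0 && incy > 0);
    const unsigned char *src = (const unsigned char *)x;
    unsigned char *dst = (unsigned char *)y;
    for (size_t i = 0; i < n; ++i)
        memcpy(dst + i * incy * elem_size,
               src + i * incx * elem_size,
               elem_size);
}
```

For example, with n = 3, elem_size = sizeof(float), incx = 2 and incy = 1, the source {1, 2, 3, 4, 5, 6} is packed into the destination as {1, 3, 5}.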

rocblas_status rocblas_get_vector(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy)

Copy vector from device to host.

Parameters:
  • n[in] [rocblas_int] number of elements in the vector

  • elem_size[in] [rocblas_int] number of bytes per element in the vector

  • x[in] pointer to vector on the device

  • incx[in] [rocblas_int] specifies the increment for the elements of the vector

  • y[out] pointer to vector on the host

  • incy[in] [rocblas_int] specifies the increment for the elements of the vector

rocblas_status rocblas_set_vector_async(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy, hipStream_t stream)

Asynchronously copy vector from host to device.

rocblas_set_vector_async copies a vector from pinned host memory to device memory asynchronously. Memory on the host must be allocated with hipHostMalloc or the transfer will be synchronous.

Parameters:
  • n[in] [rocblas_int] number of elements in the vector

  • elem_size[in] [rocblas_int] number of bytes per element in the vector

  • x[in] pointer to vector on the host

  • incx[in] [rocblas_int] specifies the increment for the elements of the vector

  • y[out] pointer to vector on the device

  • incy[in] [rocblas_int] specifies the increment for the elements of the vector

  • stream[in] specifies the stream into which this transfer request is queued

rocblas_status rocblas_get_vector_async(rocblas_int n, rocblas_int elem_size, const void *x, rocblas_int incx, void *y, rocblas_int incy, hipStream_t stream)

Asynchronously copy vector from device to host.

rocblas_get_vector_async copies a vector from device memory to pinned host memory asynchronously. Memory on the host must be allocated with hipHostMalloc or the transfer will be synchronous.

Parameters:
  • n[in] [rocblas_int] number of elements in the vector

  • elem_size[in] [rocblas_int] number of bytes per element in the vector

  • x[in] pointer to vector on the device

  • incx[in] [rocblas_int] specifies the increment for the elements of the vector

  • y[out] pointer to vector on the host

  • incy[in] [rocblas_int] specifies the increment for the elements of the vector

  • stream[in] specifies the stream into which this transfer request is queued

rocblas_status rocblas_set_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb)

Copy matrix from host to device.

Parameters:
  • rows[in] [rocblas_int] number of rows in matrices

  • cols[in] [rocblas_int] number of columns in matrices

  • elem_size[in] [rocblas_int] number of bytes per element in the matrix

  • a[in] pointer to matrix on the host

  • lda[in] [rocblas_int] specifies the leading dimension of A, lda >= rows

  • b[out] pointer to matrix on the GPU

  • ldb[in] [rocblas_int] specifies the leading dimension of B, ldb >= rows
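The leading dimensions can be illustrated with a host-only model of the copy. The sketch assumes column-major storage, which is consistent with the constraints lda >= rows and ldb >= rows above. matrix_copy_model is a hypothetical helper in plain C that copies between two host buffers; it is not the rocBLAS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model, not the rocBLAS implementation:
 * copies a rows x cols column-major matrix from a (leading dimension lda)
 * to b (leading dimension ldb). Column j of a starts at element j * lda,
 * and column j of b starts at element j * ldb. */
void matrix_copy_model(size_t rows, size_t cols, size_t elem_size,
                       const void *a, size_t lda,
                       void *b, size_t ldb)
{
    assert(a != NULL && b != NULL && elem_size > 0);
    assert(lda >= rows && ldb >= rows);
    const unsigned char *src = (const unsigned char *)a;
    unsigned char *dst = (unsigned char *)b;
    for (size_t j = 0; j < cols; ++j)
        memcpy(dst + j * ldb * elem_size,
               src + j * lda * elem_size,
               rows * elem_size);
}
```

A leading dimension larger than rows lets the call address a submatrix of a larger allocation: with rows = 3, cols = 2, lda = 4 and ldb = 3, the fourth entry of each source column is padding and is not copied.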

rocblas_status rocblas_get_matrix(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb)

Copy matrix from device to host.

Parameters:
  • rows[in] [rocblas_int] number of rows in matrices

  • cols[in] [rocblas_int] number of columns in matrices

  • elem_size[in] [rocblas_int] number of bytes per element in the matrix

  • a[in] pointer to matrix on the GPU

  • lda[in] [rocblas_int] specifies the leading dimension of A, lda >= rows

  • b[out] pointer to matrix on the host

  • ldb[in] [rocblas_int] specifies the leading dimension of B, ldb >= rows

rocblas_status rocblas_set_matrix_async(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb, hipStream_t stream)

Asynchronously copy matrix from host to device.

rocblas_set_matrix_async copies a matrix from pinned host memory to device memory asynchronously. Memory on the host must be allocated with hipHostMalloc or the transfer will be synchronous.

Parameters:
  • rows[in] [rocblas_int] number of rows in matrices

  • cols[in] [rocblas_int] number of columns in matrices

  • elem_size[in] [rocblas_int] number of bytes per element in the matrix

  • a[in] pointer to matrix on the host

  • lda[in] [rocblas_int] specifies the leading dimension of A, lda >= rows

  • b[out] pointer to matrix on the GPU

  • ldb[in] [rocblas_int] specifies the leading dimension of B, ldb >= rows

  • stream[in] specifies the stream into which this transfer request is queued

rocblas_status rocblas_get_matrix_async(rocblas_int rows, rocblas_int cols, rocblas_int elem_size, const void *a, rocblas_int lda, void *b, rocblas_int ldb, hipStream_t stream)

Asynchronously copy matrix from device to host.

rocblas_get_matrix_async copies a matrix from device memory to pinned host memory asynchronously. Memory on the host must be allocated with hipHostMalloc or the transfer will be synchronous.

Parameters:
  • rows[in] [rocblas_int] number of rows in matrices

  • cols[in] [rocblas_int] number of columns in matrices

  • elem_size[in] [rocblas_int] number of bytes per element in the matrix

  • a[in] pointer to matrix on the GPU

  • lda[in] [rocblas_int] specifies the leading dimension of A, lda >= rows

  • b[out] pointer to matrix on the host

  • ldb[in] [rocblas_int] specifies the leading dimension of B, ldb >= rows

  • stream[in] specifies the stream into which this transfer request is queued

The set/get_vector and set/get_matrix functions, including their async forms, support the _64 interface. Refer to the section ILP64 Interface.

Device Memory Allocation Functions

rocblas_status rocblas_start_device_memory_size_query(rocblas_handle handle)

Indicates that subsequent rocBLAS kernel calls should collect the optimal device memory size in bytes for their given kernel arguments and keep track of the maximum. Kernel calls on the same stream can reuse temporary device memory, so only the maximum needs to be tracked. Returns rocblas_status_size_query_mismatch if another size query is already in progress; returns rocblas_status_success otherwise.

Parameters:
  • handle[in] rocblas handle

rocblas_status rocblas_stop_device_memory_size_query(rocblas_handle handle, size_t *size)

Stops collecting optimal device memory size information. Returns rocblas_status_size_query_mismatch if a collection is not underway; rocblas_status_invalid_handle if handle is nullptr; rocblas_status_invalid_pointer if size is nullptr; rocblas_status_success otherwise.

Parameters:
  • handle[in] rocblas handle

  • size[out] maximum of the optimal sizes collected
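The start/stop protocol can be pictured with a toy model in plain C. Everything in it is hypothetical and chosen for illustration (the size_query struct, the query_* functions and the query_status values); it is not the rocBLAS implementation, and the order in which the error conditions are checked is an assumption. It shows only the bookkeeping: starting twice or stopping without starting is a mismatch, and the reported size is the maximum of the recorded requests rather than their sum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these names are not rocBLAS symbols. */
typedef enum { QUERY_OK, QUERY_MISMATCH, QUERY_INVALID_POINTER } query_status;

typedef struct {
    bool   in_progress; /* true between a successful start and stop */
    size_t max_size;    /* largest request seen during the query */
} size_query;

query_status query_start(size_query *q)
{
    assert(q != NULL);
    if (q->in_progress)
        return QUERY_MISMATCH; /* another query is already in progress */
    q->in_progress = true;
    q->max_size = 0;
    return QUERY_OK;
}

/* Stands in for a kernel call reporting how much workspace it would like. */
void query_record(size_query *q, size_t bytes)
{
    assert(q != NULL);
    if (q->in_progress && bytes > q->max_size)
        q->max_size = bytes;
}

query_status query_stop(size_query *q, size_t *size)
{
    assert(q != NULL);
    if (!q->in_progress)
        return QUERY_MISMATCH; /* no collection is underway */
    if (size == NULL)
        return QUERY_INVALID_POINTER;
    *size = q->max_size;
    q->in_progress = false;
    return QUERY_OK;
}
```

If three calls record 1024, 4096 and 512 bytes between start and stop, the model reports 4096: calls that reuse the same temporary memory one after another need a workspace only as large as the biggest single request.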

rocblas_status rocblas_get_device_memory_size(rocblas_handle handle, size_t *size)

Gets the current device memory size for the handle. Returns rocblas_status_invalid_handle if handle is nullptr; rocblas_status_invalid_pointer if size is nullptr; rocblas_status_success otherwise.

Parameters:
  • handle[in] rocblas handle

  • size[out] current device memory size for the handle

rocblas_status rocblas_set_device_memory_size(rocblas_handle handle, size_t size)

Changes the size of allocated device memory at runtime.

Any previously allocated device memory managed by the handle is freed.

If size > 0, sets the device memory size to the specified size (in bytes). If size == 0, frees the memory allocated so far and lets rocBLAS manage device memory in the future, expanding it when necessary. Returns rocblas_status_invalid_handle if handle is nullptr; rocblas_status_success otherwise.

Parameters:
  • handle[in] rocblas handle

  • size[in] size of allocated device memory

rocblas_status rocblas_set_workspace(rocblas_handle handle, void *addr, size_t size)

Sets the device workspace for the handle to use.

Any previously allocated device memory managed by the handle is freed.

Returns rocblas_status_invalid_handle if handle is nullptr; rocblas_status_success otherwise.

Parameters:
  • handle[in] rocblas handle

  • addr[in] address of workspace memory

  • size[in] size of workspace memory

bool rocblas_is_managing_device_memory(rocblas_handle handle)

Returns true when device memory in handle is managed by rocBLAS.

Parameters:
  • handle[in] rocblas handle

bool rocblas_is_user_managing_device_memory(rocblas_handle handle)

Returns true when device memory in handle is managed by the user.

Parameters:
  • handle[in] rocblas handle

For more detailed information, refer to the sections Device Memory Allocation in rocBLAS and Device Memory Allocation.

Build Information Functions

rocblas_status rocblas_get_version_string_size(size_t *len)

Queries the minimum buffer size for a successful call to rocblas_get_version_string.

Parameters:
  • len[out] pointer to size_t for storing the length

rocblas_status rocblas_get_version_string(char *buf, size_t len)

Loads buf with the rocBLAS library version string. len is the maximum length of buf.

Parameters:
  • buf[inout] pointer to buffer for version string

  • len[in] length of buf