rocblas_trsm_strided_batched_ex Interface Reference
BLAS EX API.
Public Member Functions
- integer(kind(rocblas_status_success)) function rocblas_trsm_strided_batched_ex_ (handle, side, uplo, transA, diag, m, n, alpha, A, lda, stride_A, B, ldb, stride_B, batch_count, invA, invA_size, stride_invA, compute_type)
Detailed Description
trsm_strided_batched_ex solves
op(A_i)*X_i = alpha*B_i or X_i*op(A_i) = alpha*B_i,
for i = 1, ..., batch_count, where alpha is a scalar, X_i and B_i are m by n matrices stored in strided batched form, A_i is a triangular matrix stored in strided batched form, and op(A_i) is one of
op(A_i) = A_i, op(A_i) = A_i^T, or op(A_i) = A_i^H.
Each solution matrix X_i overwrites B_i.
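To make the per-batch computation concrete, here is a minimal host-side sketch, assuming column-major storage, double precision, and only the side = left, uplo = lower, transA = none, diag = non_unit case. The function ref_trsm_llnn_strided_batched is hypothetical and is not part of rocBLAS or hipfort; it only illustrates how stride_A and stride_B locate each A_i and B_i, and how each B_i is overwritten with X_i.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host reference (not a rocBLAS/hipfort routine).
 * Case covered: side = left, uplo = lower, transA = none, diag = non_unit.
 * Column-major storage; each B_i is overwritten with X_i. */
void ref_trsm_llnn_strided_batched(int m, int n, double alpha,
                                   const double *A, int lda, long long stride_A,
                                   double *B, int ldb, long long stride_B,
                                   int batch_count)
{
    assert(m >= 0 && n >= 0 && batch_count >= 0);
    for (int b = 0; b < batch_count; ++b) {
        /* A_i and B_i start stride_A / stride_B elements after the previous matrix. */
        const double *Ai = A + (ptrdiff_t)b * stride_A;
        double *Bi = B + (ptrdiff_t)b * stride_B;
        for (int j = 0; j < n; ++j) {
            double *x = Bi + (ptrdiff_t)j * ldb; /* column j of B_i */
            /* Forward substitution on alpha * B_i(:, j).
             * Only the lower triangle (c <= r) of A_i is read. */
            for (int r = 0; r < m; ++r) {
                double s = alpha * x[r];          /* x[r] still holds B_i(r, j) */
                for (int c = 0; c < r; ++c)
                    s -= Ai[r + (ptrdiff_t)c * lda] * x[c]; /* x[c] already solved */
                x[r] = s / Ai[r + (ptrdiff_t)r * lda];
            }
        }
    }
}
```

Because only the lower triangle is read, the strictly upper part of each A_i may hold arbitrary values; for example, storing 99.0 at position (1, 2) of a 2 x 2 lower-triangular A_1 leaves the solution unchanged.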
This function lets the user reuse each invA_i matrix between runs. If invA == NULL (c_null_ptr when called from Fortran), rocblas_trsm_strided_batched_ex automatically computes each invA_i on every run.
Setting up invA: Each accepted invA_i matrix consists of the packed 128x128 inverses of the diagonal blocks of matrix A_i, followed by the inverse of any smaller diagonal block that remains. To set up invA_i, it is recommended that rocblas_trtri_batched be used with matrix A_i as the input. invA is a single contiguous piece of device memory holding every invA_i, with consecutive invA_i separated by stride_invA elements.
Device memory for 128 x k elements should be allocated for each invA_i ahead of time, where k is m when rocblas_side_left and n when rocblas_side_right. The actual number of elements in each invA_i should be passed as invA_size.
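The sizing rule above reduces to simple arithmetic. The following is a minimal sketch, assuming the invA_i are packed back to back so that stride_invA equals invA_size. The helper names invA_k, invA_elements, and invA_total_bytes are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: k is m for rocblas_side_left and n for rocblas_side_right. */
static int invA_k(int is_left_side, int m, int n) { return is_left_side ? m : n; }

/* Hypothetical helper: elements per invA_i, 128 rows (ld_invA = 128) by k columns. */
long long invA_elements(int k)
{
    assert(k >= 0);
    return 128LL * k;
}

/* Hypothetical helper: total bytes for batch_count invA_i packed back to back
 * (stride_invA == invA_size), for an element size such as sizeof(double). */
size_t invA_total_bytes(int k, int batch_count, size_t elem_size)
{
    assert(batch_count >= 0);
    return (size_t)invA_elements(k) * (size_t)batch_count * elem_size;
}
```

For k = 300 this gives invA_size = 38400 elements per matrix. Since invA_size is declared as integer(c_int), 128 x k must fit in a 32-bit signed integer.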
To begin, rocblas_trtri_batched must be called on the full 128x128-sized diagonal blocks of each matrix A_i. Below are the restricted parameters:
- n = 128
- ldinvA = 128
- stride_invA = 128x128
- batch_count = k / 128 (integer division, i.e. the number of full 128x128 blocks)
Then any remaining block may be added:
- n = k % 128
- invA = invA + stride_invA * previous_batch_count
- ldinvA = 128
- batch_count = 1
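The two-call setup above amounts to splitting k into full 128-column blocks plus a remainder. The sketch below computes those parameters for one invA_i. The struct trtri_plan and the function make_trtri_plan are hypothetical. The field rem_A_offset (where the leftover diagonal block starts inside a column-major A_i) is an added assumption, because the text above gives only the invA offset.

```c
#include <assert.h>

/* Hypothetical description of the two rocblas_trtri_batched calls for one invA_i. */
typedef struct {
    int full_n;                /* n for the first call: 128                          */
    int full_batch_count;      /* batch_count for the first call: k / 128            */
    long long full_stride;     /* stride_invA for the first call: 128 * 128          */
    int rem_n;                 /* n for the second call: k % 128 (0 means no call)   */
    long long rem_invA_offset; /* invA advance: stride_invA * previous_batch_count   */
    long long rem_A_offset;    /* ASSUMPTION: start of A_i's leftover diagonal block */
} trtri_plan;

trtri_plan make_trtri_plan(int k, int lda)
{
    assert(k >= 0 && lda >= (k > 1 ? k : 1));
    trtri_plan p;
    p.full_n = 128;
    p.full_batch_count = k / 128;
    p.full_stride = 128LL * 128LL;
    p.rem_n = k % 128;
    p.rem_invA_offset = p.full_stride * p.full_batch_count;
    /* In column-major storage, diagonal element (d, d) sits at d + d * lda = d * (lda + 1). */
    p.rem_A_offset = (long long)p.full_batch_count * 128LL * (lda + 1LL);
    return p;
}
```

For k = 300, the first call covers two full blocks, and the second covers a 44 x 44 block that starts 32768 elements into invA_i. Together they occupy exactly 128 x 300 elements.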
Parameters
- [in] handle [rocblas_handle] handle to the rocblas library context queue.
- [in] side [rocblas_side] rocblas_side_left: op(A_i)*X_i = alpha*B_i. rocblas_side_right: X_i*op(A_i) = alpha*B_i.
- [in] uplo [rocblas_fill] rocblas_fill_upper: each A_i is an upper triangular matrix. rocblas_fill_lower: each A_i is a lower triangular matrix.
- [in] transA [rocblas_operation] rocblas_operation_none: op(A_i) = A_i. rocblas_operation_transpose: op(A_i) = A_i^T. rocblas_operation_conjugate_transpose: op(A_i) = A_i^H.
- [in] diag [rocblas_diagonal] rocblas_diagonal_unit: each A_i is assumed to be unit triangular. rocblas_diagonal_non_unit: each A_i is not assumed to be unit triangular.
- [in] m [rocblas_int] number of rows of each B_i. m >= 0.
- [in] n [rocblas_int] number of columns of each B_i. n >= 0.
- [in] alpha [void *] device pointer or host pointer to the scalar alpha. When alpha is &zero, A is not referenced and B need not be set before entry.
- [in] A [void *] device pointer to the first matrix A_1. Each A_i has dimension (lda, k), where k is m when rocblas_side_left and n when rocblas_side_right. Only the upper or lower triangular part, as selected by uplo, is accessed.
- [in] lda [rocblas_int] leading dimension of each A_i. If side = rocblas_side_left, lda >= max(1, m); if side = rocblas_side_right, lda >= max(1, n).
- [in] stride_A [rocblas_stride] stride, in elements, from the start of one A_i to the start of the next.
- [in,out] B [void *] device pointer to the first matrix B_1. Each B_i has dimension (ldb, n). On entry, the leading m by n part of each B_i must contain the right-hand side matrix B_i; on exit, it is overwritten by the solution matrix X_i.
- [in] ldb [rocblas_int] leading dimension of each B_i. ldb >= max(1, m).
- [in] stride_B [rocblas_stride] stride, in elements, from the start of one B_i to the start of the next.
- [in] batch_count [rocblas_int] number of matrices in the batch.
- [in] invA [void *] device pointer to the inverse diagonal blocks of each A_i; invA points to the first invA_1. Each invA_i has dimension (ld_invA, k), where k is m when rocblas_side_left and n when rocblas_side_right. ld_invA must equal 128.
- [in] invA_size [rocblas_int] number of elements of device memory in each invA_i.
- [in] stride_invA [rocblas_stride] stride, in elements, from the start of one invA_i to the start of the next.
- [in] compute_type [rocblas_datatype] datatype used for the computation.
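The size constraints in the parameter list can be collected into a single host-side check. This is a minimal sketch. The function trsm_sizes_ok is hypothetical: it returns 1 when m, n, lda, and ldb satisfy the documented bounds and 0 otherwise. It does not reproduce the specific rocblas_status values that the library returns.

```c
/* Hypothetical helper: max(1, x). */
static int max1(int x) { return x > 1 ? x : 1; }

/* Hypothetical helper: checks the size constraints stated in the parameter list. */
int trsm_sizes_ok(int is_left_side, int m, int n, int lda, int ldb)
{
    if (m < 0 || n < 0)
        return 0;
    int k = is_left_side ? m : n; /* order of each triangular A_i */
    if (lda < max1(k))            /* lda >= max(1, m) or max(1, n) */
        return 0;
    if (ldb < max1(m))            /* ldb >= max(1, m) */
        return 0;
    return 1;
}
```

For example, with side = rocblas_side_right, m = 4, and n = 3, lda = 3 is accepted because A_i is 3 x 3. The same lda is rejected for rocblas_side_left, where A_i is 4 x 4.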
Member Function/Subroutine Documentation
rocblas_trsm_strided_batched_ex_()
integer(kind(rocblas_status_success)) function hipfort_rocblas::rocblas_trsm_strided_batched_ex::rocblas_trsm_strided_batched_ex_(handle, side, uplo, transA, diag, m, n, alpha, A, lda, stride_A, B, ldb, stride_B, batch_count, invA, invA_size, stride_invA, compute_type)
    type(c_ptr), value :: handle
    integer(kind(rocblas_side_left)), value :: side
    integer(kind(rocblas_fill_upper)), value :: uplo
    integer(kind(rocblas_operation_none)), value :: transA
    integer(kind(rocblas_diagonal_non_unit)), value :: diag
    integer(c_int), value :: m
    integer(c_int), value :: n
    type(c_ptr), value :: alpha
    type(c_ptr), value :: A
    integer(c_int), value :: lda
    integer(c_int64_t), value :: stride_A
    type(c_ptr), value :: B
    integer(c_int), value :: ldb
    integer(c_int64_t), value :: stride_B
    integer(c_int), value :: batch_count
    type(c_ptr), value :: invA
    integer(c_int), value :: invA_size
    integer(c_int64_t), value :: stride_invA
    integer(kind(rocblas_datatype_f16_r)), value :: compute_type