rocblas_trsm_ex Interface Reference
BLAS EX API.
Public Member Functions
integer(kind(rocblas_status_success)) function | rocblas_trsm_ex_ (handle, side, uplo, transA, diag, m, n, alpha, A, lda, B, ldb, invA, invA_size, compute_type) |
Detailed Description
BLAS EX API.
trsm_ex solves
op(A)*X = alpha*B or X*op(A) = alpha*B,
where alpha is a scalar, X and B are m by n matrices, A is a triangular matrix, and op(A) is one of
op( A ) = A or op( A ) = A^T or op( A ) = A^H.
On exit, the solution matrix X overwrites B.
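To make the semantics concrete, the following is a minimal host-side sketch of one case only: side = left, uplo = lower, transA = none, diag = non_unit, with real double-precision data in column-major storage. The function name `trsm_left_lower_ref` is hypothetical and is not part of rocBLAS or hipfort; it uses plain forward substitution, which is not necessarily how the library computes the result.

```c
#include <assert.h>
#include <stddef.h>

/* Reference solve of A*X = alpha*B (hypothetical helper, not a rocBLAS call).
 * A is m by m lower triangular, B is m by n, both column-major.
 * B is overwritten with the solution X, mirroring the documented behavior. */
void trsm_left_lower_ref(int m, int n, double alpha,
                         const double *A, int lda,
                         double *B, int ldb)
{
    assert(m >= 0 && n >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldb >= (m > 1 ? m : 1));

    for (int j = 0; j < n; ++j) {
        double *b = B + (size_t)j * ldb;          /* column j of B */
        for (int i = 0; i < m; ++i) {
            double s = alpha * b[i];
            /* Subtract the already-solved entries; only the lower
             * triangle of A is read. b[p] holds X(p, j) for p < i. */
            for (int p = 0; p < i; ++p)
                s -= A[i + (size_t)p * lda] * b[p];
            b[i] = s / A[i + (size_t)i * lda];
        }
    }
}
```

For example, with A = [2 0; 1 4], a single right-hand side B = (2, 9) and alpha = 1, the array B holds (1, 2) after the call.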
This function gives the user the ability to reuse the invA matrix between runs. If invA == NULL, rocblas_trsm_ex will automatically calculate invA on every run.
Setting up invA: The accepted invA matrix consists of the packed 128x128 inverses of the diagonal blocks of matrix A, followed by any smaller diagonal block that remains. To set up invA it is recommended that rocblas_trtri_batched be used with matrix A as the input.
Device memory of size 128 x k elements should be allocated for invA ahead of time, where k is m when side is rocblas_side_left and n when side is rocblas_side_right. The actual number of elements in invA should be passed as invA_size.
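The sizing rule can be sketched as plain host arithmetic. The helper names `inva_num_elements` and `inva_num_bytes` and the `side_is_left` flag are assumptions made for illustration; they are not rocBLAS or hipfort symbols, and the element size depends on the data type in use.

```c
#include <assert.h>
#include <stddef.h>

#define INVA_BLOCK 128   /* block edge, and the required ld_invA */

/* Number of elements to allocate for invA and to pass as invA_size.
 * side_is_left != 0 stands in for rocblas_side_left (hypothetical flag). */
int inva_num_elements(int side_is_left, int m, int n)
{
    assert(m >= 0 && n >= 0);
    int k = side_is_left ? m : n;
    return INVA_BLOCK * k;
}

/* Byte count for the device allocation, given the element size in bytes. */
size_t inva_num_bytes(int side_is_left, int m, int n, size_t elem_size)
{
    return (size_t)inva_num_elements(side_is_left, m, n) * elem_size;
}
```

With m = 300 and n = 50, the left-side case needs 128 * 300 = 38400 elements and the right-side case 128 * 50 = 6400.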
To begin, rocblas_trtri_batched must be called on the full 128x128 sized diagonal blocks of matrix A. Below are the restricted parameters:
- n = 128
- ldinvA = 128
- stride_invA = 128x128
- batch_count = k / 128 (integer division)
Then any remaining block may be added:
- n = k % 128
- invA = invA + stride_invA * previous_batch_count
- ldinvA = 128
- batch_count = 1
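The two parameter lists above can be derived from k alone. The sketch below computes them on the host; the struct `inva_plan` and the function `plan_inva` are hypothetical names introduced here, and the code does not call rocblas_trtri_batched itself.

```c
#include <assert.h>
#include <stddef.h>

#define INVA_BLOCK 128

/* Parameters for the two rocblas_trtri_batched calls described above. */
struct inva_plan {
    int    full_blocks;      /* batch_count of the first call: k / 128 */
    size_t stride_invA;      /* 128 * 128 elements between block inverses */
    int    remainder_n;      /* n of the second call: k % 128 (0: skip it) */
    size_t remainder_offset; /* element offset of the trailing block in invA */
};

struct inva_plan plan_inva(int k)
{
    assert(k >= 0);
    struct inva_plan p;
    p.full_blocks      = k / INVA_BLOCK;
    p.stride_invA      = (size_t)INVA_BLOCK * INVA_BLOCK;
    p.remainder_n      = k % INVA_BLOCK;
    p.remainder_offset = p.stride_invA * (size_t)p.full_blocks;
    return p;
}
```

For k = 300 this gives two full blocks, a trailing block of order 44, and an offset of 32768 elements. The trailing block then occupies 128 * 44 = 5632 elements, so the total is 38400 = 128 * k, matching the allocation size stated earlier.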
Parameters
- [in] handle: [rocblas_handle] handle to the rocblas library context queue.
- [in] side: [rocblas_side] rocblas_side_left: op(A)*X = alpha*B. rocblas_side_right: X*op(A) = alpha*B.
- [in] uplo: [rocblas_fill] rocblas_fill_upper: A is an upper triangular matrix. rocblas_fill_lower: A is a lower triangular matrix.
- [in] transA: [rocblas_operation] rocblas_operation_none: op(A) = A. rocblas_operation_transpose: op(A) = A^T. rocblas_operation_conjugate_transpose: op(A) = A^H.
- [in] diag: [rocblas_diagonal] rocblas_diagonal_unit: A is assumed to be unit triangular. rocblas_diagonal_non_unit: A is not assumed to be unit triangular.
- [in] m: [rocblas_int] m specifies the number of rows of B. m >= 0.
- [in] n: [rocblas_int] n specifies the number of columns of B. n >= 0.
- [in] alpha: [void *] device pointer or host pointer specifying the scalar alpha. When alpha points to zero, A is not referenced and B need not be set before entry.
- [in] A: [void *] device pointer storing matrix A, of dimension ( lda, k ), where k is m when side is rocblas_side_left and n when side is rocblas_side_right. Only the upper/lower triangular part is accessed.
- [in] lda: [rocblas_int] lda specifies the first dimension of A. If side = rocblas_side_left, lda >= max( 1, m ); if side = rocblas_side_right, lda >= max( 1, n ).
- [in,out] B: [void *] device pointer storing matrix B. B is of dimension ( ldb, n ). Before entry, the leading m by n part of the array B must contain the right-hand side matrix B; on exit it is overwritten by the solution matrix X.
- [in] ldb: [rocblas_int] ldb specifies the first dimension of B. ldb >= max( 1, m ).
- [in] invA: [void *] device pointer storing the inverse diagonal blocks of A. invA is of dimension ( ld_invA, k ), where k is m when side is rocblas_side_left and n when side is rocblas_side_right. ld_invA must be equal to 128.
- [in] invA_size: [rocblas_int] invA_size specifies the number of elements of device memory in invA.
- [in] compute_type: [rocblas_datatype] specifies the datatype of computation.
Member Function/Subroutine Documentation
rocblas_trsm_ex_()
integer(kind(rocblas_status_success)) function hipfort_rocblas::rocblas_trsm_ex::rocblas_trsm_ex_ (
    type(c_ptr), value  handle,
    integer(kind(rocblas_side_left)), value  side,
    integer(kind(rocblas_fill_upper)), value  uplo,
    integer(kind(rocblas_operation_none)), value  transA,
    integer(kind(rocblas_diagonal_non_unit)), value  diag,
    integer(c_int), value  m,
    integer(c_int), value  n,
    type(c_ptr), value  alpha,
    type(c_ptr), value  A,
    integer(c_int), value  lda,
    type(c_ptr), value  B,
    integer(c_int), value  ldb,
    type(c_ptr), value  invA,
    integer(c_int), value  invA_size,
    integer(kind(rocblas_datatype_f16_r)), value  compute_type
)
The documentation for this interface was generated from the following file: