rocblas_trsm_ex Interface Reference

HIPFORT API Reference: hipfort_rocblas::rocblas_trsm_ex Interface Reference

BLAS EX API.

Public Member Functions

integer(kind(rocblas_status_success)) function rocblas_trsm_ex_ (handle, side, uplo, transA, diag, m, n, alpha, A, lda, B, ldb, invA, invA_size, compute_type)
 

Detailed Description

BLAS EX API.

trsm_ex solves

op(A)*X = alpha*B or X*op(A) = alpha*B,

where alpha is a scalar, X and B are m by n matrices, A is a triangular matrix, and op(A) is one of

op( A ) = A   or   op( A ) = A^T   or   op( A ) = A^H.

The solution matrix X overwrites B.
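
As a reading aid, one case of the equations above (left side, lower triangular, op(A) = A) can be written as plain forward substitution. The sketch below is a host-side Python reference and not the rocBLAS implementation; the function name trsm_left_lower_notrans, the unit_diag flag, and the use of flat column-major lists are assumptions made for illustration.

```python
def trsm_left_lower_notrans(m, n, alpha, A, lda, B, ldb, unit_diag=False):
    """Solve A * X = alpha * B in place for lower triangular A (m by m).

    A and B are flat column-major lists: element (i, j) is stored at
    index i + j * ld, where ld is lda for A and ldb for B.
    """
    for j in range(n):              # each column of B is an independent system
        for i in range(m):          # forward substitution, first row first
            acc = alpha * B[i + j * ldb]
            for p in range(i):      # subtract the already solved X[p, j]
                acc -= A[i + p * lda] * B[p + j * ldb]
            if not unit_diag:       # a unit diagonal is assumed to be all ones
                acc /= A[i + i * lda]
            B[i + j * ldb] = acc    # the solution X overwrites B
    return B
```

For A = [[2, 0], [1, 4]] stored as [2, 1, 0, 4] and a single right-hand side B = [2, 9] with alpha = 1, the call leaves [1.0, 2.0] in B. The upper triangular, transposed, and right-side cases follow the same pattern with the loop order or index roles changed.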

This function gives the user the ability to reuse the invA matrix between runs. If invA == NULL, rocblas_trsm_ex will automatically calculate invA on every run.

Setting up invA: The accepted invA matrix consists of the packed 128x128 inverses of the diagonal blocks of matrix A, followed by any smaller diagonal block that remains. To set up invA it is recommended that rocblas_trtri_batched be used with matrix A as the input.

Device memory for 128 x k elements should be allocated for invA ahead of time, where k is m when side is rocblas_side_left and n when side is rocblas_side_right. The actual number of elements in invA should be passed as invA_size.

To begin, rocblas_trtri_batched must be called on the full 128x128 sized diagonal blocks of matrix A. Below are the restricted parameters:

  • n = 128
  • ldinvA = 128
  • stride_invA = 128x128
  • batch_count = k / 128 (integer division, i.e. the number of full blocks)

Then any remaining block may be added:

  • n = k % 128
  • invA = invA + stride_invA * previous_batch_count
  • ldinvA = 128
  • batch_count = 1
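
The arithmetic behind the two steps above can be checked on the host before any device call is made. The following Python sketch only computes the element count for invA and the parameter values listed above; it does not call rocBLAS. The function name trtri_plan and the dictionary key invA_offset (an offset in elements from the start of invA) are hypothetical names chosen for this example.

```python
BLOCK = 128  # diagonal block edge required by the text above


def trtri_plan(k, block=BLOCK):
    """Return (invA element count, list of trtri_batched parameter sets).

    k is the order of the triangular matrix A (m for the left side,
    n for the right side).
    """
    stride = block * block      # stride_invA = 128 x 128
    full = k // block           # number of full diagonal blocks
    rest = k % block            # order of the remaining block, may be 0
    calls = []
    if full > 0:
        calls.append({"n": block, "ldinvA": block, "stride_invA": stride,
                      "batch_count": full, "invA_offset": 0})
    if rest > 0:
        # The remaining block starts right after the full blocks.
        calls.append({"n": rest, "ldinvA": block,
                      "batch_count": 1, "invA_offset": stride * full})
    return block * k, calls
```

For k = 300 this yields two full blocks, a remaining block of order 44 at element offset 32768, and 38400 elements in total, which is the value that would be passed as invA_size under the assumption that the whole 128 x k buffer is used.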

Parameters

[in] handle [rocblas_handle] handle to the rocblas library context queue.
[in] side [rocblas_side] rocblas_side_left: op(A)*X = alpha*B. rocblas_side_right: X*op(A) = alpha*B.
[in] uplo [rocblas_fill] rocblas_fill_upper: A is an upper triangular matrix. rocblas_fill_lower: A is a lower triangular matrix.
[in] transA [rocblas_operation] rocblas_operation_none: op(A) = A. rocblas_operation_transpose: op(A) = A^T. rocblas_operation_conjugate_transpose: op(A) = A^H.
[in] diag [rocblas_diagonal] rocblas_diagonal_unit: A is assumed to be unit triangular. rocblas_diagonal_non_unit: A is not assumed to be unit triangular.
[in] m [rocblas_int] m specifies the number of rows of B. m >= 0.
[in] n [rocblas_int] n specifies the number of columns of B. n >= 0.
[in] alpha [void *] device pointer or host pointer specifying the scalar alpha. When alpha points to zero, A is not referenced and B need not be set before entry.
[in] A [void *] device pointer storing matrix A, of dimension ( lda, k ), where k is m when side is rocblas_side_left and n when side is rocblas_side_right. Only the upper/lower triangular part is accessed.
[in] lda [rocblas_int] lda specifies the first dimension of A. If side = rocblas_side_left, lda >= max( 1, m ); if side = rocblas_side_right, lda >= max( 1, n ).
[in,out] B [void *] device pointer storing matrix B. B is of dimension ( ldb, n ). Before entry, the leading m by n part of the array B must contain the right-hand side matrix B; on exit it is overwritten by the solution matrix X.
[in] ldb [rocblas_int] ldb specifies the first dimension of B. ldb >= max( 1, m ).
[in] invA [void *] device pointer storing the inverse diagonal blocks of A. invA is of dimension ( ld_invA, k ), where k is m when side is rocblas_side_left and n when side is rocblas_side_right. ld_invA must be equal to 128.
[in] invA_size [rocblas_int] invA_size specifies the number of elements of device memory in invA.
[in] compute_type [rocblas_datatype] specifies the datatype of computation.
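
The size constraints stated for m, n, lda, and ldb can be mirrored in a small host-side check before the call. This Python sketch is only an illustration of the documented inequalities; the function name check_trsm_ex_dims and its return format are assumptions, and the library's own argument validation is not reproduced here.

```python
def check_trsm_ex_dims(side_left, m, n, lda, ldb):
    """Return the documented constraints that are violated (empty if none)."""
    k = m if side_left else n   # order of the triangular matrix A
    problems = []
    if m < 0:
        problems.append("m >= 0")
    if n < 0:
        problems.append("n >= 0")
    if lda < max(1, k):
        problems.append("lda >= max(1, k)")
    if ldb < max(1, m):
        problems.append("ldb >= max(1, m)")
    return problems
```

With side = rocblas_side_right, m = 4, n = 3, lda = 2, and ldb = 4, the check reports that lda is too small, because k equals n = 3 in that case.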

Member Function/Subroutine Documentation

◆ rocblas_trsm_ex_()

integer(kind(rocblas_status_success)) function hipfort_rocblas::rocblas_trsm_ex::rocblas_trsm_ex_ (
    type(c_ptr), value  handle,
    integer(kind(rocblas_side_left)), value  side,
    integer(kind(rocblas_fill_upper)), value  uplo,
    integer(kind(rocblas_operation_none)), value  transA,
    integer(kind(rocblas_diagonal_non_unit)), value  diag,
    integer(c_int), value  m,
    integer(c_int), value  n,
    type(c_ptr), value  alpha,
    type(c_ptr), value  A,
    integer(c_int), value  lda,
    type(c_ptr), value  B,
    integer(c_int), value  ldb,
    type(c_ptr), value  invA,
    integer(c_int), value  invA_size,
    integer(kind(rocblas_datatype_f16_r)), value  compute_type
)
