rocblas_trsm_strided_batched_ex Interface Reference

HIPFORT API Reference: hipfort_rocblas::rocblas_trsm_strided_batched_ex Interface Reference

BLAS EX API.

Public Member Functions

integer(kind(rocblas_status_success)) function rocblas_trsm_strided_batched_ex_ (handle, side, uplo, transA, diag, m, n, alpha, A, lda, stride_A, B, ldb, stride_B, batch_count, invA, invA_size, stride_invA, compute_type)
 

Detailed Description

BLAS EX API.

trsm_strided_batched_ex solves

op(A_i)*X_i = alpha*B_i or X_i*op(A_i) = alpha*B_i,

for i = 1, ..., batch_count, where alpha is a scalar, X and B are strided batched m by n matrices, A is a strided batched triangular matrix, and op(A_i) is one of

op( A_i ) = A_i   or   op( A_i ) = A_i^T   or   op( A_i ) = A_i^H.

Each solution matrix X_i overwrites the corresponding B_i.
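
As a host-side illustration of the operation described above, the following pure-Python sketch solves A_i*X_i = alpha*B_i for a single configuration: side = left, uplo = lower, transA = none, diag = non-unit. It assumes column-major storage addressed through lda, ldb, stride_A and stride_B. The function name trsm_left_lower_ref and the flat-list layout are hypothetical, chosen only for this example; the code does not call rocBLAS and is not how the library performs the solve.

```python
def trsm_left_lower_ref(m, n, alpha, A, lda, stride_A, B, ldb, stride_B, batch_count):
    """Reference solve of A_i * X_i = alpha * B_i, overwriting B_i with X_i.

    A and B are flat Python lists in column-major order. Batch index i is
    0-based here (the text above counts from 1), so matrix i starts at
    offset i * stride_A in A and i * stride_B in B. Only the lower triangle
    of each A_i is read.
    """
    for i in range(batch_count):
        a0 = i * stride_A
        b0 = i * stride_B
        for col in range(n):
            for row in range(m):
                # Forward substitution: rows above `row` already hold X values.
                acc = alpha * B[b0 + row + col * ldb]
                for p in range(row):
                    acc -= A[a0 + row + p * lda] * B[b0 + p + col * ldb]
                B[b0 + row + col * ldb] = acc / A[a0 + row + row * lda]
    return B


A = [2.0, 1.0, 0.0, 4.0,   # first matrix  [[2, 0], [1, 4]], column-major
     1.0, 3.0, 0.0, 2.0]   # second matrix [[1, 0], [3, 2]], column-major
B = [2.0, 9.0, 5.0, 19.0]  # two right-hand sides, each 2 by 1
X = trsm_left_lower_ref(2, 1, 1.0, A, 2, 4, B, 2, 2, 2)
print(X)  # [1.0, 2.0, 5.0, 2.0]
```

The result is written into the same list that held B, mirroring the in-place behaviour stated above.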

This function lets the user reuse each invA_i matrix between runs. If invA == NULL, rocblas_trsm_strided_batched_ex automatically calculates each invA_i on every run.

Setting up invA: Each accepted invA_i matrix consists of the packed 128x128 inverses of the diagonal blocks of matrix A_i, followed by any smaller diagonal block that remains. To set up invA_i it is recommended that rocblas_trtri_batched be used with matrix A_i as the input. invA is a contiguous piece of memory holding each invA_i.

Device memory of size 128 x k should be allocated for each invA_i ahead of time, where k is m when rocblas_side_left and is n when rocblas_side_right. The actual number of elements in each invA_i should be passed as invA_size.
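
The sizing rule above amounts to a single multiplication. The sketch below computes the element count for one invA_i, assuming the 128 x k rule exactly as stated; the helper names inva_elements and inva_bytes, the string values "left" and "right", and the element-size argument are hypothetical and not part of rocBLAS or hipfort.

```python
BLOCK = 128  # diagonal block size stated in the text above


def inva_elements(side, m, n):
    """Elements to allocate for one invA_i: 128 x k, with k = m (left) or n (right)."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    k = m if side == "left" else n
    return BLOCK * k


def inva_bytes(side, m, n, element_size):
    """Bytes for one invA_i, given the size in bytes of one matrix element."""
    return inva_elements(side, m, n) * element_size


print(inva_elements("left", 300, 50))   # 38400
print(inva_elements("right", 300, 50))  # 6400
print(inva_bytes("left", 300, 50, 8))   # 307200, e.g. for 8-byte elements
```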

To begin, rocblas_trtri_batched must be called on the full 128x128 sized diagonal blocks of each matrix A_i. Below are the restricted parameters:

  • n = 128
  • ldinvA = 128
  • stride_invA = 128x128
  • batch_count = k / 128 (integer division)

Then any remaining block may be added:

  • n = k % 128
  • invA = invA + stride_invA * previous_batch_count
  • ldinvA = 128
  • batch_count = 1
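
The two-step setup above can be sketched as a small planning helper that lists the restricted parameters for each trtri call for a given k. This is host-side arithmetic only. The function name trtri_plan, the dictionary keys, and the invA_offset field (the element offset added to invA for the second call) are hypothetical names for illustration, and the sketch does not call rocBLAS.

```python
BLOCK = 128  # diagonal block size used throughout this page


def trtri_plan(k):
    """Return the parameter sets for the trtri calls that fill one invA_i.

    k is m for rocblas_side_left and n for rocblas_side_right.
    """
    full_blocks = k // BLOCK   # number of complete 128x128 diagonal blocks
    remainder = k % BLOCK      # size of the trailing smaller block, if any
    stride_invA = BLOCK * BLOCK
    calls = []
    if full_blocks > 0:
        calls.append({"n": BLOCK, "ldinvA": BLOCK, "stride_invA": stride_invA,
                      "batch_count": full_blocks, "invA_offset": 0})
    if remainder > 0:
        # invA = invA + stride_invA * previous_batch_count
        calls.append({"n": remainder, "ldinvA": BLOCK, "batch_count": 1,
                      "invA_offset": stride_invA * full_blocks})
    return calls


for call in trtri_plan(300):
    print(call)
```

For k = 300 the plan contains one call covering two full blocks and a second call for the remaining 44 x 44 block at element offset 32768.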

Parameters

[in] handle [rocblas_handle] handle to the rocblas library context queue.
[in] side [rocblas_side] rocblas_side_left: op(A)*X = alpha*B. rocblas_side_right: X*op(A) = alpha*B.
[in] uplo [rocblas_fill] rocblas_fill_upper: each A_i is an upper triangular matrix. rocblas_fill_lower: each A_i is a lower triangular matrix.
[in] transA [rocblas_operation] rocblas_operation_none: op(A) = A. rocblas_operation_transpose: op(A) = A^T. rocblas_operation_conjugate_transpose: op(A) = A^H.
[in] diag [rocblas_diagonal] rocblas_diagonal_unit: each A_i is assumed to be unit triangular. rocblas_diagonal_non_unit: each A_i is not assumed to be unit triangular.
[in] m [rocblas_int] m specifies the number of rows of each B_i. m >= 0.
[in] n [rocblas_int] n specifies the number of columns of each B_i. n >= 0.
[in] alpha [void *] device pointer or host pointer specifying the scalar alpha. When alpha is zero, A is not referenced and B need not be set before entry.
[in] A [void *] device pointer storing matrix A, of dimension ( lda, k ), where k is m when side is rocblas_side_left and n when side is rocblas_side_right. Only the upper/lower triangular part is accessed.
[in] lda [rocblas_int] lda specifies the first dimension of A. If side = rocblas_side_left, lda >= max( 1, m ); if side = rocblas_side_right, lda >= max( 1, n ).
[in] stride_A [rocblas_stride] the stride between consecutive A_i matrices.
[in,out] B [void *] device pointer to the first matrix B_1. Each B_i is of dimension ( ldb, n ). Before entry, the leading m by n part of each array B_i must contain the right-hand side matrix B_i; on exit it is overwritten by the solution matrix X_i.
[in] ldb [rocblas_int] ldb specifies the first dimension of each B_i. ldb >= max( 1, m ).
[in] stride_B [rocblas_stride] the stride between consecutive B_i matrices.
[in] batch_count [rocblas_int] specifies the number of matrices in the batch.
[in] invA [void *] device pointer storing the inverse diagonal blocks of each A_i. invA points to the first matrix, invA_1. Each invA_i is of dimension ( ld_invA, k ), where k is m when side is rocblas_side_left and n when side is rocblas_side_right. ld_invA must be equal to 128.
[in] invA_size [rocblas_int] invA_size specifies the number of elements of device memory in each invA_i.
[in] stride_invA [rocblas_stride] the stride between consecutive invA_i matrices.
[in] compute_type [rocblas_datatype] specifies the datatype of computation.
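
The dimension constraints listed in the parameter descriptions can be checked on the host before a call is made. The following sketch encodes only the inequalities stated on this page (m >= 0, n >= 0, and the lower bounds on lda and ldb); the function name check_trsm_dims, the string values "left" and "right", and the message texts are hypothetical, and the library's own argument validation may differ.

```python
def check_trsm_dims(side, m, n, lda, ldb):
    """Return a list of violated constraints; an empty list means all hold."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    problems = []
    if m < 0:
        problems.append("m must be >= 0")
    if n < 0:
        problems.append("n must be >= 0")
    k = m if side == "left" else n  # order of each triangular matrix A_i
    if lda < max(1, k):
        problems.append("lda must be >= max(1, k)")
    if ldb < max(1, m):
        problems.append("ldb must be >= max(1, m)")
    return problems


print(check_trsm_dims("left", 4, 3, 4, 4))   # []
print(check_trsm_dims("right", 4, 3, 2, 4))  # ['lda must be >= max(1, k)']
```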

Member Function/Subroutine Documentation

◆ rocblas_trsm_strided_batched_ex_()

integer(kind(rocblas_status_success)) function hipfort_rocblas::rocblas_trsm_strided_batched_ex::rocblas_trsm_strided_batched_ex_ ( type(c_ptr), value  handle,
integer(kind(rocblas_side_left)), value  side,
integer(kind(rocblas_fill_upper)), value  uplo,
integer(kind(rocblas_operation_none)), value  transA,
integer(kind(rocblas_diagonal_non_unit)), value  diag,
integer(c_int), value  m,
integer(c_int), value  n,
type(c_ptr), value  alpha,
type(c_ptr), value  A,
integer(c_int), value  lda,
integer(c_int64_t), value  stride_A,
type(c_ptr), value  B,
integer(c_int), value  ldb,
integer(c_int64_t), value  stride_B,
integer(c_int), value  batch_count,
type(c_ptr), value  invA,
integer(c_int), value  invA_size,
integer(c_int64_t), value  stride_invA,
integer(kind(rocblas_datatype_f16_r)), value  compute_type 
)

The documentation for this interface was generated from the following file: