rocblas_trsm_strided_batched_ex Interface Reference

HIPFORT API Reference: hipfort_rocblas::rocblas_trsm_strided_batched_ex Interface Reference

BLAS EX API.

Public Member Functions
integer(kind(rocblas_status_success)) function	rocblas_trsm_strided_batched_ex_ (handle, side, uplo, transA, diag, m, n, alpha, A, lda, stride_A, B, ldb, stride_B, batch_count, invA, invA_size, stride_invA, compute_type)

Detailed Description

BLAS EX API.

trsm_strided_batched_ex solves

op(A_i)*X_i = alpha*B_i or X_i*op(A_i) = alpha*B_i,

for i = 1, ..., batch_count; and where alpha is a scalar, X and B are strided batched m by n matrices, A is a strided batched triangular matrix and op(A_i) is one of

op( A_i ) = A_i   or   op( A_i ) = A_i^T   or   op( A_i ) = A_i^H.

Each solution matrix X_i overwrites the corresponding B_i.
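To make the strided batched layout concrete, here is a minimal CPU reference sketch in plain Python. It is not the rocBLAS implementation and covers only one case (side left, lower triangular, no transpose). The function name and the `unit_diag` flag are hypothetical. The sketch assumes column-major storage and uses 0-based batch indices, whereas the text above counts i from 1.

```python
def trsm_left_lower_strided_batched(m, n, alpha, A, lda, stride_A,
                                    B, ldb, stride_B, batch_count,
                                    unit_diag=False):
    """Solve A_i * X_i = alpha * B_i in place, for lower-triangular A_i.

    A and B are flat lists in column-major order (an assumption of this
    sketch): element (r, c) of matrix i is at index i*stride + c*ld + r.
    """
    for i in range(batch_count):
        a0 = i * stride_A              # start of A_i
        b0 = i * stride_B              # start of B_i
        for c in range(n):             # each right-hand-side column
            for r in range(m):         # forward substitution, top to bottom
                acc = alpha * B[b0 + c * ldb + r]
                for k in range(r):     # subtract already-solved entries
                    acc -= A[a0 + k * lda + r] * B[b0 + c * ldb + k]
                if not unit_diag:
                    acc /= A[a0 + r * lda + r]
                B[b0 + c * ldb + r] = acc   # X_i overwrites B_i
    return B


# Two 2x2 lower-triangular systems, one right-hand-side column each.
A = [2.0, 1.0, 0.0, 4.0,    # A_0 = [[2, 0], [1, 4]]
     1.0, 3.0, 0.0, 1.0]    # A_1 = [[1, 0], [3, 1]]
B = [2.0, 9.0,              # B_0
     5.0, 17.0]             # B_1
trsm_left_lower_strided_batched(2, 1, 1.0, A, 2, 4, B, 2, 2, 2)
print(B)  # [1.0, 2.0, 5.0, 2.0]
```

The only difference from a single solve is the base offset `i * stride` added for each batch member, which is what stride_A and stride_B express.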

This function lets the user reuse each invA_i matrix between runs. If invA == NULL, rocblas_trsm_strided_batched_ex automatically calculates each invA_i on every run.

Setting up invA: Each accepted invA_i matrix consists of the packed 128x128 inverses of the diagonal blocks of matrix A_i, followed by any smaller diagonal block that remains. To set up invA_i it is recommended that rocblas_trtri_batched be used with matrix A_i as the input. invA is a contiguous piece of memory holding each invA_i.

Device memory of size 128 x k should be allocated for each invA_i ahead of time, where k is m when rocblas_side_left and is n when rocblas_side_right. The actual number of elements in each invA_i should be passed as invA_size.

To begin, rocblas_trtri_batched must be called on the full 128x128 sized diagonal blocks of each matrix A_i. Below are the restricted parameters:

n = 128
ldinvA = 128
stride_invA = 128x128
batch_count = k / 128

Then any remaining block may be added:

n = k % 128
invA = invA + stride_invA * previous_batch_count
ldinvA = 128
batch_count = 1
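The block arithmetic above can be sketched in plain Python. This makes no rocBLAS calls. The helper name `inva_layout` and its dictionary keys are hypothetical, and k / 128 is assumed to mean integer division.

```python
BLOCK = 128  # diagonal block size stated in the text


def inva_layout(m, n, side_left):
    """Sizes and offsets for one invA_i, following the two-pass scheme."""
    k = m if side_left else n          # order of the triangular matrix A_i
    full_blocks = k // BLOCK           # batch_count of the first trtri pass
    remainder = k % BLOCK              # n of the second pass (0 if none)
    block_stride = BLOCK * BLOCK       # stride_invA between packed blocks
    return {
        "k": k,
        "alloc_elements": BLOCK * k,   # elements to allocate per invA_i
        "full_blocks": full_blocks,
        "remainder_n": remainder,
        # element offset of the remainder block inside invA_i
        "remainder_offset": block_stride * full_blocks,
    }


print(inva_layout(300, 7, side_left=True))
```

For k = 300 this gives two full blocks, a remainder block of order 44, and a remainder offset of 32768 elements. The remainder block, stored with leading dimension 128, occupies 128 * 44 = 5632 elements, so the total matches the 128 * 300 = 38400 elements allocated.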

Parameters

[in]	handle	[rocblas_handle] handle to the rocblas library context queue.
[in]	side	[rocblas_side] rocblas_side_left: op(A)*X = alpha*B. rocblas_side_right: X*op(A) = alpha*B.
[in]	uplo	[rocblas_fill] rocblas_fill_upper: each A_i is an upper triangular matrix. rocblas_fill_lower: each A_i is a lower triangular matrix.
[in]	transA	[rocblas_operation] rocblas_operation_none: op(A) = A. rocblas_operation_transpose: op(A) = A^T. rocblas_operation_conjugate_transpose: op(A) = A^H.
[in]	diag	[rocblas_diagonal] rocblas_diagonal_unit: each A_i is assumed to be unit triangular. rocblas_diagonal_non_unit: each A_i is not assumed to be unit triangular.
[in]	m	[rocblas_int] m specifies the number of rows of each B_i. m >= 0.
[in]	n	[rocblas_int] n specifies the number of columns of each B_i. n >= 0.
[in]	alpha	[void *] device pointer or host pointer specifying the scalar alpha. When alpha is zero, A is not referenced and B need not be set before entry.
[in]	A	[void *] device pointer storing matrix A, of dimension ( lda, k ), where k is m when rocblas_side_left and n when rocblas_side_right. Only the upper/lower triangular part is accessed.
[in]	lda	[rocblas_int] lda specifies the first dimension of A. If side = rocblas_side_left, lda >= max( 1, m ); if side = rocblas_side_right, lda >= max( 1, n ).
[in]	stride_A	[rocblas_stride] The stride between each A matrix.
[in,out]	B	[void *] device pointer pointing to the first matrix B_i. Each B_i is of dimension ( ldb, n ). Before entry, the leading m by n part of each array B_i must contain the right-hand side matrix B_i; on exit it is overwritten by the solution matrix X_i.
[in]	ldb	[rocblas_int] ldb specifies the first dimension of each B_i. ldb >= max( 1, m ).
[in]	stride_B	[rocblas_stride] The stride between each B_i matrix.
[in]	batch_count	[rocblas_int] specifies the number of matrices in the batch.
[in]	invA	[void *] device pointer storing the inverse diagonal blocks of each A_i. invA points to the first invA_1. each invA_i is of dimension ( ld_invA, k ), where k is m when rocblas_side_left and is n when rocblas_side_right. ld_invA must be equal to 128.
[in]	invA_size	[rocblas_int] invA_size specifies the number of elements of device memory in each invA_i.
[in]	stride_invA	[rocblas_stride] The stride between each invA matrix.
[in]	compute_type	[rocblas_datatype] specifies the datatype of computation
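The dimension constraints listed for m, n, lda and ldb can be collected into a small pre-flight check. This is a plain-Python sketch with a hypothetical helper name. It is not the library's own argument validation and covers only the constraints stated in the parameter list above.

```python
def check_trsm_dims(side_left, m, n, lda, ldb):
    """Return the documented constraints that are violated (empty if none)."""
    problems = []
    if m < 0:
        problems.append("m >= 0")
    if n < 0:
        problems.append("n >= 0")
    # A_i is k by k, with k = m for side left and k = n for side right.
    k_name, k = ("m", m) if side_left else ("n", n)
    if lda < max(1, k):
        problems.append("lda >= max(1, %s)" % k_name)
    if ldb < max(1, m):
        problems.append("ldb >= max(1, m)")
    return problems


print(check_trsm_dims(True, 4, 3, 3, 4))  # ['lda >= max(1, m)']
```

Running a check like this on the host before the call can make a bad leading dimension easier to spot than decoding the returned status value.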

Member Function/Subroutine Documentation