rocBLAS Level-2 functions

Contents

rocBLAS Level-2 functions#

rocBLAS Level-2 functions perform matrix-vector operations. [Level2]

Level-2 functions support the ILP64 API. For more information on these _64 functions, refer to section ILP64 Interface.

gfx12 Known Issues in rocBLAS#

  • On gfx12 batched and strided_batched functions with batch_count greater than 65536 require using the ILP64 API if returning rocblas_status_invalid_size.

rocblas_Xgbmv + batched, strided_batched#

rocblas_status rocblas_sgbmv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const float *alpha, const float *A, rocblas_int lda, const float *x, rocblas_int incx, const float *beta, float *y, rocblas_int incy)#
rocblas_status rocblas_dgbmv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const double *alpha, const double *A, rocblas_int lda, const double *x, rocblas_int incx, const double *beta, double *y, rocblas_int incy)#
rocblas_status rocblas_cgbmv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const rocblas_float_complex *alpha, const rocblas_float_complex *A, rocblas_int lda, const rocblas_float_complex *x, rocblas_int incx, const rocblas_float_complex *beta, rocblas_float_complex *y, rocblas_int incy)#
rocblas_status rocblas_zgbmv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const rocblas_double_complex *alpha, const rocblas_double_complex *A, rocblas_int lda, const rocblas_double_complex *x, rocblas_int incx, const rocblas_double_complex *beta, rocblas_double_complex *y, rocblas_int incy)#

BLAS Level 2 API

gbmv performs one of the matrix-vector operations:

y := alpha*A*x    + beta*y,   or
y := alpha*A**T*x + beta*y,   or
y := alpha*A**H*x + beta*y,
where alpha and beta are scalars, x and y are vectors and A is an
m by n banded matrix with kl sub-diagonals and ku super-diagonals.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • trans[in] [rocblas_operation] indicates whether matrix A is tranposed (conjugated) or not.

  • m[in] [rocblas_int] number of rows of matrix A.

  • n[in] [rocblas_int] number of columns of matrix A.

  • kl[in] [rocblas_int] number of sub-diagonals of A.

  • ku[in] [rocblas_int] number of super-diagonals of A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device pointer storing banded matrix A. Leading (kl + ku + 1) by n part of the matrix contains the coefficients of the banded matrix. The leading diagonal resides in row (ku + 1) with the first super-diagonal above on the RHS of row ku. The first sub-diagonal resides below on the LHS of row ku + 2. This propagates up and down across sub/super-diagonals.

      Ex: (m = n = 7; ku = 2, kl = 2)
      1 2 3 0 0 0 0             0 0 3 3 3 3 3
      4 1 2 3 0 0 0             0 2 2 2 2 2 2
      5 4 1 2 3 0 0    ---->    1 1 1 1 1 1 1
      0 5 4 1 2 3 0             4 4 4 4 4 4 0
      0 0 5 4 1 2 3             5 5 5 5 5 0 0
      0 0 0 5 4 1 2             0 0 0 0 0 0 0
      0 0 0 0 5 4 1             0 0 0 0 0 0 0
    
    Note that the empty elements which do not correspond to data will not be referenced.

  • lda[in] [rocblas_int] specifies the leading dimension of A. Must be >= (kl + ku + 1).

  • x[in] device pointer storing vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device pointer storing vector y.

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

gbmv functions support the _64 interface. Parameters m,`n`,`kl` and ku larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_status rocblas_sgbmv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const float *alpha, const float *const A[], rocblas_int lda, const float *const x[], rocblas_int incx, const float *beta, float *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_dgbmv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const double *alpha, const double *const A[], rocblas_int lda, const double *const x[], rocblas_int incx, const double *beta, double *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_cgbmv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const rocblas_float_complex *alpha, const rocblas_float_complex *const A[], rocblas_int lda, const rocblas_float_complex *const x[], rocblas_int incx, const rocblas_float_complex *beta, rocblas_float_complex *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_zgbmv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const rocblas_double_complex *alpha, const rocblas_double_complex *const A[], rocblas_int lda, const rocblas_double_complex *const x[], rocblas_int incx, const rocblas_double_complex *beta, rocblas_double_complex *const y[], rocblas_int incy, rocblas_int batch_count)#

BLAS Level 2 API

gbmv_batched performs one of the matrix-vector operations:

y_i := alpha*A_i*x_i    + beta*y_i,   or
y_i := alpha*A_i**T*x_i + beta*y_i,   or
y_i := alpha*A_i**H*x_i + beta*y_i,
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
m by n banded matrix with kl sub-diagonals and ku super-diagonals,
for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • trans[in] [rocblas_operation] indicates whether matrix A is tranposed (conjugated) or not.

  • m[in] [rocblas_int] number of rows of each matrix A_i.

  • n[in] [rocblas_int] number of columns of each matrix A_i.

  • kl[in] [rocblas_int] number of sub-diagonals of each A_i.

  • ku[in] [rocblas_int] number of super-diagonals of each A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device array of device pointers storing each banded matrix A_i. Leading (kl + ku + 1) by n part of the matrix contains the coefficients of the banded matrix. The leading diagonal resides in row (ku + 1) with the first super-diagonal above on the RHS of row ku. The first sub-diagonal resides below on the LHS of row ku + 2. This propagates up and down across sub/super-diagonals.

      Ex: (m = n = 7; ku = 2, kl = 2)
      1 2 3 0 0 0 0             0 0 3 3 3 3 3
      4 1 2 3 0 0 0             0 2 2 2 2 2 2
      5 4 1 2 3 0 0    ---->    1 1 1 1 1 1 1
      0 5 4 1 2 3 0             4 4 4 4 4 4 0
      0 0 5 4 1 2 3             5 5 5 5 5 0 0
      0 0 0 5 4 1 2             0 0 0 0 0 0 0
      0 0 0 0 5 4 1             0 0 0 0 0 0 0
    
    Note that the empty elements which do not correspond to data will not be referenced.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i. Must be >= (kl + ku + 1)

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device array of device pointers storing each vector y_i.

  • incy[in] [rocblas_int] specifies the increment for the elements of each y_i.

  • batch_count[in] [rocblas_int] specifies the number of instances in the batch.

gbmv_batched functions support the _64 interface. Parameters m,`n`,`kl` and ku larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_status rocblas_sgbmv_strided_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const float *alpha, const float *A, rocblas_int lda, rocblas_stride stride_A, const float *x, rocblas_int incx, rocblas_stride stride_x, const float *beta, float *y, rocblas_int incy, rocblas_stride stride_y, rocblas_int batch_count)#
rocblas_status rocblas_dgbmv_strided_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const double *alpha, const double *A, rocblas_int lda, rocblas_stride stride_A, const double *x, rocblas_int incx, rocblas_stride stride_x, const double *beta, double *y, rocblas_int incy, rocblas_stride stride_y, rocblas_int batch_count)#
rocblas_status rocblas_cgbmv_strided_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const rocblas_float_complex *alpha, const rocblas_float_complex *A, rocblas_int lda, rocblas_stride stride_A, const rocblas_float_complex *x, rocblas_int incx, rocblas_stride stride_x, const rocblas_float_complex *beta, rocblas_float_complex *y, rocblas_int incy, rocblas_stride stride_y, rocblas_int batch_count)#
rocblas_status rocblas_zgbmv_strided_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, rocblas_int kl, rocblas_int ku, const rocblas_double_complex *alpha, const rocblas_double_complex *A, rocblas_int lda, rocblas_stride stride_A, const rocblas_double_complex *x, rocblas_int incx, rocblas_stride stride_x, const rocblas_double_complex *beta, rocblas_double_complex *y, rocblas_int incy, rocblas_stride stride_y, rocblas_int batch_count)#

BLAS Level 2 API

gbmv_strided_batched performs one of the matrix-vector operations:

y_i := alpha*A_i*x_i    + beta*y_i,   or
y_i := alpha*A_i**T*x_i + beta*y_i,   or
y_i := alpha*A_i**H*x_i + beta*y_i,
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
m by n banded matrix with kl sub-diagonals and ku super-diagonals,
for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • trans[in] [rocblas_operation] indicates whether matrix A is tranposed (conjugated) or not.

  • m[in] [rocblas_int] number of rows of matrix A.

  • n[in] [rocblas_int] number of columns of matrix A.

  • kl[in] [rocblas_int] number of sub-diagonals of A.

  • ku[in] [rocblas_int] number of super-diagonals of A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device pointer to first banded matrix (A_1). Leading (kl + ku + 1) by n part of the matrix contains the coefficients of the banded matrix. The leading diagonal resides in row (ku + 1) with the first super-diagonal above on the RHS of row ku. The first sub-diagonal resides below on the LHS of row ku + 2. This propagates up and down across sub/super-diagonals.

      Ex: (m = n = 7; ku = 2, kl = 2)
      1 2 3 0 0 0 0             0 0 3 3 3 3 3
      4 1 2 3 0 0 0             0 2 2 2 2 2 2
      5 4 1 2 3 0 0    ---->    1 1 1 1 1 1 1
      0 5 4 1 2 3 0             4 4 4 4 4 4 0
      0 0 5 4 1 2 3             5 5 5 5 5 0 0
      0 0 0 5 4 1 2             0 0 0 0 0 0 0
      0 0 0 0 5 4 1             0 0 0 0 0 0 0
    
    Note that the empty elements which do not correspond to data will not be referenced.

  • lda[in] [rocblas_int] specifies the leading dimension of A. Must be >= (kl + ku + 1).

  • stride_A[in] [rocblas_stride] stride from the start of one matrix (A_i) and the next one (A_i+1).

  • x[in] device pointer to first vector (x_1).

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • stride_x[in] [rocblas_stride] stride from the start of one vector (x_i) and the next one (x_i+1).

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device pointer to first vector (y_1).

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

  • stride_y[in] [rocblas_stride] stride from the start of one vector (y_i) and the next one (x_i+1).

  • batch_count[in] [rocblas_int] specifies the number of instances in the batch.

gbmv_strided_batched functions support the _64 interface. Parameters m,`n`,`kl` and ku larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_Xgemv + batched, strided_batched#

rocblas_status rocblas_sgemv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, const float *x, rocblas_int incx, const float *beta, float *y, rocblas_int incy)#
rocblas_status rocblas_dgemv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, const double *x, rocblas_int incx, const double *beta, double *y, rocblas_int incy)#
rocblas_status rocblas_cgemv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *A, rocblas_int lda, const rocblas_float_complex *x, rocblas_int incx, const rocblas_float_complex *beta, rocblas_float_complex *y, rocblas_int incy)#
rocblas_status rocblas_zgemv(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *A, rocblas_int lda, const rocblas_double_complex *x, rocblas_int incx, const rocblas_double_complex *beta, rocblas_double_complex *y, rocblas_int incy)#

BLAS Level 2 API

gemv performs one of the matrix-vector operations:

y := alpha*A*x    + beta*y,   or
y := alpha*A**T*x + beta*y,   or
y := alpha*A**H*x + beta*y,
where alpha and beta are scalars, x and y are vectors and A is an
m by n matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • trans[in] [rocblas_operation] indicates whether matrix A is tranposed (conjugated) or not.

  • m[in] [rocblas_int] number of rows of matrix A.

  • n[in] [rocblas_int] number of columns of matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device pointer storing matrix A.

  • lda[in] [rocblas_int] specifies the leading dimension of A.

  • x[in] device pointer storing vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device pointer storing vector y.

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

gemv functions have an implementation which uses atomic operations. See section Atomic Operations for more information. The gemv functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sgemv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const float *alpha, const float *const A[], rocblas_int lda, const float *const x[], rocblas_int incx, const float *beta, float *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_dgemv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const double *alpha, const double *const A[], rocblas_int lda, const double *const x[], rocblas_int incx, const double *beta, double *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_cgemv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *const A[], rocblas_int lda, const rocblas_float_complex *const x[], rocblas_int incx, const rocblas_float_complex *beta, rocblas_float_complex *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_zgemv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *const A[], rocblas_int lda, const rocblas_double_complex *const x[], rocblas_int incx, const rocblas_double_complex *beta, rocblas_double_complex *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_hshgemv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const float *alpha, const rocblas_half *const A[], rocblas_int lda, const rocblas_half *const x[], rocblas_int incx, const float *beta, rocblas_half *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_hssgemv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const float *alpha, const rocblas_half *const A[], rocblas_int lda, const rocblas_half *const x[], rocblas_int incx, const float *beta, float *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_tstgemv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const float *alpha, const rocblas_bfloat16 *const A[], rocblas_int lda, const rocblas_bfloat16 *const x[], rocblas_int incx, const float *beta, rocblas_bfloat16 *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_tssgemv_batched(rocblas_handle handle, rocblas_operation trans, rocblas_int m, rocblas_int n, const float *alpha, const rocblas_bfloat16 *const A[], rocblas_int lda, const rocblas_bfloat16 *const x[], rocblas_int incx, const float *beta, float *const y[], rocblas_int incy, rocblas_int batch_count)#

BLAS Level 2 API

gemv_batched performs a batch of matrix-vector operations:

y_i := alpha*A_i*x_i    + beta*y_i,   or
y_i := alpha*A_i**T*x_i + beta*y_i,   or
y_i := alpha*A_i**H*x_i + beta*y_i,
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
m by n matrix, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • trans[in] [rocblas_operation] indicates whether matrices A_i are tranposed (conjugated) or not.

  • m[in] [rocblas_int] number of rows of each matrix A_i.

  • n[in] [rocblas_int] number of columns of each matrix A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device array of device pointers storing each matrix A_i.

  • lda[in] [rocblas_int] specifies the leading dimension of each matrix A_i.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each vector x_i.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device array of device pointers storing each vector y_i.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

gemv_batched functions have an implementation which uses atomic operations. See section Atomic Operations for more information. The gemv_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sgemv_strided_batched(rocblas_handle handle, rocblas_operation transA, rocblas_int m, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, rocblas_stride strideA, const float *x, rocblas_int incx, rocblas_stride stridex, const float *beta, float *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_dgemv_strided_batched(rocblas_handle handle, rocblas_operation transA, rocblas_int m, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, rocblas_stride strideA, const double *x, rocblas_int incx, rocblas_stride stridex, const double *beta, double *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_cgemv_strided_batched(rocblas_handle handle, rocblas_operation transA, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *A, rocblas_int lda, rocblas_stride strideA, const rocblas_float_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_float_complex *beta, rocblas_float_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_zgemv_strided_batched(rocblas_handle handle, rocblas_operation transA, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *A, rocblas_int lda, rocblas_stride strideA, const rocblas_double_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_double_complex *beta, rocblas_double_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_hshgemv_strided_batched(rocblas_handle handle, rocblas_operation transA, rocblas_int m, rocblas_int n, const float *alpha, const rocblas_half *A, rocblas_int lda, rocblas_stride strideA, const rocblas_half *x, rocblas_int incx, rocblas_stride stridex, const float *beta, rocblas_half *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_hssgemv_strided_batched(rocblas_handle handle, rocblas_operation transA, rocblas_int m, rocblas_int n, const float *alpha, const rocblas_half *A, rocblas_int lda, rocblas_stride strideA, const rocblas_half *x, rocblas_int incx, rocblas_stride stridex, const float *beta, float *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_tstgemv_strided_batched(rocblas_handle handle, rocblas_operation transA, rocblas_int m, rocblas_int n, const float *alpha, const rocblas_bfloat16 *A, rocblas_int lda, rocblas_stride strideA, const rocblas_bfloat16 *x, rocblas_int incx, rocblas_stride stridex, const float *beta, rocblas_bfloat16 *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_tssgemv_strided_batched(rocblas_handle handle, rocblas_operation transA, rocblas_int m, rocblas_int n, const float *alpha, const rocblas_bfloat16 *A, rocblas_int lda, rocblas_stride strideA, const rocblas_bfloat16 *x, rocblas_int incx, rocblas_stride stridex, const float *beta, float *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#

BLAS Level 2 API

gemv_strided_batched performs a batch of matrix-vector operations:

y_i := alpha*A_i*x_i    + beta*y_i,   or
y_i := alpha*A_i**T*x_i + beta*y_i,   or
y_i := alpha*A_i**H*x_i + beta*y_i,
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
m by n matrix, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • transA[in] [rocblas_operation] indicates whether matrices A_i are tranposed (conjugated) or not.

  • m[in] [rocblas_int] number of rows of matrices A_i.

  • n[in] [rocblas_int] number of columns of matrices A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device pointer to the first matrix (A_1) in the batch.

  • lda[in] [rocblas_int] specifies the leading dimension of matrices A_i.

  • strideA[in] [rocblas_stride] stride from the start of one matrix (A_i) and the next one (A_i+1).

  • x[in] device pointer to the first vector (x_1) in the batch.

  • incx[in] [rocblas_int] specifies the increment for the elements of vectors x_i.

  • stridex[in] [rocblas_stride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stride_x. However, ensure that stride_x is of appropriate size. When trans equals rocblas_operation_none this typically means stride_x >= n * incx, otherwise stride_x >= m * incx.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[inout] device pointer to the first vector (y_1) in the batch.

  • incy[in] [rocblas_int] specifies the increment for the elements of vectors y_i.

  • stridey[in] [rocblas_stride] stride from the start of one vector (y_i) and the next one (y_i+1). There are no restrictions placed on stride_y. However, ensure that stride_y is of appropriate size. When trans equals rocblas_operation_none this typically means stride_y >= m * incy, otherwise stride_y >= n * incy. stridey should be non zero.

  • batch_count[in] [rocblas_int] number of instances in the batch.

gemv_strided_batched functions have an implementation which uses atomic operations. See section Atomic Operations for more information. The gemv_strided_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_Xger + batched, strided_batched#

rocblas_status rocblas_sger(rocblas_handle handle, rocblas_int m, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, const float *y, rocblas_int incy, float *A, rocblas_int lda)#
rocblas_status rocblas_dger(rocblas_handle handle, rocblas_int m, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, const double *y, rocblas_int incy, double *A, rocblas_int lda)#
rocblas_status rocblas_cgeru(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, const rocblas_float_complex *y, rocblas_int incy, rocblas_float_complex *A, rocblas_int lda)#
rocblas_status rocblas_zgeru(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, const rocblas_double_complex *y, rocblas_int incy, rocblas_double_complex *A, rocblas_int lda)#
rocblas_status rocblas_cgerc(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, const rocblas_float_complex *y, rocblas_int incy, rocblas_float_complex *A, rocblas_int lda)#
rocblas_status rocblas_zgerc(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, const rocblas_double_complex *y, rocblas_int incy, rocblas_double_complex *A, rocblas_int lda)#

BLAS Level 2 API

ger,geru,gerc performs the matrix-vector operations:

A := A + alpha*x*y**T , OR
A := A + alpha*x*y**H for gerc
where alpha is a scalar, x and y are vectors, and A is an
m by n matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • m[in] [rocblas_int] the number of rows of the matrix A.

  • n[in] [rocblas_int] the number of columns of the matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer storing vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • y[in] device pointer storing vector y.

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

  • A[inout] device pointer storing matrix A.

  • lda[in] [rocblas_int] specifies the leading dimension of A.

The ger, geru, and gerc functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sger_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const float *alpha, const float *const x[], rocblas_int incx, const float *const y[], rocblas_int incy, float *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_dger_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const double *alpha, const double *const x[], rocblas_int incx, const double *const y[], rocblas_int incy, double *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_cgeru_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *const x[], rocblas_int incx, const rocblas_float_complex *const y[], rocblas_int incy, rocblas_float_complex *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_zgeru_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *const x[], rocblas_int incx, const rocblas_double_complex *const y[], rocblas_int incy, rocblas_double_complex *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_cgerc_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *const x[], rocblas_int incx, const rocblas_float_complex *const y[], rocblas_int incy, rocblas_float_complex *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_zgerc_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *const x[], rocblas_int incx, const rocblas_double_complex *const y[], rocblas_int incy, rocblas_double_complex *const A[], rocblas_int lda, rocblas_int batch_count)#

BLAS Level 2 API

ger_batched,geru_batched,gerc_batched perform a batch of the matrix-vector operations:

A := A + alpha*x*y**T , OR
A := A + alpha*x*y**H for gerc
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha is a scalar, x_i and y_i are vectors and A_i is an
m by n matrix, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • m[in] [rocblas_int] the number of rows of each matrix A_i.

  • n[in] [rocblas_int] the number of columns of each matrix A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each vector x_i.

  • y[in] device array of device pointers storing each vector y_i.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • A[inout] device array of device pointers storing each matrix A_i.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The ger, geru, and gerc_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sger_strided_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, rocblas_stride stridex, const float *y, rocblas_int incy, rocblas_stride stridey, float *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_dger_strided_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, rocblas_stride stridex, const double *y, rocblas_int incy, rocblas_stride stridey, double *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_cgeru_strided_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_float_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_float_complex *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_zgeru_strided_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_double_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_double_complex *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_cgerc_strided_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_float_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_float_complex *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_zgerc_strided_batched(rocblas_handle handle, rocblas_int m, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_double_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_double_complex *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#

BLAS Level 2 API

ger_strided_batched,geru_strided_batched,gerc_strided_batched performs the matrix-vector operations:

A_i := A_i + alpha*x_i*y_i**T, OR
A_i := A_i + alpha*x_i*y_i**H  for gerc
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha is a scalar, x_i and y_i are vectors and A_i is an
m by n matrix, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • m[in] [rocblas_int] the number of rows of each matrix A_i.

  • n[in] [rocblas_int] the number of columns of each matrix A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer to the first vector (x_1) in the batch.

  • incx[in] [rocblas_int] specifies the increments for the elements of each vector x_i.

  • stridex[in] [rocblas_stride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stride_x. However, ensure that stride_x is of appropriate size. For a typical case this means stride_x >= m * incx.

  • y[inout] device pointer to the first vector (y_1) in the batch.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • stridey[in] [rocblas_stride] stride from the start of one vector (y_i) and the next one (y_i+1). There are no restrictions placed on stride_y. However, ensure that stride_y is of appropriate size. For a typical case this means stride_y >= n * incy.

  • A[inout] device pointer to the first matrix (A_1) in the batch.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i.

  • strideA[in] [rocblas_stride] stride from the start of one matrix (A_i) and the next one (A_i+1)

  • batch_count[in] [rocblas_int] number of instances in the batch.

The ger, geru, and gerc_strided_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_Xsbmv + batched, strided_batched#

rocblas_status rocblas_ssbmv(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, rocblas_int k, const float *alpha, const float *A, rocblas_int lda, const float *x, rocblas_int incx, const float *beta, float *y, rocblas_int incy)#
rocblas_status rocblas_dsbmv(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, rocblas_int k, const double *alpha, const double *A, rocblas_int lda, const double *x, rocblas_int incx, const double *beta, double *y, rocblas_int incy)#

BLAS Level 2 API

sbmv performs the matrix-vector operation:

y := alpha*A*x + beta*y
where alpha and beta are scalars, x and y are n element vectors and
A should contain an upper or lower triangular n by n symmetric banded matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] rocblas_fill specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int]

  • k[in] [rocblas_int] specifies the number of sub- and super-diagonals.

  • alpha[in] specifies the scalar alpha.

  • A[in] pointer storing matrix A on the GPU.

  • lda[in] [rocblas_int] specifies the leading dimension of matrix A.

  • x[in] pointer storing vector x on the GPU.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • beta[in] specifies the scalar beta.

  • y[out] pointer storing vector y on the GPU.

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

The sbmv functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_status rocblas_ssbmv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, rocblas_int k, const float *alpha, const float *const A[], rocblas_int lda, const float *const x[], rocblas_int incx, const float *beta, float *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_dsbmv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, rocblas_int k, const double *alpha, const double *const A[], rocblas_int lda, const double *const x[], rocblas_int incx, const double *beta, double *const y[], rocblas_int incy, rocblas_int batch_count)#

BLAS Level 2 API

sbmv_batched performs the matrix-vector operation:

y_i := alpha*A_i*x_i + beta*y_i
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
n by n symmetric banded matrix, for i = 1, ..., batch_count.
A should contain an upper or lower triangular n by n symmetric banded matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] number of rows and columns of each matrix A_i.

  • k[in] [rocblas_int] specifies the number of sub- and super-diagonals.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device array of device pointers storing each matrix A_i.

  • lda[in] [rocblas_int] specifies the leading dimension of each matrix A_i.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each vector x_i.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[out] device array of device pointers storing each vector y_i.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The sbmv_batched functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_status rocblas_ssbmv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, rocblas_int k, const float *alpha, const float *A, rocblas_int lda, rocblas_stride strideA, const float *x, rocblas_int incx, rocblas_stride stridex, const float *beta, float *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_dsbmv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, rocblas_int k, const double *alpha, const double *A, rocblas_int lda, rocblas_stride strideA, const double *x, rocblas_int incx, rocblas_stride stridex, const double *beta, double *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#

BLAS Level 2 API

sbmv_strided_batched performs the matrix-vector operation:

y_i := alpha*A_i*x_i + beta*y_i
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
n by n symmetric banded matrix, for i = 1, ..., batch_count.
A should contain an upper or lower triangular n by n symmetric banded matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] number of rows and columns of each matrix A_i.

  • k[in] [rocblas_int] specifies the number of sub- and super-diagonals.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] Device pointer to the first matrix A_1 on the GPU.

  • lda[in] [rocblas_int] specifies the leading dimension of each matrix A_i.

  • strideA[in] [rocblas_stride] stride from the start of one matrix (A_i) and the next one (A_i+1).

  • x[in] Device pointer to the first vector x_1 on the GPU.

  • incx[in] [rocblas_int] specifies the increment for the elements of each vector x_i.

  • stridex[in] [rocblas_stride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stridex. However, ensure that stridex is of appropriate size. This typically means stridex >= n * incx. stridex should be non zero.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[out] Device pointer to the first vector y_1 on the GPU.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • stridey[in] [rocblas_stride] stride from the start of one vector (y_i) and the next one (y_i+1). There are no restrictions placed on stridey. However, ensure that stridey is of appropriate size. This typically means stridey >= n * incy. stridey should be non zero.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The sbmv_strided_batched functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_Xspmv + batched, strided_batched#

rocblas_status rocblas_sspmv(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *A, const float *x, rocblas_int incx, const float *beta, float *y, rocblas_int incy)#
rocblas_status rocblas_dspmv(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *A, const double *x, rocblas_int incx, const double *beta, double *y, rocblas_int incy)#

BLAS Level 2 API

spmv performs the matrix-vector operation:

y := alpha*A*x + beta*y
where alpha and beta are scalars, x and y are n element vectors and
A should contain an upper or lower triangular n by n packed symmetric matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] rocblas_fill specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int]

  • alpha[in] specifies the scalar alpha.

  • A[in] pointer storing matrix A on the GPU.

  • x[in] pointer storing vector x on the GPU.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • beta[in] specifies the scalar beta.

  • y[out] pointer storing vector y on the GPU.

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

The spmv functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sspmv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *const A[], const float *const x[], rocblas_int incx, const float *beta, float *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_dspmv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *const A[], const double *const x[], rocblas_int incx, const double *beta, double *const y[], rocblas_int incy, rocblas_int batch_count)#

BLAS Level 2 API

spmv_batched performs the matrix-vector operation:

y_i := alpha*A_i*x_i + beta*y_i
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
n by n symmetric matrix, for i = 1, ..., batch_count.
A should contain an upper or lower triangular n by n packed symmetric matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] number of rows and columns of each matrix A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device array of device pointers storing each matrix A_i.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each vector x_i.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[out] device array of device pointers storing each vector y_i.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The spmv_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sspmv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *A, rocblas_stride strideA, const float *x, rocblas_int incx, rocblas_stride stridex, const float *beta, float *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_dspmv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *A, rocblas_stride strideA, const double *x, rocblas_int incx, rocblas_stride stridex, const double *beta, double *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#

BLAS Level 2 API

spmv_strided_batched performs the matrix-vector operation:

y_i := alpha*A_i*x_i + beta*y_i
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
n by n symmetric matrix, for i = 1, ..., batch_count.
A should contain an upper or lower triangular n by n packed symmetric matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] number of rows and columns of each matrix A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] Device pointer to the first matrix A_1 on the GPU.

  • strideA[in] [rocblas_stride] stride from the start of one matrix (A_i) and the next one (A_i+1).

  • x[in] Device pointer to the first vector x_1 on the GPU.

  • incx[in] [rocblas_int] specifies the increment for the elements of each vector x_i.

  • stridex[in] [rocblas_stride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stridex. However, ensure that stridex is of appropriate size. This typically means stridex >= n * incx. stridex should be non zero.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[out] Device pointer to the first vector y_1 on the GPU.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • stridey[in] [rocblas_stride] stride from the start of one vector (y_i) and the next one (y_i+1). There are no restrictions placed on stridey. However, ensure that stridey is of appropriate size. This typically means stridey >= n * incy. stridey should be non zero.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The spmv_strided_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_Xspr + batched, strided_batched#

rocblas_status rocblas_sspr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, float *AP)#
rocblas_status rocblas_dspr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, double *AP)#
rocblas_status rocblas_cspr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, rocblas_float_complex *AP)#
rocblas_status rocblas_zspr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, rocblas_double_complex *AP)#

BLAS Level 2 API

spr performs the matrix-vector operations:

A := A + alpha*x*x**T
where alpha is a scalar, x is a vector, and A is an
n by n symmetric matrix, supplied in packed form.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • rocblas_fill_upper: The upper triangular part of A is supplied in AP.

    • rocblas_fill_lower: The lower triangular part of A is supplied in AP.

  • n[in] [rocblas_int] the number of rows and columns of matrix A. Must be at least 0.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer storing vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • AP[inout] device pointer storing the packed version of the specified triangular portion of the symmetric matrix A. Of at least size ((n * (n + 1)) / 2).

      if uplo == rocblas_fill_upper:
          The upper triangular portion of the symmetric matrix A is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(0,1)
          AP(2) = A(1,1), etc.
              Ex: (rocblas_fill_upper; n = 4)
                  1 2 4 7
                  2 3 5 8   -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  4 5 6 9
                  7 8 9 0
    
      if uplo == rocblas_fill_lower:
          The lower triangular portion of the symmetric matrix A is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(1,0)
          AP(2) = A(2,1), etc.
              Ex: (rocblas_fill_lower; n = 4)
                  1 2 3 4
                  2 5 6 7    -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  3 6 8 9
                  4 7 9 0
    

The spr functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sspr_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *const x[], rocblas_int incx, float *const AP[], rocblas_int batch_count)#
rocblas_status rocblas_dspr_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *const x[], rocblas_int incx, double *const AP[], rocblas_int batch_count)#
rocblas_status rocblas_cspr_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *const x[], rocblas_int incx, rocblas_float_complex *const AP[], rocblas_int batch_count)#
rocblas_status rocblas_zspr_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *const x[], rocblas_int incx, rocblas_double_complex *const AP[], rocblas_int batch_count)#

BLAS Level 2 API

spr_batched performs the matrix-vector operations:

A_i := A_i + alpha*x_i*x_i**T
where alpha is a scalar, x_i is a vector, and A_i is an
n by n symmetric matrix, supplied in packed form, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • rocblas_fill_upper: The upper triangular part of each A_i is supplied in AP.

    • rocblas_fill_lower: The lower triangular part of each A_i is supplied in AP.

  • n[in] [rocblas_int] the number of rows and columns of each matrix A_i. Must be at least 0.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • AP[inout] device array of device pointers storing the packed version of the specified triangular portion of each symmetric matrix A_i of at least size ((n * (n + 1)) / 2). Array is of at least size batch_count.

      if uplo == rocblas_fill_upper:
          The upper triangular portion of each symmetric matrix A_i is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(0,1)
          AP(2) = A(1,1), etc.
              Ex: (rocblas_fill_upper; n = 4)
                  1 2 4 7
                  2 3 5 8   -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  4 5 6 9
                  7 8 9 0
    
      if uplo == rocblas_fill_lower:
          The lower triangular portion of each symmetric matrix A_i is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(1,0)
          AP(2) = A(2,1), etc.
              Ex: (rocblas_fill_lower; n = 4)
                  1 2 3 4
                  2 5 6 7    -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  3 6 8 9
                  4 7 9 0
    

  • batch_count[in] [rocblas_int] number of instances in the batch.

The spr_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sspr_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, rocblas_stride stride_x, float *AP, rocblas_stride stride_A, rocblas_int batch_count)#
rocblas_status rocblas_dspr_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, rocblas_stride stride_x, double *AP, rocblas_stride stride_A, rocblas_int batch_count)#
rocblas_status rocblas_cspr_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, rocblas_stride stride_x, rocblas_float_complex *AP, rocblas_stride stride_A, rocblas_int batch_count)#
rocblas_status rocblas_zspr_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, rocblas_stride stride_x, rocblas_double_complex *AP, rocblas_stride stride_A, rocblas_int batch_count)#

BLAS Level 2 API

spr_strided_batched performs the matrix-vector operations:

A_i := A_i + alpha*x_i*x_i**T
where alpha is a scalar, x_i is a vector, and A_i is an
n by n symmetric matrix, supplied in packed form, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • rocblas_fill_upper: The upper triangular part of each A_i is supplied in AP.

    • rocblas_fill_lower: The lower triangular part of each A_i is supplied in AP.

  • n[in] [rocblas_int] the number of rows and columns of each matrix A_i. Must be at least 0.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer pointing to the first vector (x_1).

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • stride_x[in] [rocblas_stride] stride from the start of one vector (x_i) and the next one (x_i+1).

  • AP[inout] device pointer storing the packed version of the specified triangular portion of each symmetric matrix A_i. Points to the first A_1.

      if uplo == rocblas_fill_upper:
          The upper triangular portion of each symmetric matrix A_i is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(0,1)
          AP(2) = A(1,1), etc.
              Ex: (rocblas_fill_upper; n = 4)
                  1 2 4 7
                  2 3 5 8   -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  4 5 6 9
                  7 8 9 0
    
      if uplo == rocblas_fill_lower:
          The lower triangular portion of each symmetric matrix A_i is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(1,0)
          AP(2) = A(2,1), etc.
              Ex: (rocblas_fill_lower; n = 4)
                  1 2 3 4
                  2 5 6 7    -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  3 6 8 9
                  4 7 9 0
    

  • stride_A[in] [rocblas_stride] stride from the start of one (A_i) and the next (A_i+1).

  • batch_count[in] [rocblas_int] number of instances in the batch.

The spr_strided_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_Xspr2 + batched, strided_batched#

rocblas_status rocblas_sspr2(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, const float *y, rocblas_int incy, float *AP)#
rocblas_status rocblas_dspr2(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, const double *y, rocblas_int incy, double *AP)#

BLAS Level 2 API

spr2 performs the matrix-vector operation:

A := A + alpha*x*y**T + alpha*y*x**T
where alpha is a scalar, x and y are vectors, and A is an
n by n symmetric matrix, supplied in packed form.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • rocblas_fill_upper: The upper triangular part of A is supplied in AP.

    • rocblas_fill_lower: The lower triangular part of A is supplied in AP.

  • n[in] [rocblas_int] the number of rows and columns of matrix A. Must be at least 0.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer storing vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • y[in] device pointer storing vector y.

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

  • AP[inout] device pointer storing the packed version of the specified triangular portion of the symmetric matrix A. Of at least size ((n * (n + 1)) / 2).

      if uplo == rocblas_fill_upper:
          The upper triangular portion of the symmetric matrix A is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(0,1)
          AP(2) = A(1,1), etc.
              Ex: (rocblas_fill_upper; n = 4)
                  1 2 4 7
                  2 3 5 8   -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  4 5 6 9
                  7 8 9 0
    
      if uplo == rocblas_fill_lower:
          The lower triangular portion of the symmetric matrix A is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(1,0)
          AP(n) = A(2,1), etc.
              Ex: (rocblas_fill_lower; n = 4)
                  1 2 3 4
                  2 5 6 7    -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  3 6 8 9
                  4 7 9 0
    

The spr2 functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sspr2_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *const x[], rocblas_int incx, const float *const y[], rocblas_int incy, float *const AP[], rocblas_int batch_count)#
rocblas_status rocblas_dspr2_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *const x[], rocblas_int incx, const double *const y[], rocblas_int incy, double *const AP[], rocblas_int batch_count)#

BLAS Level 2 API

spr2_batched performs the matrix-vector operation:

A_i := A_i + alpha*x_i*y_i**T + alpha*y_i*x_i**T
where alpha is a scalar, x_i and y_i are vectors, and A_i is an
n by n symmetric matrix, supplied in packed form, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • rocblas_fill_upper: The upper triangular part of each A_i is supplied in AP.

    • rocblas_fill_lower: The lower triangular part of each A_i is supplied in AP.

  • n[in] [rocblas_int] the number of rows and columns of each matrix A_i. Must be at least 0.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • y[in] device array of device pointers storing each vector y_i.

  • incy[in] [rocblas_int] specifies the increment for the elements of each y_i.

  • AP[inout] device array of device pointers storing the packed version of the specified triangular portion of each symmetric matrix A_i of at least size ((n * (n + 1)) / 2). Array is of at least size batch_count.

      if uplo == rocblas_fill_upper:
          The upper triangular portion of each symmetric matrix A_i is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(0,1)
          AP(2) = A(1,1), etc.
              Ex: (rocblas_fill_upper; n = 4)
                  1 2 4 7
                  2 3 5 8   -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  4 5 6 9
                  7 8 9 0
    
      if uplo == rocblas_fill_lower:
          The lower triangular portion of each symmetric matrix A_i is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(1,0)
          AP(n) = A(2,1), etc.
              Ex: (rocblas_fill_lower; n = 4)
                  1 2 3 4
                  2 5 6 7    -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  3 6 8 9
                  4 7 9 0
    

  • batch_count[in] [rocblas_int] number of instances in the batch.

The spr2_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_sspr2_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, rocblas_stride stride_x, const float *y, rocblas_int incy, rocblas_stride stride_y, float *AP, rocblas_stride stride_A, rocblas_int batch_count)#
rocblas_status rocblas_dspr2_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, rocblas_stride stride_x, const double *y, rocblas_int incy, rocblas_stride stride_y, double *AP, rocblas_stride stride_A, rocblas_int batch_count)#

BLAS Level 2 API

spr2_strided_batched performs the matrix-vector operation:

A_i := A_i + alpha*x_i*y_i**T + alpha*y_i*x_i**T
where alpha is a scalar, x_i and y_i are vectors, and A_i is an
n by n symmetric matrix, supplied in packed form, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • rocblas_fill_upper: The upper triangular part of each A_i is supplied in AP.

    • rocblas_fill_lower: The lower triangular part of each A_i is supplied in AP.

  • n[in] [rocblas_int] the number of rows and columns of each matrix A_i. Must be at least 0.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer pointing to the first vector (x_1).

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • stride_x[in] [rocblas_stride] stride from the start of one vector (x_i) and the next one (x_i+1).

  • y[in] device pointer pointing to the first vector (y_1).

  • incy[in] [rocblas_int] specifies the increment for the elements of each y_i.

  • stride_y[in] [rocblas_stride] stride from the start of one vector (y_i) and the next one (y_i+1).

  • AP[inout] device pointer storing the packed version of the specified triangular portion of each symmetric matrix A_i. Points to the first A_1.

      if uplo == rocblas_fill_upper:
          The upper triangular portion of each symmetric matrix A_i is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(0,1)
          AP(2) = A(1,1), etc.
              Ex: (rocblas_fill_upper; n = 4)
                  1 2 4 7
                  2 3 5 8   -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  4 5 6 9
                  7 8 9 0
    
      if uplo == rocblas_fill_lower:
          The lower triangular portion of each symmetric matrix A_i is supplied.
          The matrix is compacted so that AP contains the triangular portion
          column-by-column
          so that:
          AP(0) = A(0,0)
          AP(1) = A(1,0)
          AP(n) = A(2,1), etc.
              Ex: (rocblas_fill_lower; n = 4)
                  1 2 3 4
                  2 5 6 7    -----> [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
                  3 6 8 9
                  4 7 9 0
    

  • stride_A[in] [rocblas_stride] stride from the start of one (A_i) and the next (A_i+1).

  • batch_count[in] [rocblas_int] number of instances in the batch.

The spr2_strided_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_Xsymv + batched, strided_batched#

rocblas_status rocblas_ssymv(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, const float *x, rocblas_int incx, const float *beta, float *y, rocblas_int incy)#
rocblas_status rocblas_dsymv(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, const double *x, rocblas_int incx, const double *beta, double *y, rocblas_int incy)#
rocblas_status rocblas_csymv(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *A, rocblas_int lda, const rocblas_float_complex *x, rocblas_int incx, const rocblas_float_complex *beta, rocblas_float_complex *y, rocblas_int incy)#
rocblas_status rocblas_zsymv(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *A, rocblas_int lda, const rocblas_double_complex *x, rocblas_int incx, const rocblas_double_complex *beta, rocblas_double_complex *y, rocblas_int incy)#

BLAS Level 2 API

symv performs the matrix-vector operation:

y := alpha*A*x + beta*y
where alpha and beta are scalars, x and y are n element vectors and
A should contain an upper or lower triangular n by n symmetric matrix.
symv has an implementation which uses atomic operations. See Atomic Operations in the API Reference Guide for more information.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced.

    • if rocblas_fill_lower, the upper part of A is not referenced.

  • n[in] [rocblas_int]

  • alpha[in] specifies the scalar alpha.

  • A[in] pointer storing matrix A on the GPU

  • lda[in] [rocblas_int] specifies the leading dimension of A.

  • x[in] pointer storing vector x on the GPU.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • beta[in] specifies the scalar beta

  • y[out] pointer storing vector y on the GPU.

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

The symv functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_ssymv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *const A[], rocblas_int lda, const float *const x[], rocblas_int incx, const float *beta, float *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_dsymv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *const A[], rocblas_int lda, const double *const x[], rocblas_int incx, const double *beta, double *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_csymv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *const A[], rocblas_int lda, const rocblas_float_complex *const x[], rocblas_int incx, const rocblas_float_complex *beta, rocblas_float_complex *const y[], rocblas_int incy, rocblas_int batch_count)#
rocblas_status rocblas_zsymv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *const A[], rocblas_int lda, const rocblas_double_complex *const x[], rocblas_int incx, const rocblas_double_complex *beta, rocblas_double_complex *const y[], rocblas_int incy, rocblas_int batch_count)#

BLAS Level 2 API

symv_batched performs the matrix-vector operation:

y_i := alpha*A_i*x_i + beta*y_i
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
n by n symmetric matrix, for i = 1, ..., batch_count.
A a should contain an upper or lower triangular symmetric matrix
and the opposing triangular part of A is not referenced.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced.

    • if rocblas_fill_lower, the upper part of A is not referenced.

  • n[in] [rocblas_int] number of rows and columns of each matrix A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] device array of device pointers storing each matrix A_i.

  • lda[in] [rocblas_int] specifies the leading dimension of each matrix A_i.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each vector x_i.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[out] device array of device pointers storing each vector y_i.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The symv_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_ssymv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *A, rocblas_int lda, rocblas_stride strideA, const float *x, rocblas_int incx, rocblas_stride stridex, const float *beta, float *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_dsymv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *A, rocblas_int lda, rocblas_stride strideA, const double *x, rocblas_int incx, rocblas_stride stridex, const double *beta, double *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_csymv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *A, rocblas_int lda, rocblas_stride strideA, const rocblas_float_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_float_complex *beta, rocblas_float_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#
rocblas_status rocblas_zsymv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *A, rocblas_int lda, rocblas_stride strideA, const rocblas_double_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_double_complex *beta, rocblas_double_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_int batch_count)#

BLAS Level 2 API

symv_strided_batched performs the matrix-vector operation:

y_i := alpha*A_i*x_i + beta*y_i
where (A_i, x_i, y_i) is the i-th instance of the batch.
alpha and beta are scalars, x_i and y_i are vectors and A_i is an
n by n symmetric matrix, for i = 1, ..., batch_count.
A a should contain an upper or lower triangular symmetric matrix
and the opposing triangular part of A is not referenced.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] number of rows and columns of each matrix A_i.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • A[in] Device pointer to the first matrix A_1 on the GPU.

  • lda[in] [rocblas_int] specifies the leading dimension of each matrix A_i.

  • strideA[in] [rocblas_stride] stride from the start of one matrix (A_i) and the next one (A_i+1).

  • x[in] Device pointer to the first vector x_1 on the GPU.

  • incx[in] [rocblas_int] specifies the increment for the elements of each vector x_i.

  • stridex[in] [rocblas_stride] stride from the start of one vector (x_i) and the next one (x_i+1). There are no restrictions placed on stride_x. However, ensure that stridex is of appropriate size. This typically means stridex >= n * incx. stridex should be non zero.

  • beta[in] device pointer or host pointer to scalar beta.

  • y[out] Device pointer to the first vector y_1 on the GPU.

  • incy[in] [rocblas_int] specifies the increment for the elements of each vector y_i.

  • stridey[in] [rocblas_stride] stride from the start of one vector (y_i) and the next one (y_i+1). There are no restrictions placed on stride_y. However, ensure that stridey is of appropriate size. This typically means stridey >= n * incy. stridey should be non zero.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The symv_strided_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_Xsyr + batched, strided_batched#

rocblas_status rocblas_ssyr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, float *A, rocblas_int lda)#
rocblas_status rocblas_dsyr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, double *A, rocblas_int lda)#
rocblas_status rocblas_csyr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, rocblas_float_complex *A, rocblas_int lda)#
rocblas_status rocblas_zsyr(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, rocblas_double_complex *A, rocblas_int lda)#

BLAS Level 2 API

syr performs the matrix-vector operations:

A := A + alpha*x*x**T
where alpha is a scalar, x is a vector, and A is an
n by n symmetric matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] the number of rows and columns of matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer storing vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • A[inout] device pointer storing matrix A.

  • lda[in] [rocblas_int] specifies the leading dimension of A.

The syr functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_ssyr_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *const x[], rocblas_int incx, float *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_dsyr_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *const x[], rocblas_int incx, double *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_csyr_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *const x[], rocblas_int incx, rocblas_float_complex *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_zsyr_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *const x[], rocblas_int incx, rocblas_double_complex *const A[], rocblas_int lda, rocblas_int batch_count)#

BLAS Level 2 API

syr_batched performs a batch of matrix-vector operations:

A[i] := A[i] + alpha*x[i]*x[i]**T
where alpha is a scalar, x is an array of vectors, and A is an array of
n by n symmetric matrices, for i = 1 , ... , batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] the number of rows and columns of matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • A[inout] device array of device pointers storing each matrix A_i.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The syr_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_ssyr_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, rocblas_stride stridex, float *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_dsyr_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, rocblas_stride stridex, double *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_csyr_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, rocblas_stride stridex, rocblas_float_complex *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_zsyr_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, rocblas_stride stridex, rocblas_double_complex *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#

BLAS Level 2 API

syr_strided_batched performs the matrix-vector operations:

A[i] := A[i] + alpha*x[i]*x[i]**T
where alpha is a scalar, vectors, and A is an array of
n by n symmetric matrices, for i = 1 , ... , batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] the number of rows and columns of each matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer to the first vector x_1.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • stridex[in] [rocblas_stride] specifies the pointer increment between vectors (x_i) and (x_i+1).

  • A[inout] device pointer to the first matrix A_1.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i.

  • strideA[in] [rocblas_stride] stride from the start of one matrix (A_i) and the next one (A_i+1).

  • batch_count[in] [rocblas_int] number of instances in the batch.

The syr_strided_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_Xsyr2 + batched, strided_batched#

rocblas_status rocblas_ssyr2(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, const float *y, rocblas_int incy, float *A, rocblas_int lda)#
rocblas_status rocblas_dsyr2(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, const double *y, rocblas_int incy, double *A, rocblas_int lda)#
rocblas_status rocblas_csyr2(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, const rocblas_float_complex *y, rocblas_int incy, rocblas_float_complex *A, rocblas_int lda)#
rocblas_status rocblas_zsyr2(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, const rocblas_double_complex *y, rocblas_int incy, rocblas_double_complex *A, rocblas_int lda)#

BLAS Level 2 API

syr2 performs the matrix-vector operations:

A := A + alpha*x*y**T + alpha*y*x**T
where alpha is a scalar, x and y are vectors, and A is an
n by n symmetric matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] the number of rows and columns of matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer storing vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

  • y[in] device pointer storing vector y.

  • incy[in] [rocblas_int] specifies the increment for the elements of y.

  • A[inout] device pointer storing matrix A.

  • lda[in] [rocblas_int] specifies the leading dimension of A.

The syr2 functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_ssyr2_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *const x[], rocblas_int incx, const float *const y[], rocblas_int incy, float *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_dsyr2_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *const x[], rocblas_int incx, const double *const y[], rocblas_int incy, double *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_csyr2_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *const x[], rocblas_int incx, const rocblas_float_complex *const y[], rocblas_int incy, rocblas_float_complex *const A[], rocblas_int lda, rocblas_int batch_count)#
rocblas_status rocblas_zsyr2_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *const x[], rocblas_int incx, const rocblas_double_complex *const y[], rocblas_int incy, rocblas_double_complex *const A[], rocblas_int lda, rocblas_int batch_count)#

BLAS Level 2 API

syr2_batched performs a batch of matrix-vector operations:

A[i] := A[i] + alpha*x[i]*y[i]**T + alpha*y[i]*x[i]**T
where alpha is a scalar, x[i] and y[i] are vectors, and A[i] is a
n by n symmetric matrix, for i = 1 , ... , batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] the number of rows and columns of matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device array of device pointers storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • y[in] device array of device pointers storing each vector y_i.

  • incy[in] [rocblas_int] specifies the increment for the elements of each y_i.

  • A[inout] device array of device pointers storing each matrix A_i.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The syr2_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_status rocblas_ssyr2_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const float *alpha, const float *x, rocblas_int incx, rocblas_stride stridex, const float *y, rocblas_int incy, rocblas_stride stridey, float *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_dsyr2_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const double *alpha, const double *x, rocblas_int incx, rocblas_stride stridex, const double *y, rocblas_int incy, rocblas_stride stridey, double *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_csyr2_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_float_complex *alpha, const rocblas_float_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_float_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_float_complex *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#
rocblas_status rocblas_zsyr2_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_int n, const rocblas_double_complex *alpha, const rocblas_double_complex *x, rocblas_int incx, rocblas_stride stridex, const rocblas_double_complex *y, rocblas_int incy, rocblas_stride stridey, rocblas_double_complex *A, rocblas_int lda, rocblas_stride strideA, rocblas_int batch_count)#

BLAS Level 2 API

syr2_strided_batched the matrix-vector operations:

A[i] := A[i] + alpha*x[i]*y[i]**T + alpha*y[i]*x[i]**T
where alpha is a scalar, x[i] and y[i] are vectors, and A[i] is a
n by n symmetric matrices, for i = 1 , ... , batch_count

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill] specifies whether the upper ‘rocblas_fill_upper’ or lower ‘rocblas_fill_lower’

    • if rocblas_fill_upper, the lower part of A is not referenced

    • if rocblas_fill_lower, the upper part of A is not referenced

  • n[in] [rocblas_int] the number of rows and columns of each matrix A.

  • alpha[in] device pointer or host pointer to scalar alpha.

  • x[in] device pointer to the first vector x_1.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • stridex[in] [rocblas_stride] specifies the pointer increment between vectors (x_i) and (x_i+1).

  • y[in] device pointer to the first vector y_1.

  • incy[in] [rocblas_int] specifies the increment for the elements of each y_i.

  • stridey[in] [rocblas_stride] specifies the pointer increment between vectors (y_i) and (y_i+1).

  • A[inout] device pointer to the first matrix A_1.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i.

  • strideA[in] [rocblas_stride] stride from the start of one matrix (A_i) and the next one (A_i+1).

  • batch_count[in] [rocblas_int] number of instances in the batch.

The syr2_strided_batched functions support the _64 interface. Refer to section ILP64 Interface.

rocblas_Xtbmv + batched, strided_batched#

rocblas_status rocblas_stbmv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const float *A, rocblas_int lda, float *x, rocblas_int incx)#
rocblas_status rocblas_dtbmv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const double *A, rocblas_int lda, double *x, rocblas_int incx)#
rocblas_status rocblas_ctbmv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_float_complex *A, rocblas_int lda, rocblas_float_complex *x, rocblas_int incx)#
rocblas_status rocblas_ztbmv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_double_complex *A, rocblas_int lda, rocblas_double_complex *x, rocblas_int incx)#

BLAS Level 2 API

tbmv performs one of the matrix-vector operations:

x := A*x      or
x := A**T*x   or
x := A**H*x,
x is a vectors and A is a banded n by n matrix (see description below).

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill]

    • rocblas_fill_upper: A is an upper banded triangular matrix.

    • rocblas_fill_lower: A is a lower banded triangular matrix.

  • trans[in] [rocblas_operation] indicates whether matrix A is tranposed (conjugated) or not.

  • diag[in] [rocblas_diagonal]

    • rocblas_diagonal_unit: The main diagonal of A is assumed to consist of only 1’s and is not referenced.

    • rocblas_diagonal_non_unit: No assumptions are made of A’s main diagonal.

  • n[in] [rocblas_int] the number of rows and columns of the matrix represented by A.

  • k[in] [rocblas_int]

        if uplo == rocblas_fill_upper, k specifies the number of super-diagonals
        of the matrix A.
    
        if uplo == rocblas_fill_lower, k specifies the number of sub-diagonals
        of the matrix A.
        k must satisfy k > 0 && k < lda.
    

  • A[in] device pointer storing banded triangular matrix A.

        if uplo == rocblas_fill_upper:
            The matrix represented is an upper banded triangular matrix
            with the main diagonal and k super-diagonals, everything
            else can be assumed to be 0.
            The matrix is compacted so that the main diagonal resides on the k'th
            row, the first super diagonal resides on the RHS of the k-1'th row, etc,
            with the k'th diagonal on the RHS of the 0'th row.
               Ex: (rocblas_fill_upper; n = 5; k = 2)
                  1 6 9 0 0              0 0 9 8 7
                  0 2 7 8 0              0 6 7 8 9
                  0 0 3 8 7     ---->    1 2 3 4 5
                  0 0 0 4 9              0 0 0 0 0
                  0 0 0 0 5              0 0 0 0 0
    
        if uplo == rocblas_fill_lower:
            The matrix represnted is a lower banded triangular matrix
            with the main diagonal and k sub-diagonals, everything else can be
            assumed to be 0.
            The matrix is compacted so that the main diagonal resides on the 0'th row,
            working up to the k'th diagonal residing on the LHS of the k'th row.
               Ex: (rocblas_fill_lower; n = 5; k = 2)
                  1 0 0 0 0              1 2 3 4 5
                  6 2 0 0 0              6 7 8 9 0
                  9 7 3 0 0     ---->    9 8 7 0 0
                  0 8 8 4 0              0 0 0 0 0
                  0 0 7 9 5              0 0 0 0 0
    

  • lda[in] [rocblas_int] specifies the leading dimension of A. lda must satisfy lda > k.

  • x[inout] device pointer storing vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

The tbmv functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_status rocblas_stbmv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const float *const A[], rocblas_int lda, float *const x[], rocblas_int incx, rocblas_int batch_count)#
rocblas_status rocblas_dtbmv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const double *const A[], rocblas_int lda, double *const x[], rocblas_int incx, rocblas_int batch_count)#
rocblas_status rocblas_ctbmv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_float_complex *const A[], rocblas_int lda, rocblas_float_complex *const x[], rocblas_int incx, rocblas_int batch_count)#
rocblas_status rocblas_ztbmv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_double_complex *const A[], rocblas_int lda, rocblas_double_complex *const x[], rocblas_int incx, rocblas_int batch_count)#

BLAS Level 2 API

tbmv_batched performs one of the matrix-vector operations:

x_i := A_i*x_i      or
x_i := A_i**T*x_i   or
x_i := A_i**H*x_i,
where (A_i, x_i) is the i-th instance of the batch.
x_i is a vector and A_i is an n by n matrix, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill]

    • rocblas_fill_upper: each A_i is an upper banded triangular matrix.

    • rocblas_fill_lower: each A_i is a lower banded triangular matrix.

  • trans[in] [rocblas_operation] indicates whether each matrix A_i is tranposed (conjugated) or not.

  • diag[in] [rocblas_diagonal]

    • rocblas_diagonal_unit: The main diagonal of each A_i is assumed to consist of only 1’s and is not referenced.

    • rocblas_diagonal_non_unit: No assumptions are made of each A_i’s main diagonal.

  • n[in] [rocblas_int] the number of rows and columns of the matrix represented by each A_i.

  • k[in] [rocblas_int]

        if uplo == rocblas_fill_upper, k specifies the number of super-diagonals
        of each matrix A_i.
    
        if uplo == rocblas_fill_lower, k specifies the number of sub-diagonals
        of each matrix A_i.
        k must satisfy k > 0 && k < lda.
    

  • A[in] device array of device pointers storing each banded triangular matrix A_i.

        if uplo == rocblas_fill_upper:
            The matrix represented is an upper banded triangular matrix
            with the main diagonal and k super-diagonals, everything
            else can be assumed to be 0.
            The matrix is compacted so that the main diagonal resides on the k'th
            row, the first super diagonal resides on the RHS of the k-1'th row, etc,
            with the k'th diagonal on the RHS of the 0'th row.
               Ex: (rocblas_fill_upper; n = 5; k = 2)
                  1 6 9 0 0              0 0 9 8 7
                  0 2 7 8 0              0 6 7 8 9
                  0 0 3 8 7     ---->    1 2 3 4 5
                  0 0 0 4 9              0 0 0 0 0
                  0 0 0 0 5              0 0 0 0 0
    
        if uplo == rocblas_fill_lower:
            The matrix represnted is a lower banded triangular matrix
            with the main diagonal and k sub-diagonals, everything else can be
            assumed to be 0.
            The matrix is compacted so that the main diagonal resides on the 0'th row,
            working up to the k'th diagonal residing on the LHS of the k'th row.
               Ex: (rocblas_fill_lower; n = 5; k = 2)
                  1 0 0 0 0              1 2 3 4 5
                  6 2 0 0 0              6 7 8 9 0
                  9 7 3 0 0     ---->    9 8 7 0 0
                  0 8 8 4 0              0 0 0 0 0
                  0 0 7 9 5              0 0 0 0 0
    

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i. lda must satisfy lda > k.

  • x[inout] device array of device pointer storing each vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The tbmv_batched functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_status rocblas_stbmv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const float *A, rocblas_int lda, rocblas_stride stride_A, float *x, rocblas_int incx, rocblas_stride stride_x, rocblas_int batch_count)#
rocblas_status rocblas_dtbmv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const double *A, rocblas_int lda, rocblas_stride stride_A, double *x, rocblas_int incx, rocblas_stride stride_x, rocblas_int batch_count)#
rocblas_status rocblas_ctbmv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_float_complex *A, rocblas_int lda, rocblas_stride stride_A, rocblas_float_complex *x, rocblas_int incx, rocblas_stride stride_x, rocblas_int batch_count)#
rocblas_status rocblas_ztbmv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation trans, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_double_complex *A, rocblas_int lda, rocblas_stride stride_A, rocblas_double_complex *x, rocblas_int incx, rocblas_stride stride_x, rocblas_int batch_count)#

BLAS Level 2 API

tbmv_strided_batched performs one of the matrix-vector operations:

x_i := A_i*x_i      or
x_i := A_i**T*x_i   or
x_i := A_i**H*x_i,
where (A_i, x_i) is the i-th instance of the batch.
x_i is a vector and A_i is an n by n matrix, for i = 1, ..., batch_count.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill]

    • rocblas_fill_upper: each A_i is an upper banded triangular matrix.

    • rocblas_fill_lower: each A_i is a lower banded triangular matrix.

  • trans[in] [rocblas_operation] indicates whether each matrix A_i is tranposed (conjugated) or not.

  • diag[in] [rocblas_diagonal]

    • rocblas_diagonal_unit: The main diagonal of each A_i is assumed to consist of only 1’s and is not referenced.

    • rocblas_diagonal_non_unit: No assumptions are made of each A_i’s main diagonal.

  • n[in] [rocblas_int] the number of rows and columns of the matrix represented by each A_i.

  • k[in] [rocblas_int]

        if uplo == rocblas_fill_upper, k specifies the number of super-diagonals
        of each matrix A_i.
    
        if uplo == rocblas_fill_lower, k specifies the number of sub-diagonals
        of each matrix A_i.
        k must satisfy k > 0 && k < lda.
    

  • A[in] device array to the first matrix A_i of the batch. Stores each banded triangular matrix A_i.

        if uplo == rocblas_fill_upper:
            The matrix represented is an upper banded triangular matrix
            with the main diagonal and k super-diagonals, everything
            else can be assumed to be 0.
            The matrix is compacted so that the main diagonal resides on the k'th
            row, the first super diagonal resides on the RHS of the k-1'th row, etc,
            with the k'th diagonal on the RHS of the 0'th row.
               Ex: (rocblas_fill_upper; n = 5; k = 2)
                  1 6 9 0 0              0 0 9 8 7
                  0 2 7 8 0              0 6 7 8 9
                  0 0 3 8 7     ---->    1 2 3 4 5
                  0 0 0 4 9              0 0 0 0 0
                  0 0 0 0 5              0 0 0 0 0
    
        if uplo == rocblas_fill_lower:
            The matrix represnted is a lower banded triangular matrix
            with the main diagonal and k sub-diagonals, everything else can be
            assumed to be 0.
            The matrix is compacted so that the main diagonal resides on the 0'th row,
            working up to the k'th diagonal residing on the LHS of the k'th row.
               Ex: (rocblas_fill_lower; n = 5; k = 2)
                  1 0 0 0 0              1 2 3 4 5
                  6 2 0 0 0              6 7 8 9 0
                  9 7 3 0 0     ---->    9 8 7 0 0
                  0 8 8 4 0              0 0 0 0 0
                  0 0 7 9 5              0 0 0 0 0
    

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i. lda must satisfy lda > k.

  • stride_A[in] [rocblas_stride] stride from the start of one A_i matrix to the next A_(i + 1).

  • x[inout] device array to the first vector x_i of the batch.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • stride_x[in] [rocblas_stride] stride from the start of one x_i matrix to the next x_(i + 1).

  • batch_count[in] [rocblas_int] number of instances in the batch.

The tbmv_strided_batched functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_Xtbsv + batched, strided_batched#

rocblas_status rocblas_stbsv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const float *A, rocblas_int lda, float *x, rocblas_int incx)#
rocblas_status rocblas_dtbsv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const double *A, rocblas_int lda, double *x, rocblas_int incx)#
rocblas_status rocblas_ctbsv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_float_complex *A, rocblas_int lda, rocblas_float_complex *x, rocblas_int incx)#
rocblas_status rocblas_ztbsv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_double_complex *A, rocblas_int lda, rocblas_double_complex *x, rocblas_int incx)#

BLAS Level 2 API

tbsv solves:

 A*x = b or
 A**T*x = b or
 A**H*x = b
 where x and b are vectors and A is a banded triangular matrix.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill]

    • rocblas_fill_upper: A is an upper triangular matrix.

    • rocblas_fill_lower: A is a lower triangular matrix.

  • transA[in] [rocblas_operation]

    • rocblas_operation_none: Solves A*x = b

    • rocblas_operation_transpose: Solves A**T*x = b

    • rocblas_operation_conjugate_transpose: Solves A**H*x = b

  • diag[in] [rocblas_diagonal]

    • rocblas_diagonal_unit: A is assumed to be unit triangular (i.e. the diagonal elements of A are not used in computations).

    • rocblas_diagonal_non_unit: A is not assumed to be unit triangular.

  • n[in] [rocblas_int] n specifies the number of rows of b. n >= 0.

  • k[in] [rocblas_int]

        if(uplo == rocblas_fill_upper)
            k specifies the number of super-diagonals of A.
        if(uplo == rocblas_fill_lower)
            k specifies the number of sub-diagonals of A.
        k >= 0.
    

  • A[in] device pointer storing the matrix A in banded format.

  • lda[in] [rocblas_int] specifies the leading dimension of A. lda >= (k + 1).

  • x[inout] device pointer storing input vector b. Overwritten by the output vector x.

  • incx[in] [rocblas_int] specifies the increment for the elements of x.

The tbsv functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_status rocblas_stbsv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const float *const A[], rocblas_int lda, float *const x[], rocblas_int incx, rocblas_int batch_count)#
rocblas_status rocblas_dtbsv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const double *const A[], rocblas_int lda, double *const x[], rocblas_int incx, rocblas_int batch_count)#
rocblas_status rocblas_ctbsv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_float_complex *const A[], rocblas_int lda, rocblas_float_complex *const x[], rocblas_int incx, rocblas_int batch_count)#
rocblas_status rocblas_ztbsv_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_double_complex *const A[], rocblas_int lda, rocblas_double_complex *const x[], rocblas_int incx, rocblas_int batch_count)#

BLAS Level 2 API

tbsv_batched solves:

 A_i*x_i = b_i or
 A_i**T*x_i = b_i or
 A_i**H*x_i = b_i
 where x_i and b_i are vectors and A_i is a banded triangular matrix,
for i = [1, batch_count].
The input vectors b_i are overwritten by the output vectors x_i.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill]

    • rocblas_fill_upper: A_i is an upper triangular matrix.

    • rocblas_fill_lower: A_i is a lower triangular matrix.

  • transA[in] [rocblas_operation]

    • rocblas_operation_none: Solves A_i*x_i = b_i

    • rocblas_operation_transpose: Solves A_i**T*x_i = b_i

    • rocblas_operation_conjugate_transpose: Solves A_i**H*x_i = b_i

  • diag[in] [rocblas_diagonal]

    • rocblas_diagonal_unit: each A_i is assumed to be unit triangular (i.e. the diagonal elements of each A_i are not used in computations).

    • rocblas_diagonal_non_unit: each A_i is not assumed to be unit triangular.

  • n[in] [rocblas_int] n specifies the number of rows of each b_i. n >= 0.

  • k[in] [rocblas_int]

        if(uplo == rocblas_fill_upper)
            k specifies the number of super-diagonals of each A_i.
        if(uplo == rocblas_fill_lower)
            k specifies the number of sub-diagonals of each A_i.
        k >= 0.
    

  • A[in] device vector of device pointers storing each matrix A_i in banded format.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i. lda >= (k + 1).

  • x[inout] device vector of device pointers storing each input vector b_i. Overwritten by each output vector x_i.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • batch_count[in] [rocblas_int] number of instances in the batch.

The tbsv_batched functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_status rocblas_stbsv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const float *A, rocblas_int lda, rocblas_stride stride_A, float *x, rocblas_int incx, rocblas_stride stride_x, rocblas_int batch_count)#
rocblas_status rocblas_dtbsv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const double *A, rocblas_int lda, rocblas_stride stride_A, double *x, rocblas_int incx, rocblas_stride stride_x, rocblas_int batch_count)#
rocblas_status rocblas_ctbsv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_float_complex *A, rocblas_int lda, rocblas_stride stride_A, rocblas_float_complex *x, rocblas_int incx, rocblas_stride stride_x, rocblas_int batch_count)#
rocblas_status rocblas_ztbsv_strided_batched(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, rocblas_int k, const rocblas_double_complex *A, rocblas_int lda, rocblas_stride stride_A, rocblas_double_complex *x, rocblas_int incx, rocblas_stride stride_x, rocblas_int batch_count)#

BLAS Level 2 API

tbsv_strided_batched solves:

 A_i*x_i = b_i or
 A_i**T*x_i = b_i or
 A_i**H*x_i = b_i
 where x_i and b_i are vectors and A_i is a banded triangular matrix,
for i = [1, batch_count].
The input vectors b_i are overwritten by the output vectors x_i.

Parameters:
  • handle[in] [rocblas_handle] handle to the rocblas library context queue.

  • uplo[in] [rocblas_fill]

    • rocblas_fill_upper: A_i is an upper triangular matrix.

    • rocblas_fill_lower: A_i is a lower triangular matrix.

  • transA[in] [rocblas_operation]

    • rocblas_operation_none: Solves A_i*x_i = b_i

    • rocblas_operation_transpose: Solves A_i**T*x_i = b_i

    • rocblas_operation_conjugate_transpose: Solves A_i**H*x_i = b_i

  • diag[in] [rocblas_diagonal]

    • rocblas_diagonal_unit: each A_i is assumed to be unit triangular (i.e. the diagonal elements of each A_i are not used in computations).

    • rocblas_diagonal_non_unit: each A_i is not assumed to be unit triangular.

  • n[in] [rocblas_int] n specifies the number of rows of each b_i. n >= 0.

  • k[in] [rocblas_int]

        if(uplo == rocblas_fill_upper)
            k specifies the number of super-diagonals of each A_i.
        if(uplo == rocblas_fill_lower)
            k specifies the number of sub-diagonals of each A_i.
        k >= 0.
    

  • A[in] device pointer pointing to the first banded matrix A_1.

  • lda[in] [rocblas_int] specifies the leading dimension of each A_i. lda >= (k + 1).

  • stride_A[in] [rocblas_stride] specifies the distance between the start of one matrix (A_i) and the next (A_i+1).

  • x[inout] device pointer pointing to the first input vector b_1. Overwritten by output vectors x.

  • incx[in] [rocblas_int] specifies the increment for the elements of each x_i.

  • stride_x[in] [rocblas_stride] specifies the distance between the start of one vector (x_i) and the next (x_i+1).

  • batch_count[in] [rocblas_int] number of instances in the batch.

The tbsv_strided_batched functions support the _64 interface. Parameters n and k larger than int32_t max value are not currently supported. Refer to section ILP64 Interface.

rocblas_Xtpmv + batched, strided_batched#

rocblas_status rocblas_stpmv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, const float *A, float *x, rocblas_int incx)#
rocblas_status rocblas_dtpmv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, const double *A, double *x, rocblas_int incx)#
rocblas_status rocblas_ctpmv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, const rocblas_float_complex *A, rocblas_float_complex *x, rocblas_int incx)#
rocblas_status rocblas_ztpmv(rocblas_handle handle, rocblas_fill uplo, rocblas_operation transA, rocblas_diagonal diag, rocblas_int n, const rocblas_double_complex *A, rocblas_double_complex *x, rocblas_int incx)#

BLAS Level 2 API

tpmv performs one of the matrix-vector operations:

x = A*x or
x = A**