3.4. Lapack-like Functions#
This section describes other LAPACK-like routines provided by rocSOLVER. They are divided into the following subcategories:
Triangular factorizations. Based on Gaussian elimination.
Linear-systems solvers. Based on triangular factorizations.
Symmetric eigensolvers. Eigenproblems for symmetric matrices.
Singular value decomposition. Singular values and related problems for general matrices.
Note
Throughout the APIs’ descriptions, we use the following notations:
x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.
If X is a real vector or matrix, \(X^T\) indicates its transpose; if X is complex, then \(X^H\) represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.
x_i and \(x_i\) denote the same quantity; we use \(x_i\) when displaying mathematical equations, and x_i in the text describing the function parameters.
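The 1-based notation above can be related to 0-based C arrays with a small helper. This is a minimal sketch, assuming the column-major storage implied by the leading-dimension parameters used throughout this section; idx is a hypothetical name for illustration and is not part of rocSOLVER.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the 1-based element A[i,j] inside a column-major array
   whose leading dimension is lda (assumed layout, see lead-in). */
static size_t idx(int i, int j, int lda)
{
    assert(i >= 1 && j >= 1 && lda >= i);
    return (size_t)(i - 1) + (size_t)(j - 1) * (size_t)lda;
}
```

With lda = 4, for example, A[3,2] is stored at offset 6 of the array.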
3.4.1. Triangular factorizations#
3.4.1.1. rocsolver_<type>getf2_npvt()#
-
rocblas_status rocsolver_zgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_cgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_dgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_sgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#
GETF2_NPVT computes the LU factorization of a general m-by-n matrix A without partial pivoting.
(This is the unblocked Level-2-BLAS version of the algorithm. If optimizations are enabled (the default), an optimized internal implementation that makes no rocBLAS calls may be used for small and mid-size matrices. For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide.)
The factorization has the form
\[ A = LU \]where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2 routines instead.
- Parameters:
handle – [in] rocblas_handle.
m – [in]
rocblas_int. m >= 0.
The number of rows of the matrix A.
n – [in]
rocblas_int. n >= 0.
The number of columns of the matrix A.
A – [inout]
pointer to type. Array on the GPU of dimension lda*n.
On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.
lda – [in]
rocblas_int. lda >= m.
Specifies the leading dimension of A.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.
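The semantics of A and info described above can be illustrated with a CPU reference. This is only a sketch of unblocked Gaussian elimination without pivoting, not rocSOLVER's GPU implementation; lu_npvt_ref is a hypothetical name, column-major storage is assumed, and the sketch simply stops at the first zero pivot, which is one possible reading of "might be incomplete".

```c
#include <assert.h>

/* CPU reference sketch (not rocSOLVER code): in-place LU without pivoting
   of a column-major m-by-n matrix with leading dimension lda.
   Returns 0 on success, or the 1-based index i of the first zero U[i,i]. */
static int lu_npvt_ref(int m, int n, double *A, int lda)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    int kmax = (m < n) ? m : n;
    for (int k = 0; k < kmax; ++k) {
        double pivot = A[k + k * lda];
        if (pivot == 0.0)
            return k + 1;               /* assumption: stop at first zero pivot */
        for (int i = k + 1; i < m; ++i)
            A[i + k * lda] /= pivot;    /* column k of L (unit diagonal not stored) */
        for (int j = k + 1; j < n; ++j) /* rank-1 update of the trailing block */
            for (int i = k + 1; i < m; ++i)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
    }
    return 0;
}
```

On exit, the strictly lower part of A holds L and the upper part, including the diagonal, holds U.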
3.4.1.2. rocsolver_<type>getf2_npvt_batched()#
-
rocblas_status rocsolver_zgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
GETF2_NPVT_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.
(This is the unblocked Level-2-BLAS version of the algorithm. If optimizations are enabled (the default), an optimized internal implementation that makes no rocBLAS calls may be used for small and mid-size matrices. For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide.)
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = L_jU_j \]where \(L_j\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_j\) is upper triangular (upper trapezoidal if m < n).
Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2_BATCHED routines instead.
- Parameters:
handle – [in] rocblas_handle.
m – [in]
rocblas_int. m >= 0.
The number of rows of all matrices A_j in the batch.
n – [in]
rocblas_int. n >= 0.
The number of columns of all matrices A_j in the batch.
A – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorizations. The unit diagonal elements of L_j are not stored.
lda – [in]
rocblas_int. lda >= m.
Specifies the leading dimension of matrices A_j.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
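The array-of-pointers argument A can be pictured with plain host memory. The sketch below assumes, for illustration only, that all matrices live in one contiguous buffer of batch_count arrays of dimension lda*n; make_batch_pointers is a hypothetical helper, and in a real program the matrices it points to would have to reside in GPU memory.

```c
#include <assert.h>
#include <stddef.h>

/* Fill ptrs[j] with the start of the j-th lda*n array inside one
   contiguous buffer (hypothetical helper, not a rocSOLVER function). */
static void make_batch_pointers(double *buffer, int lda, int n,
                                int batch_count, double **ptrs)
{
    assert(lda >= 1 && n >= 0 && batch_count >= 0);
    for (int j = 0; j < batch_count; ++j)
        ptrs[j] = buffer + (size_t)j * (size_t)lda * (size_t)n;
}
```

The batched interface does not require contiguity; each pointer may refer to an independently allocated array.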
3.4.1.3. rocsolver_<type>getf2_npvt_strided_batched()#
-
rocblas_status rocsolver_zgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
GETF2_NPVT_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.
(This is the unblocked Level-2-BLAS version of the algorithm. If optimizations are enabled (the default), an optimized internal implementation that makes no rocBLAS calls may be used for small and mid-size matrices. For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide.)
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = L_jU_j \]where \(L_j\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_j\) is upper triangular (upper trapezoidal if m < n).
Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2_STRIDED_BATCHED routines instead.
- Parameters:
handle – [in] rocblas_handle.
m – [in]
rocblas_int. m >= 0.
The number of rows of all matrices A_j in the batch.
n – [in]
rocblas_int. n >= 0.
The number of columns of all matrices A_j in the batch.
A – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideA).
On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorization. The unit diagonal elements of L_j are not stored.
lda – [in]
rocblas_int. lda >= m.
Specifies the leading dimension of matrices A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. The normal use case is strideA >= lda*n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
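The role of lda and strideA in the strided batched layout can be sketched as an address computation. The helper below is hypothetical, uses 0-based batch, row and column indices, assumes column-major storage, and stands in for rocblas_stride with int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Address of element (row, col) of matrix A_j in a strided batch
   (0-based indices; illustrative helper, not a rocSOLVER function). */
static double *strided_elem(double *A, int lda, int64_t strideA,
                            int j, int row, int col)
{
    assert(lda >= 1 && j >= 0 && row >= 0 && col >= 0);
    return A + (int64_t)j * strideA + row + (int64_t)col * lda;
}
```

With lda = 4 and strideA = 16, element (2, 1) of the second matrix (j = 1) sits 22 elements past the start of A.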
3.4.1.4. rocsolver_<type>getrf_npvt()#
-
rocblas_status rocsolver_zgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_cgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_dgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_sgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#
GETRF_NPVT computes the LU factorization of a general m-by-n matrix A without partial pivoting.
(This is the blocked Level-3-BLAS version of the algorithm. If optimizations are enabled (the default), an optimized internal implementation that makes no rocBLAS calls may be used for mid-size matrices. For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide.)
The factorization has the form
\[ A = LU \]where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF routines instead.
- Parameters:
handle – [in] rocblas_handle.
m – [in]
rocblas_int. m >= 0.
The number of rows of the matrix A.
n – [in]
rocblas_int. n >= 0.
The number of columns of the matrix A.
A – [inout]
pointer to type. Array on the GPU of dimension lda*n.
On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.
lda – [in]
rocblas_int. lda >= m.
Specifies the leading dimension of A.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.
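The idea behind a blocked factorization can be sketched on the CPU as a right-looking algorithm: factor a panel, solve for a block row of U, then update the trailing matrix. This is an illustration under assumptions, not rocSOLVER's implementation; lu_npvt_blocked_ref and the block size parameter bs are hypothetical, storage is column-major, and the sketch stops at the first zero pivot.

```c
#include <assert.h>

/* CPU sketch of blocked LU without pivoting (not rocSOLVER code).
   Returns 0 on success or the 1-based index of the first zero pivot. */
static int lu_npvt_blocked_ref(int m, int n, double *A, int lda, int bs)
{
    assert(m >= 0 && n >= 0 && lda >= m && bs >= 1);
    int kmax = (m < n) ? m : n;
    for (int k0 = 0; k0 < kmax; k0 += bs) {
        int kb = (kmax - k0 < bs) ? kmax - k0 : bs;
        int k1 = k0 + kb;
        /* 1. Unblocked factorization of the panel A[k0:m, k0:k1]. */
        for (int k = k0; k < k1; ++k) {
            double pivot = A[k + k * lda];
            if (pivot == 0.0)
                return k + 1;
            for (int i = k + 1; i < m; ++i)
                A[i + k * lda] /= pivot;
            for (int j = k + 1; j < k1; ++j)
                for (int i = k + 1; i < m; ++i)
                    A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
        }
        /* 2. Block row of U: forward substitution with the unit lower
              triangular diagonal block. */
        for (int j = k1; j < n; ++j)
            for (int k = k0; k < k1; ++k)
                for (int i = k + 1; i < k1; ++i)
                    A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
        /* 3. Trailing update: A22 -= L21 * U12 (a matrix-matrix product). */
        for (int j = k1; j < n; ++j)
            for (int k = k0; k < k1; ++k)
                for (int i = k1; i < m; ++i)
                    A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
    }
    return 0;
}
```

Steps 2 and 3 are the parts that map onto Level-3 BLAS operations (a triangular solve and a matrix-matrix multiply), which is what distinguishes the blocked variant from the unblocked one.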
3.4.1.5. rocsolver_<type>getrf_npvt_batched()#
-
rocblas_status rocsolver_zgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
GETRF_NPVT_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.
(This is the blocked Level-3-BLAS version of the algorithm. If optimizations are enabled (the default), an optimized internal implementation that makes no rocBLAS calls may be used for mid-size matrices. For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide.)
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = L_jU_j \]where \(L_j\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_j\) is upper triangular (upper trapezoidal if m < n).
Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF_BATCHED routines instead.
- Parameters:
handle – [in] rocblas_handle.
m – [in]
rocblas_int. m >= 0.
The number of rows of all matrices A_j in the batch.
n – [in]
rocblas_int. n >= 0.
The number of columns of all matrices A_j in the batch.
A – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorizations. The unit diagonal elements of L_j are not stored.
lda – [in]
rocblas_int. lda >= m.
Specifies the leading dimension of matrices A_j.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
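Because info holds one entry per matrix, a caller typically inspects the whole array after the factorization. The sketch below assumes the info array has already been copied from the GPU to host memory (for example with hipMemcpy); count_singular is a hypothetical helper.

```c
#include <assert.h>

/* Count batch instances with info[j] > 0 and report the first one.
   *first is set to -1 if every factorization succeeded.
   (Illustrative host-side helper, not a rocSOLVER function.) */
static int count_singular(const int *info, int batch_count, int *first)
{
    assert(batch_count >= 0);
    int count = 0;
    *first = -1;
    for (int j = 0; j < batch_count; ++j) {
        if (info[j] > 0) {
            if (*first < 0)
                *first = j;
            ++count;
        }
    }
    return count;
}
```

Instances flagged this way are candidates for refactorization with the pivoted GETRF_BATCHED routines.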
3.4.1.6. rocsolver_<type>getrf_npvt_strided_batched()#
-
rocblas_status rocsolver_zgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
GETRF_NPVT_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.
(This is the blocked Level-3-BLAS version of the algorithm. If optimizations are enabled (the default), an optimized internal implementation that makes no rocBLAS calls may be used for mid-size matrices. For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide.)
The factorization of matrix \(A_j\) in the batch has the form
\[ A_j = L_jU_j \]where \(L_j\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_j\) is upper triangular (upper trapezoidal if m < n).
Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF_STRIDED_BATCHED routines instead.
- Parameters:
handle – [in] rocblas_handle.
m – [in]
rocblas_int. m >= 0.
The number of rows of all matrices A_j in the batch.
n – [in]
rocblas_int. n >= 0.
The number of columns of all matrices A_j in the batch.
A – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideA).
On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorization. The unit diagonal elements of L_j are not stored.
lda – [in]
rocblas_int. lda >= m.
Specifies the leading dimension of matrices A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. The normal use case is strideA >= lda*n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
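Since pivoting is omitted, it can be worth checking the result for any one matrix of the batch by multiplying the packed factors back together. The following host-side sketch rebuilds L_j U_j from the packed output, restoring the unit diagonal of L_j that is not stored; lu_multiply is a hypothetical helper and column-major storage is assumed.

```c
#include <assert.h>

/* Compute out = L * U from a packed LU result (column-major, leading
   dimension lda). L is m-by-k unit lower trapezoidal and U is k-by-n
   upper trapezoidal, with k = min(m, n). Illustrative helper only. */
static void lu_multiply(int m, int n, const double *LU, int lda,
                        double *out, int ldo)
{
    assert(m >= 0 && n >= 0 && lda >= m && ldo >= m);
    int k = (m < n) ? m : n;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double s = 0.0;
            for (int p = 0; p < k && p <= i && p <= j; ++p) {
                /* The unit diagonal of L is implicit. */
                double l = (p == i) ? 1.0 : LU[i + p * lda];
                s += l * LU[p + j * lda];
            }
            out[i + j * ldo] = s;
        }
    }
}
```

Comparing out with a saved copy of the original matrix gives a direct measure of the factorization error.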
3.4.1.7. rocsolver_<type>geblttrf_npvt()#
-
rocblas_status rocsolver_zgeblttrf_npvt(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_cgeblttrf_npvt(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_dgeblttrf_npvt(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, double *A, const rocblas_int lda, double *B, const rocblas_int ldb, double *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_sgeblttrf_npvt(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, float *A, const rocblas_int lda, float *B, const rocblas_int ldb, float *C, const rocblas_int ldc, rocblas_int *info)#
GEBLTTRF_NPVT computes the LU factorization of a block tridiagonal matrix without partial pivoting.
The LU factorization of a block tridiagonal matrix
\[\begin{split} M = \left[\begin{array}{ccccc} B_1 & C_1\\ A_1 & B_2 & C_2\\ & \ddots & \ddots & \ddots \\ & & A_{n-2} & B_{n-1} & C_{n-1}\\ & & & A_{n-1} & B_n \end{array}\right] \end{split}\]with \(n = \mathrm{nblocks}\) diagonal blocks of size nb, can be represented as
\[\begin{split} M = \left[\begin{array}{cccc} L_1 \\ A_1 & L_2\\ & \ddots & \ddots \\ & & A_{n-1} & L_n \end{array}\right] \left[\begin{array}{cccc} I & U_1 \\ & \ddots & \ddots \\ & & I & U_{n-1}\\ & & & I \end{array}\right] = LU \end{split}\]where the blocks \(L_i\) and \(U_i\) are also general blocks of size nb.
- Parameters:
handle – [in] rocblas_handle.
nb – [in]
rocblas_int. nb >= 0.
The number of rows and columns of each block.
nblocks – [in]
rocblas_int. nblocks >= 0.
The number of blocks along the diagonal of the matrix.
A – [in]
pointer to type. Array on the GPU of dimension lda*nb*(nblocks-1).
Contains the blocks A_i arranged one after the other.
lda – [in]
rocblas_int. lda >= nb.
Specifies the leading dimension of blocks A_i.
B – [inout]
pointer to type. Array on the GPU of dimension ldb*nb*nblocks.
On entry, contains the blocks B_i arranged one after the other. On exit, it is overwritten by blocks L_i in factorized form as returned by GETRF_NPVT.
ldb – [in]
rocblas_int. ldb >= nb.
Specifies the leading dimension of blocks B_i.
C – [inout]
pointer to type. Array on the GPU of dimension ldc*nb*(nblocks-1).
On entry, contains the blocks C_i arranged one after the other. On exit it is overwritten by blocks U_i.
ldc – [in]
rocblas_int. ldc >= nb.
Specifies the leading dimension of blocks C_i.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = i > 0, the matrix is singular.
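Equating the blocks of M with those of the product LU above gives the recurrence L_1 = B_1, U_i = L_i^{-1} C_i, and L_{i+1} = B_{i+1} - A_i U_i. The sketch below applies it in the special case nb = 1, where every block is a scalar. It is a CPU illustration only; tridiag_lu_npvt_scalar is a hypothetical name, and returning the 1-based index of the first zero diagonal block is an assumption, since the text above does not define the meaning of i.

```c
#include <assert.h>

/* Scalar (nb = 1) sketch of the block tridiagonal recurrence.
   a: nblocks-1 sub-diagonal entries (input only)
   b: nblocks diagonal entries, overwritten by L_i
   c: nblocks-1 super-diagonal entries, overwritten by U_i
   Returns 0, or the 1-based index of the first zero L_i (assumed convention). */
static int tridiag_lu_npvt_scalar(int nblocks, const double *a,
                                  double *b, double *c)
{
    assert(nblocks >= 0);
    for (int i = 0; i < nblocks; ++i) {
        if (i > 0)
            b[i] -= a[i - 1] * c[i - 1]; /* L_i = B_i - A_{i-1} U_{i-1} */
        if (b[i] == 0.0)
            return i + 1;
        if (i < nblocks - 1)
            c[i] /= b[i];                /* U_i = L_i^{-1} C_i */
    }
    return 0;
}
```

For nb > 1 the division becomes a solve with the factorized diagonal block, which is why B is returned in the form produced by GETRF_NPVT.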
3.4.1.8. rocsolver_<type>geblttrf_npvt_batched()#
-
rocblas_status rocsolver_zgeblttrf_npvt_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, rocblas_double_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgeblttrf_npvt_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, rocblas_float_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgeblttrf_npvt_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, double *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgeblttrf_npvt_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, float *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
GEBLTTRF_NPVT_BATCHED computes the LU factorization of a batch of block tridiagonal matrices without partial pivoting.
The LU factorization of a block tridiagonal matrix \(M_j\) in the batch
\[\begin{split} M_j = \left[\begin{array}{ccccc} B_{j1} & C_{j1}\\ A_{j1} & B_{j2} & C_{j2}\\ & \ddots & \ddots & \ddots \\ & & A_{j(n-2)} & B_{j(n-1)} & C_{j(n-1)}\\ & & & A_{j(n-1)} & B_{jn} \end{array}\right] \end{split}\]with \(n = \mathrm{nblocks}\) diagonal blocks of size nb, can be represented as
\[\begin{split} M_j = \left[\begin{array}{cccc} L_{j1} \\ A_{j1} & L_{j2}\\ & \ddots & \ddots \\ & & A_{j(n-1)} & L_{jn} \end{array}\right] \left[\begin{array}{cccc} I & U_{j1} \\ & \ddots & \ddots \\ & & I & U_{j(n-1)}\\ & & & I \end{array}\right] = L_jU_j \end{split}\]where the blocks \(L_{ji}\) and \(U_{ji}\) are also general blocks of size nb.
- Parameters:
handle – [in] rocblas_handle.
nb – [in]
rocblas_int. nb >= 0.
The number of rows and columns of each block.
nblocks – [in]
rocblas_int. nblocks >= 0.
The number of blocks along the diagonal of each matrix in the batch.
A – [in]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*nb*(nblocks-1).
Contains the blocks A_{ji} arranged one after the other.
lda – [in]
rocblas_int. lda >= nb.
Specifies the leading dimension of blocks A_{ji}.
B – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*nb*nblocks.
On entry, contains the blocks B_{ji} arranged one after the other. On exit, it is overwritten by blocks L_{ji} in factorized form as returned by GETRF_NPVT.
ldb – [in]
rocblas_int. ldb >= nb.
Specifies the leading dimension of blocks B_{ji}.
C – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldc*nb*(nblocks-1).
On entry, contains the blocks C_{ji} arranged one after the other. On exit it is overwritten by blocks U_{ji}.
ldc – [in]
rocblas_int. ldc >= nb.
Specifies the leading dimension of blocks C_{ji}.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for factorization of j-th batch instance. If info[j] = i > 0, the j-th batch instance is singular.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
3.4.1.9. rocsolver_<type>geblttrf_npvt_strided_batched()#
-
rocblas_status rocsolver_zgeblttrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_double_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgeblttrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_float_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgeblttrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, double *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgeblttrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, float *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
GEBLTTRF_NPVT_STRIDED_BATCHED computes the LU factorization of a batch of block tridiagonal matrices without partial pivoting.
The LU factorization of a block tridiagonal matrix \(M_j\) in the batch
\[\begin{split} M_j = \left[\begin{array}{ccccc} B_{j1} & C_{j1}\\ A_{j1} & B_{j2} & C_{j2}\\ & \ddots & \ddots & \ddots \\ & & A_{j(n-2)} & B_{j(n-1)} & C_{j(n-1)}\\ & & & A_{j(n-1)} & B_{jn} \end{array}\right] \end{split}\]with \(n = \mathrm{nblocks}\) diagonal blocks of size nb, can be represented as
\[\begin{split} M_j = \left[\begin{array}{cccc} L_{j1} \\ A_{j1} & L_{j2}\\ & \ddots & \ddots \\ & & A_{j(n-1)} & L_{jn} \end{array}\right] \left[\begin{array}{cccc} I & U_{j1} \\ & \ddots & \ddots \\ & & I & U_{j(n-1)}\\ & & & I \end{array}\right] = L_jU_j \end{split}\]where the blocks \(L_{ji}\) and \(U_{ji}\) are also general blocks of size nb.
- Parameters:
handle – [in] rocblas_handle.
nb – [in]
rocblas_int. nb >= 0.
The number of rows and columns of each block.
nblocks – [in]
rocblas_int. nblocks >= 0.
The number of blocks along the diagonal of each matrix in the batch.
A – [in]
pointer to type. Array on the GPU (the size depends on the value of strideA).
Contains the blocks A_{ji} arranged one after the other.
lda – [in]
rocblas_int. lda >= nb.
Specifies the leading dimension of blocks A_{ji}.
strideA – [in]
rocblas_stride.
Stride from the start of one block A_{ji} to the same block in the next batch instance A_{(j+1)i}. There is no restriction for the value of strideA. Normal use case is strideA >= lda*nb*nblocks.
B – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideB).
On entry, contains the blocks B_{ji} arranged one after the other. On exit, it is overwritten by blocks L_{ji} in factorized form as returned by GETRF_NPVT.
ldb – [in]
rocblas_int. ldb >= nb.
Specifies the leading dimension of matrix blocks B_{ji}.
strideB – [in]
rocblas_stride.
Stride from the start of one block B_{ji} to the same block in the next batch instance B_{(j+1)i}. There is no restriction for the value of strideB. Normal use case is strideB >= ldb*nb*nblocks.
C – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideC).
On entry, contains the blocks C_{ji} arranged one after the other. On exit it is overwritten by blocks U_{ji}.
ldc – [in]
rocblas_int. ldc >= nb.
Specifies the leading dimension of matrix blocks C_{ji}.
strideC – [in]
rocblas_stride.
Stride from the start of one block C_{ji} to the same block in the next batch instance C_{(j+1)i}. There is no restriction for the value of strideC. Normal use case is strideC >= ldc*nb*nblocks.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for factorization of j-th batch instance. If info[j] = i > 0, the j-th batch instance is singular.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
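The combination of "blocks arranged one after the other" and a per-instance stride can be sketched as a pointer computation. The helper below is hypothetical and rests on two assumptions drawn from the parameter descriptions: consecutive blocks of one instance are ld*nb elements apart, and indices are 0-based. It applies equally to A, B and C with their respective leading dimensions and strides.

```c
#include <assert.h>
#include <stdint.h>

/* Start of block i of batch instance j (0-based), assuming consecutive
   blocks are ld*nb elements apart. Illustrative helper only. */
static double *block_ptr(double *base, int nb, int ld, int64_t stride,
                         int j, int i)
{
    assert(nb >= 0 && ld >= nb && j >= 0 && i >= 0);
    return base + (int64_t)j * stride + (int64_t)i * ld * nb;
}
```

For example, with nb = 2, ldb = 2 and strideB = 12, the third diagonal block (i = 2) of the second instance (j = 1) starts 20 elements past the start of B.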
3.4.1.10. rocsolver_<type>geblttrf_npvt_interleaved_batched()#
-
rocblas_status rocsolver_zgeblttrf_npvt_interleaved_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, rocblas_double_complex *A, const rocblas_int inca, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int incb, const rocblas_int ldb, const rocblas_stride strideB, rocblas_double_complex *C, const rocblas_int incc, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgeblttrf_npvt_interleaved_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, rocblas_float_complex *A, const rocblas_int inca, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int incb, const rocblas_int ldb, const rocblas_stride strideB, rocblas_float_complex *C, const rocblas_int incc, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgeblttrf_npvt_interleaved_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, double *A, const rocblas_int inca, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int incb, const rocblas_int ldb, const rocblas_stride strideB, double *C, const rocblas_int incc, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgeblttrf_npvt_interleaved_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, float *A, const rocblas_int inca, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int incb, const rocblas_int ldb, const rocblas_stride strideB, float *C, const rocblas_int incc, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
GEBLTTRF_NPVT_INTERLEAVED_BATCHED computes the LU factorization of a batch of block tridiagonal matrices without partial pivoting.
The LU factorization of a block tridiagonal matrix \(M_j\) in the batch
\[\begin{split} M_j = \left[\begin{array}{ccccc} B_{j1} & C_{j1}\\ A_{j1} & B_{j2} & C_{j2}\\ & \ddots & \ddots & \ddots \\ & & A_{j(n-2)} & B_{j(n-1)} & C_{j(n-1)}\\ & & & A_{j(n-1)} & B_{jn} \end{array}\right] \end{split}\]with \(n = \mathrm{nblocks}\) diagonal blocks of size nb, can be represented as
\[\begin{split} M_j = \left[\begin{array}{cccc} L_{j1} \\ A_{j1} & L_{j2}\\ & \ddots & \ddots \\ & & A_{j(n-1)} & L_{jn} \end{array}\right] \left[\begin{array}{cccc} I & U_{j1} \\ & \ddots & \ddots \\ & & I & U_{j(n-1)}\\ & & & I \end{array}\right] = L_jU_j \end{split}\]where the blocks \(L_{ji}\) and \(U_{ji}\) are also general blocks of size nb.
- Parameters:
handle – [in] rocblas_handle.
nb – [in]
rocblas_int. nb >= 0.
The number of rows and columns of each block.
nblocks – [in]
rocblas_int. nblocks >= 0.
The number of blocks along the diagonal of each matrix in the batch.
A – [in]
pointer to type. Array on the GPU (the size depends on the value of strideA).
Contains the blocks A_{ji} arranged one after the other.
inca – [in]
rocblas_int. inca > 0.
Stride from the start of one row of A_{ji} to the next. Normal use cases are inca = 1 (strided batched case) or inca = batch_count (interleaved batched case).
lda – [in]
rocblas_int. lda >= inca * nb.
Specifies the leading dimension of blocks A_{ji}, i.e. the stride from the start of one column of A_{ji} to the next.
strideA – [in]
rocblas_stride.
Stride from the start of one block A_{ji} to the same block in the next batch instance A_{(j+1)i}. There is no restriction for the value of strideA. Normal use cases are strideA >= lda*nb*nblocks (strided batched case) or strideA = 1 (interleaved batched case).
B – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideB).
On entry, contains the blocks B_{ji} arranged one after the other. On exit, it is overwritten by blocks L_{ji} in factorized form as returned by GETRF_NPVT.
incb – [in]
rocblas_int. incb > 0.
Stride from the start of one row of B_{ji} to the next. Normal use cases are incb = 1 (strided batched case) or incb = batch_count (interleaved batched case).
ldb – [in]
rocblas_int. ldb >= incb * nb.
Specifies the leading dimension of blocks B_{ji}, i.e. the stride from the start of one column of B_{ji} to the next.
strideB – [in]
rocblas_stride.
Stride from the start of one block B_{ji} to the same block in the next batch instance B_{(j+1)i}. There is no restriction for the value of strideB. Normal use cases are strideB >= ldb*nb*nblocks (strided batched case) or strideB = 1 (interleaved batched case).
C – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideC).
On entry, contains the blocks C_{ji} arranged one after the other. On exit it is overwritten by blocks U_{ji}.
incc – [in]
rocblas_int. incc > 0.
Stride from the start of one row of C_{ji} to the next. Normal use cases are incc = 1 (strided batched case) or incc = batch_count (interleaved batched case).
ldc – [in]
rocblas_int. ldc >= incc * nb.
Specifies the leading dimension of blocks C_{ji}, i.e. the stride from the start of one column of C_{ji} to the next.
strideC – [in]
rocblas_stride.
Stride from the start of one block C_{ji} to the same block in the next batch instance C_{(j+1)i}. There is no restriction for the value of strideC. Normal use cases are strideC >= ldc*nb*nblocks (strided batched case) or strideC = 1 (interleaved batched case).
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for factorization of j-th batch instance. If info[j] = i > 0, the j-th batch instance is singular.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
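The inc/ld/stride parameters can be read as an addressing rule. The following is a minimal sketch of that rule, assuming that consecutive blocks of one batch instance lie ld*nb elements apart (one reading of "arranged one after the other"). The helper name blttrf_index is hypothetical and not part of rocSOLVER, and all indices are 0-based as usual in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a rocSOLVER function): linear offset of element
 * (row, col) of block number blk in batch instance j, all 0-based.
 * inc    = stride between rows of a block      (inca / incb / incc)
 * ld     = stride between columns of a block   (lda / ldb / ldc)
 * stride = stride between batch instances      (strideA / strideB / strideC)
 * Assumption: consecutive blocks of one instance are ld*nb elements apart. */
static size_t blttrf_index(size_t row, size_t col, size_t blk, size_t j,
                           size_t nb, size_t inc, size_t ld, size_t stride)
{
    assert(inc > 0 && ld >= inc * nb);
    assert(row < nb && col < nb);
    return row * inc + col * ld + blk * ld * nb + j * stride;
}
```

With nb = 2 and batch_count = 4, the interleaved layout corresponds to inc = 4, ld = 8, stride = 1, so the same element of consecutive batch instances sits in adjacent memory locations; the strided layout corresponds to inc = 1, ld = 2, stride >= ld*nb*nblocks.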
3.4.2. Linear-systems solvers#
3.4.2.1. rocsolver_<type>getri_npvt()#
-
rocblas_status rocsolver_zgetri_npvt(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_cgetri_npvt(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_dgetri_npvt(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
-
rocblas_status rocsolver_sgetri_npvt(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#
GETRI_NPVT inverts a general n-by-n matrix A using the LU factorization computed by GETRF_NPVT.
The inverse is computed by solving the linear system
\[ A^{-1}L = U^{-1} \]where L is the lower triangular factor of A with unit diagonal elements, and U is the upper triangular factor.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of the matrix A.
A – [inout]
pointer to type. Array on the GPU of dimension lda*n.
On entry, the factors L and U of the factorization A = L*U returned by GETRF_NPVT. On exit, the inverse of A if info = 0; otherwise undefined.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.
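The system solved by GETRI_NPVT follows directly from the factorization. As a short derivation, assuming A is nonsingular so that both triangular factors are invertible:
\[\begin{split} A = LU \;\Longrightarrow\; A^{-1} = U^{-1}L^{-1} \;\Longrightarrow\; A^{-1}L = U^{-1} \end{split}\]
Since L has unit diagonal elements it is always invertible; \(U^{-1}\) exists only if every U[i,i] is nonzero, which is the condition reported through info.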
3.4.2.2. rocsolver_<type>getri_npvt_batched()#
-
rocblas_status rocsolver_zgetri_npvt_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetri_npvt_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetri_npvt_batched(rocblas_handle handle, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetri_npvt_batched(rocblas_handle handle, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
GETRI_NPVT_BATCHED inverts a batch of general n-by-n matrices using the LU factorization computed by GETRF_NPVT_BATCHED.
The inverse of matrix \(A_j\) in the batch is computed by solving the linear system
\[ A_j^{-1} L_j = U_j^{-1} \]where \(L_j\) is the lower triangular factor of \(A_j\) with unit diagonal elements, and \(U_j\) is the upper triangular factor.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of all matrices A_j in the batch.
A – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
On entry, the factors L_j and U_j of the factorization A_j = L_j*U_j returned by GETRF_NPVT_BATCHED. On exit, the inverses of A_j if info[j] = 0; otherwise undefined.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
3.4.2.3. rocsolver_<type>getri_npvt_strided_batched()#
-
rocblas_status rocsolver_zgetri_npvt_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetri_npvt_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetri_npvt_strided_batched(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetri_npvt_strided_batched(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
GETRI_NPVT_STRIDED_BATCHED inverts a batch of general n-by-n matrices using the LU factorization computed by GETRF_NPVT_STRIDED_BATCHED.
The inverse of matrix \(A_j\) in the batch is computed by solving the linear system
\[ A_j^{-1} L_j = U_j^{-1} \]where \(L_j\) is the lower triangular factor of \(A_j\) with unit diagonal elements, and \(U_j\) is the upper triangular factor.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of all matrices A_j in the batch.
A – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideA).
On entry, the factors L_j and U_j of the factorization A_j = L_j*U_j returned by GETRF_NPVT_STRIDED_BATCHED. On exit, the inverses of A_j if info[j] = 0; otherwise undefined.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
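For the strided batched layout, the lda and strideA parameters determine where each element lives. The following is a small sketch of that addressing, assuming column-major storage and 0-based indices; both helper names are hypothetical and not part of rocSOLVER.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of element (row, col) of matrix A_j inside a
 * strided batched, column-major array. All indices are 0-based. */
static size_t strided_offset(size_t row, size_t col, size_t j,
                             size_t lda, size_t strideA)
{
    assert(row < lda);
    return j * strideA + col * lda + row;
}

/* Hypothetical helper: number of elements needed to hold the whole batch in
 * the normal use case strideA >= lda*n, counting lda*n elements per matrix. */
static size_t strided_min_length(size_t n, size_t lda, size_t strideA,
                                 size_t batch_count)
{
    if (n == 0 || batch_count == 0)
        return 0;
    assert(lda >= n && strideA >= lda * n);
    return (batch_count - 1) * strideA + lda * n;
}
```

A value of strideA larger than lda*n simply leaves unused padding between consecutive matrices.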
3.4.2.4. rocsolver_<type>getri_outofplace()#
-
rocblas_status rocsolver_zgetri_outofplace(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_cgetri_outofplace(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_dgetri_outofplace(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, double *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_sgetri_outofplace(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, float *C, const rocblas_int ldc, rocblas_int *info)#
GETRI_OUTOFPLACE computes the inverse \(C = A^{-1}\) of a general n-by-n matrix A.
The inverse is computed by solving the linear system
\[ AC = I \]where I is the identity matrix, and A is factorized as \(A = PLU\) as given by GETRF.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of the matrix A.
A – [in]
pointer to type. Array on the GPU of dimension lda*n.
The factors L and U of the factorization A = P*L*U returned by GETRF.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A.
ipiv – [in]
pointer to rocblas_int. Array on the GPU of dimension n.
The pivot indices returned by GETRF.
C – [out]
pointer to type. Array on the GPU of dimension ldc*n.
If info = 0, the inverse of A. Otherwise, undefined.
ldc – [in]
rocblas_int. ldc >= n.
Specifies the leading dimension of C.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.
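One way to read the system AC = I with A = PLU is as LUC = P^{-1}I: the row interchanges recorded in ipiv are first applied to the identity, and the two triangular solves follow. The sketch below shows only the interchange step on the CPU. It assumes the LAPACK-style convention that ipiv holds 1-based indices and that row i was interchanged with row ipiv[i]; the function name is hypothetical and this is not the rocSOLVER implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU helper: overwrite the n-by-n column-major matrix C
 * (leading dimension ldc) with the identity, then apply the row interchanges
 * stored in ipiv in forward order. ipiv uses 1-based row indices. */
static void permuted_identity(int n, const int *ipiv, double *C, int ldc)
{
    assert(ldc >= n);
    for (int c = 0; c < n; ++c)
        for (int r = 0; r < n; ++r)
            C[r + (size_t)c * ldc] = (r == c) ? 1.0 : 0.0;

    for (int i = 0; i < n; ++i) {
        int p = ipiv[i] - 1;              /* convert to a 0-based row index */
        assert(p >= i && p < n);
        if (p == i)
            continue;
        for (int c = 0; c < n; ++c) {     /* swap rows i and p of C */
            double t = C[i + (size_t)c * ldc];
            C[i + (size_t)c * ldc] = C[p + (size_t)c * ldc];
            C[p + (size_t)c * ldc] = t;
        }
    }
}
```

Solving with the unit lower triangular factor L and then with U on this permuted identity yields the inverse of A.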
3.4.2.5. rocsolver_<type>getri_outofplace_batched()#
-
rocblas_status rocsolver_zgetri_outofplace_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetri_outofplace_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetri_outofplace_batched(rocblas_handle handle, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, double *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetri_outofplace_batched(rocblas_handle handle, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, float *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
GETRI_OUTOFPLACE_BATCHED computes the inverse \(C_j = A_j^{-1}\) of a batch of general n-by-n matrices \(A_j\).
The inverse is computed by solving the linear system
\[ A_j C_j = I \]where I is the identity matrix, and \(A_j\) is factorized as \(A_j = P_j L_j U_j\) as given by GETRF_BATCHED.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of all matrices A_j in the batch.
A – [in]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
The factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by GETRF_BATCHED.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
ipiv – [in]
pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).
The pivot indices returned by GETRF_BATCHED.
strideP – [in]
rocblas_stride.
Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.
C – [out]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldc*n.
If info[j] = 0, the inverse of matrices A_j. Otherwise, undefined.
ldc – [in]
rocblas_int. ldc >= n.
Specifies the leading dimension of C_j.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
3.4.2.6. rocsolver_<type>getri_outofplace_strided_batched()#
-
rocblas_status rocsolver_zgetri_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetri_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetri_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, double *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetri_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, float *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
GETRI_OUTOFPLACE_STRIDED_BATCHED computes the inverse \(C_j = A_j^{-1}\) of a batch of general n-by-n matrices \(A_j\).
The inverse is computed by solving the linear system
\[ A_j C_j = I \]where I is the identity matrix, and \(A_j\) is factorized as \(A_j = P_j L_j U_j\) as given by GETRF_STRIDED_BATCHED.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of all matrices A_j in the batch.
A – [in]
pointer to type. Array on the GPU (the size depends on the value of strideA).
The factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by GETRF_STRIDED_BATCHED.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
ipiv – [in]
pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).
The pivot indices returned by GETRF_STRIDED_BATCHED.
strideP – [in]
rocblas_stride.
Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.
C – [out]
pointer to type. Array on the GPU (the size depends on the value of strideC).
If info[j] = 0, the inverse of matrices A_j. Otherwise, undefined.
ldc – [in]
rocblas_int. ldc >= n.
Specifies the leading dimension of C_j.
strideC – [in]
rocblas_stride.
Stride from the start of one matrix C_j to the next one C_(j+1). There is no restriction for the value of strideC. Normal use case is strideC >= ldc*n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
3.4.2.7. rocsolver_<type>getri_npvt_outofplace()#
-
rocblas_status rocsolver_zgetri_npvt_outofplace(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_cgetri_npvt_outofplace(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_dgetri_npvt_outofplace(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, double *C, const rocblas_int ldc, rocblas_int *info)#
-
rocblas_status rocsolver_sgetri_npvt_outofplace(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, float *C, const rocblas_int ldc, rocblas_int *info)#
GETRI_NPVT_OUTOFPLACE computes the inverse \(C = A^{-1}\) of a general n-by-n matrix A without partial pivoting.
The inverse is computed by solving the linear system
\[ AC = I \]where I is the identity matrix, and A is factorized as \(A = LU\) as given by GETRF_NPVT.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of the matrix A.
A – [in]
pointer to type. Array on the GPU of dimension lda*n.
The factors L and U of the factorization A = L*U returned by GETRF_NPVT.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A.
C – [out]
pointer to type. Array on the GPU of dimension ldc*n.
If info = 0, the inverse of A. Otherwise, undefined.
ldc – [in]
rocblas_int. ldc >= n.
Specifies the leading dimension of C.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.
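The out-of-place inversion can be illustrated with a CPU reference that solves AC = I column by column from the packed factors. This is a minimal sketch, not the rocSOLVER implementation: it assumes column-major storage with the unit lower triangular L stored below the diagonal and U on and above it, and the function name is hypothetical. Its return value mimics the info convention described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference. LU holds the packed factors of A = L*U
 * (column-major, leading dimension lda). On success C (leading dimension
 * ldc) receives the inverse of A and 0 is returned. If U[i,i] is the first
 * zero pivot, the 1-based index i is returned and C is left untouched. */
static int inverse_from_lu_npvt(int n, const double *LU, int lda,
                                double *C, int ldc)
{
    assert(lda >= n && ldc >= n);
    for (int i = 0; i < n; ++i)
        if (LU[i + (size_t)i * lda] == 0.0)
            return i + 1;

    for (int c = 0; c < n; ++c) {
        double *x = C + (size_t)c * ldc;
        /* forward substitution: L*y = e_c, with unit diagonal in L */
        for (int r = 0; r < n; ++r) {
            double s = (r == c) ? 1.0 : 0.0;
            for (int k = 0; k < r; ++k)
                s -= LU[r + (size_t)k * lda] * x[k];
            x[r] = s;
        }
        /* backward substitution: U*x = y */
        for (int r = n - 1; r >= 0; --r) {
            double s = x[r];
            for (int k = r + 1; k < n; ++k)
                s -= LU[r + (size_t)k * lda] * x[k];
            x[r] = s / LU[r + (size_t)r * lda];
        }
    }
    return 0;
}
```

For example, A = [4 3; 6 3] factorizes without pivoting into L = [1 0; 1.5 1] and U = [4 3; 0 -1.5], which is passed as the packed column-major array {4, 1.5, 3, -1.5}.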
3.4.2.8. rocsolver_<type>getri_npvt_outofplace_batched()#
-
rocblas_status rocsolver_zgetri_npvt_outofplace_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetri_npvt_outofplace_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetri_npvt_outofplace_batched(rocblas_handle handle, const rocblas_int n, double *const A[], const rocblas_int lda, double *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetri_npvt_outofplace_batched(rocblas_handle handle, const rocblas_int n, float *const A[], const rocblas_int lda, float *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
GETRI_NPVT_OUTOFPLACE_BATCHED computes the inverse \(C_j = A_j^{-1}\) of a batch of general n-by-n matrices \(A_j\) without partial pivoting.
The inverse is computed by solving the linear system
\[ A_j C_j = I \]where I is the identity matrix, and \(A_j\) is factorized as \(A_j = L_j U_j\) as given by GETRF_NPVT_BATCHED.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of all matrices A_j in the batch.
A – [in]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
The factors L_j and U_j of the factorization A_j = L_j*U_j returned by GETRF_NPVT_BATCHED.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
C – [out]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldc*n.
If info[j] = 0, the inverse of matrices A_j. Otherwise, undefined.
ldc – [in]
rocblas_int. ldc >= n.
Specifies the leading dimension of C_j.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
3.4.2.9. rocsolver_<type>getri_npvt_outofplace_strided_batched()#
-
rocblas_status rocsolver_zgetri_npvt_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgetri_npvt_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgetri_npvt_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgetri_npvt_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
GETRI_NPVT_OUTOFPLACE_STRIDED_BATCHED computes the inverse \(C_j = A_j^{-1}\) of a batch of general n-by-n matrices \(A_j\) without partial pivoting.
The inverse is computed by solving the linear system
\[ A_j C_j = I \]where I is the identity matrix, and \(A_j\) is factorized as \(A_j = L_j U_j\) as given by GETRF_NPVT_STRIDED_BATCHED.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows and columns of all matrices A_j in the batch.
A – [in]
pointer to type. Array on the GPU (the size depends on the value of strideA).
The factors L_j and U_j of the factorization A_j = L_j*U_j returned by GETRF_NPVT_STRIDED_BATCHED.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
C – [out]
pointer to type. Array on the GPU (the size depends on the value of strideC).
If info[j] = 0, the inverse of matrices A_j. Otherwise, undefined.
ldc – [in]
rocblas_int. ldc >= n.
Specifies the leading dimension of C_j.
strideC – [in]
rocblas_stride.
Stride from the start of one matrix C_j to the next one C_(j+1). There is no restriction for the value of strideC. Normal use case is strideC >= ldc*n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
3.4.2.10. rocsolver_<type>geblttrs_npvt()#
-
rocblas_status rocsolver_zgeblttrs_npvt(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb, rocblas_double_complex *C, const rocblas_int ldc, rocblas_double_complex *X, const rocblas_int ldx)#
-
rocblas_status rocsolver_cgeblttrs_npvt(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb, rocblas_float_complex *C, const rocblas_int ldc, rocblas_float_complex *X, const rocblas_int ldx)#
-
rocblas_status rocsolver_dgeblttrs_npvt(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, double *A, const rocblas_int lda, double *B, const rocblas_int ldb, double *C, const rocblas_int ldc, double *X, const rocblas_int ldx)#
-
rocblas_status rocsolver_sgeblttrs_npvt(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, float *A, const rocblas_int lda, float *B, const rocblas_int ldb, float *C, const rocblas_int ldc, float *X, const rocblas_int ldx)#
GEBLTTRS_NPVT solves a system of linear equations given by a block tridiagonal matrix in its factorized form (without partial pivoting).
The linear system has the form
\[\begin{split} MX = \left[\begin{array}{ccccc} B_1 & C_1\\ A_1 & B_2 & C_2\\ & \ddots & \ddots & \ddots \\ & & A_{n-2} & B_{n-1} & C_{n-1}\\ & & & A_{n-1} & B_n \end{array}\right]\left[\begin{array}{c} X_1\\ X_2\\ X_3\\ \vdots\\ X_n \end{array}\right]=\left[\begin{array}{c} R_1\\ R_2\\ R_3\\ \vdots\\ R_n \end{array}\right]=R \end{split}\]where matrix M has \(n = \mathrm{nblocks}\) diagonal blocks of size nb, and the right-hand-side blocks \(R_i\) are general blocks of size nb-by-nrhs. The blocks of matrix M should be in the factorized form as returned by GEBLTTRF_NPVT.
- Parameters:
handle – [in] rocblas_handle.
nb – [in]
rocblas_int. nb >= 0.
The number of rows and columns of each block.
nblocks – [in]
rocblas_int. nblocks >= 0.
The number of blocks along the diagonal of the matrix.
nrhs – [in]
rocblas_int. nrhs >= 0.
The number of right hand sides, i.e., the number of columns of blocks R_i.
A – [in]
pointer to type. Array on the GPU of dimension lda*nb*(nblocks-1).
Contains the blocks A_i as returned by GEBLTTRF_NPVT.
lda – [in]
rocblas_int. lda >= nb.
Specifies the leading dimension of blocks A_i.
B – [in]
pointer to type. Array on the GPU of dimension ldb*nb*nblocks.
Contains the blocks B_i as returned by GEBLTTRF_NPVT.
ldb – [in]
rocblas_int. ldb >= nb.
Specifies the leading dimension of blocks B_i.
C – [in]
pointer to type. Array on the GPU of dimension ldc*nb*(nblocks-1).
Contains the blocks C_i as returned by GEBLTTRF_NPVT.
ldc – [in]
rocblas_int. ldc >= nb.
Specifies the leading dimension of blocks C_i.
X – [inout]
pointer to type. Array on the GPU of dimension ldx*nblocks*nrhs.
On entry, X contains the right-hand-side blocks R_i. It is overwritten by solution vectors X_i on exit.
ldx – [in]
rocblas_int. ldx >= nb.
Specifies the leading dimension of blocks X_i.
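The factorize-then-solve workflow is easiest to see in the scalar case nb = 1, nrhs = 1, where every block is a single number. The sketch below is a CPU illustration under that assumption, not the rocSOLVER implementation; both function names are hypothetical. The recurrences come from multiplying out the block form M = LU, which gives L_1 = B_1, U_i = L_i^{-1} C_i and L_{i+1} = B_{i+1} - A_i U_i. The nonzero return value of the factorization is this sketch's own convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar (nb = 1) factorization without pivoting.
 * On entry b[i] = B_i and c[i] = C_i; on exit b[i] = L_i and c[i] = U_i.
 * a[i] = A_i is not modified. Returns 0 on success, or the 1-based index
 * of the first zero L_i (sketch-only convention). */
static int tridiag_factor_npvt(int n, const double *a, double *b, double *c)
{
    for (int i = 0; i < n; ++i) {
        if (i > 0)
            b[i] -= a[i - 1] * c[i - 1];  /* L_i = B_i - A_{i-1} U_{i-1} */
        if (b[i] == 0.0)
            return i + 1;
        if (i < n - 1)
            c[i] /= b[i];                 /* U_i = L_i^{-1} C_i */
    }
    return 0;
}

/* Hypothetical scalar solve using the factors above. On entry x holds the
 * right-hand side R; on exit it holds the solution X. */
static void tridiag_solve_npvt(int n, const double *a, const double *l,
                               const double *u, double *x)
{
    /* forward sweep: L*Y = R */
    for (int i = 0; i < n; ++i) {
        if (i > 0)
            x[i] -= a[i - 1] * x[i - 1];
        x[i] /= l[i];
    }
    /* backward sweep: U*X = Y, where U has an identity diagonal */
    for (int i = n - 2; i >= 0; --i)
        x[i] -= u[i] * x[i + 1];
}
```

In the general case the scalar divisions become solves with the factorized nb-by-nb diagonal blocks, and the products become matrix-matrix multiplications.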
3.4.2.11. rocsolver_<type>geblttrs_npvt_batched()#
-
rocblas_status rocsolver_zgeblttrs_npvt_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, rocblas_double_complex *const C[], const rocblas_int ldc, rocblas_double_complex *const X[], const rocblas_int ldx, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgeblttrs_npvt_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, rocblas_float_complex *const C[], const rocblas_int ldc, rocblas_float_complex *const X[], const rocblas_int ldx, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgeblttrs_npvt_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, double *const C[], const rocblas_int ldc, double *const X[], const rocblas_int ldx, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgeblttrs_npvt_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, float *const C[], const rocblas_int ldc, float *const X[], const rocblas_int ldx, const rocblas_int batch_count)#
GEBLTTRS_NPVT_BATCHED solves a batch of systems of linear equations given by block tridiagonal matrices in their factorized form (without partial pivoting).
Each linear system has the form
\[\begin{split} M_jX_j = \left[\begin{array}{ccccc} B_{j1} & C_{j1}\\ A_{j1} & B_{j2} & C_{j2}\\ & \ddots & \ddots & \ddots \\ & & A_{j(n-2)} & B_{j(n-1)} & C_{j(n-1)}\\ & & & A_{j(n-1)} & B_{jn} \end{array}\right]\left[\begin{array}{c} X_{j1}\\ X_{j2}\\ X_{j3}\\ \vdots\\ X_{jn} \end{array}\right]=\left[\begin{array}{c} R_{j1}\\ R_{j2}\\ R_{j3}\\ \vdots\\ R_{jn} \end{array}\right]=R_j \end{split}\]where matrix \(M_j\) has \(n = \mathrm{nblocks}\) diagonal blocks of size nb, and the right-hand-side blocks \(R_{ji}\) are general blocks of size nb-by-nrhs. The blocks of matrix \(M_j\) should be in the factorized form as returned by GEBLTTRF_NPVT_BATCHED.
- Parameters:
handle – [in] rocblas_handle.
nb – [in]
rocblas_int. nb >= 0.
The number of rows and columns of each block.
nblocks – [in]
rocblas_int. nblocks >= 0.
The number of blocks along the diagonal of each matrix in the batch.
nrhs – [in]
rocblas_int. nrhs >= 0.
The number of right hand sides, i.e., the number of columns of blocks R_{ji}.
A – [in]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*nb*(nblocks-1).
Contains the blocks A_{ji} as returned by GEBLTTRF_NPVT_BATCHED.
lda – [in]
rocblas_int. lda >= nb.
Specifies the leading dimension of blocks A_{ji}.
B – [in]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*nb*nblocks.
Contains the blocks B_{ji} as returned by GEBLTTRF_NPVT_BATCHED.
ldb – [in]
rocblas_int. ldb >= nb.
Specifies the leading dimension of blocks B_{ji}.
C – [in]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldc*nb*(nblocks-1).
Contains the blocks C_{ji} as returned by GEBLTTRF_NPVT_BATCHED.
ldc – [in]
rocblas_int. ldc >= nb.
Specifies the leading dimension of blocks C_{ji}.
X – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldx*nblocks*nrhs.
On entry, X contains the right-hand-side blocks R_{ji}. It is overwritten by solution vectors X_{ji} on exit.
ldx – [in]
rocblas_int. ldx >= nb.
Specifies the leading dimension of blocks X_{ji}.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
3.4.2.12. rocsolver_<type>geblttrs_npvt_strided_batched()#
-
rocblas_status rocsolver_zgeblttrs_npvt_strided_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_double_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_double_complex *X, const rocblas_int ldx, const rocblas_stride strideX, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgeblttrs_npvt_strided_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, rocblas_float_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_float_complex *X, const rocblas_int ldx, const rocblas_stride strideX, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgeblttrs_npvt_strided_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, double *C, const rocblas_int ldc, const rocblas_stride strideC, double *X, const rocblas_int ldx, const rocblas_stride strideX, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgeblttrs_npvt_strided_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, float *C, const rocblas_int ldc, const rocblas_stride strideC, float *X, const rocblas_int ldx, const rocblas_stride strideX, const rocblas_int batch_count)#
GEBLTTRS_NPVT_STRIDED_BATCHED solves a batch of systems of linear equations given by block tridiagonal matrices in their factorized form (without partial pivoting).
Each linear system has the form
\[\begin{split} M_jX_j = \left[\begin{array}{ccccc} B_{j1} & C_{j1}\\ A_{j1} & B_{j2} & C_{j2}\\ & \ddots & \ddots & \ddots \\ & & A_{j(n-2)} & B_{j(n-1)} & C_{j(n-1)}\\ & & & A_{j(n-1)} & B_{jn} \end{array}\right]\left[\begin{array}{c} X_{j1}\\ X_{j2}\\ X_{j3}\\ \vdots\\ X_{jn} \end{array}\right]=\left[\begin{array}{c} R_{j1}\\ R_{j2}\\ R_{j3}\\ \vdots\\ R_{jn} \end{array}\right]=R_j \end{split}\]where matrix \(M_j\) has \(n = \mathrm{nblocks}\) diagonal blocks of size nb, and the right-hand-side blocks \(R_{ji}\) are general blocks of size nb-by-nrhs. The blocks of matrix \(M_j\) should be in the factorized form as returned by GEBLTTRF_NPVT_STRIDED_BATCHED.
- Parameters:
handle – [in] rocblas_handle.
nb – [in]
rocblas_int. nb >= 0.
The number of rows and columns of each block.
nblocks – [in]
rocblas_int. nblocks >= 0.
The number of blocks along the diagonal of each matrix in the batch.
nrhs – [in]
rocblas_int. nrhs >= 0.
The number of right-hand sides, i.e., the number of columns of blocks R_{ji}.
A – [in]
pointer to type. Array on the GPU (the size depends on the value of strideA).
Contains the blocks A_{ji} as returned by GEBLTTRF_NPVT_STRIDED_BATCHED.
lda – [in]
rocblas_int. lda >= nb.
Specifies the leading dimension of blocks A_{ji}.
strideA – [in]
rocblas_stride.
Stride from the start of one block A_{ji} to the same block in the next batch instance A_{(j+1)i}. There is no restriction for the value of strideA. Normal use case is strideA >= lda*nb*nblocks.
B – [in]
pointer to type. Array on the GPU (the size depends on the value of strideB).
Contains the blocks B_{ji} as returned by GEBLTTRF_NPVT_STRIDED_BATCHED.
ldb – [in]
rocblas_int. ldb >= nb.
Specifies the leading dimension of blocks B_{ji}.
strideB – [in]
rocblas_stride.
Stride from the start of one block B_{ji} to the same block in the next batch instance B_{(j+1)i}. There is no restriction for the value of strideB. Normal use case is strideB >= ldb*nb*nblocks.
C – [in]
pointer to type. Array on the GPU (the size depends on the value of strideC).
Contains the blocks C_{ji} as returned by GEBLTTRF_NPVT_STRIDED_BATCHED.
ldc – [in]
rocblas_int. ldc >= nb.
Specifies the leading dimension of blocks C_{ji}.
strideC – [in]
rocblas_stride.
Stride from the start of one block C_{ji} to the same block in the next batch instance C_{(j+1)i}. There is no restriction for the value of strideC. Normal use case is strideC >= ldc*nb*nblocks.
X – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideX).
On entry, X contains the right-hand-side blocks R_{ji}. It is overwritten by solution vectors X_{ji} on exit.
ldx – [in]
rocblas_int. ldx >= nb.
Specifies the leading dimension of blocks X_{ji}.
strideX – [in]
rocblas_stride.
Stride from the start of one block X_{ji} to the same block in the next batch instance X_{(j+1)i}. There is no restriction for the value of strideX. Normal use case is strideX >= ldx*nblocks*nrhs.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
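The relationship between the leading dimensions, the strides, and the array sizes can be sketched with two small host-side helpers. This is only an illustration under stated assumptions: the helper names block_offset and batch_array_len are hypothetical, indices are 0-based, and the blocks of one batch instance are assumed to be column-major and stored one after the other (ld*ncols elements apart), which is consistent with the "normal use case" stride bounds quoted above but is not spelled out in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Element offset of entry (r, c) in block i of batch instance j (all 0-based).
 * ASSUMPTION: column-major blocks stored back to back, ld*ncols elements apart.
 * ncols is nb for the A, B and C arrays and nrhs for the X array. */
size_t block_offset(size_t j, size_t i, size_t r, size_t c,
                    size_t ncols, size_t ld, size_t stride)
{
    assert(c < ncols && r < ld);
    return j * stride + i * ld * ncols + c * ld + r;
}

/* An array length (in elements) sufficient to hold every batch instance,
 * assuming a non-negative stride. */
size_t batch_array_len(size_t nblocks, size_t ncols, size_t ld,
                       size_t stride, size_t batch_count)
{
    if (batch_count == 0 || nblocks == 0 || ncols == 0)
        return 0;
    return (batch_count - 1) * stride + nblocks * ld * ncols;
}
```

For example, with nb = 2, lda = 2, nblocks = 3 and strideA = lda*nb*nblocks = 12, the last entry of the last block in the second batch instance sits at offset 23, and a batch of two instances needs 24 elements.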
3.4.2.13. rocsolver_<type>geblttrs_npvt_interleaved_batched()#
-
rocblas_status rocsolver_zgeblttrs_npvt_interleaved_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, rocblas_double_complex *A, const rocblas_int inca, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int incb, const rocblas_int ldb, const rocblas_stride strideB, rocblas_double_complex *C, const rocblas_int incc, const rocblas_int ldc, const rocblas_stride strideC, rocblas_double_complex *X, const rocblas_int incx, const rocblas_int ldx, const rocblas_stride strideX, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgeblttrs_npvt_interleaved_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, rocblas_float_complex *A, const rocblas_int inca, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int incb, const rocblas_int ldb, const rocblas_stride strideB, rocblas_float_complex *C, const rocblas_int incc, const rocblas_int ldc, const rocblas_stride strideC, rocblas_float_complex *X, const rocblas_int incx, const rocblas_int ldx, const rocblas_stride strideX, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgeblttrs_npvt_interleaved_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, double *A, const rocblas_int inca, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int incb, const rocblas_int ldb, const rocblas_stride strideB, double *C, const rocblas_int incc, const rocblas_int ldc, const rocblas_stride strideC, double *X, const rocblas_int incx, const rocblas_int ldx, const rocblas_stride strideX, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgeblttrs_npvt_interleaved_batched(rocblas_handle handle, const rocblas_int nb, const rocblas_int nblocks, const rocblas_int nrhs, float *A, const rocblas_int inca, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int incb, const rocblas_int ldb, const rocblas_stride strideB, float *C, const rocblas_int incc, const rocblas_int ldc, const rocblas_stride strideC, float *X, const rocblas_int incx, const rocblas_int ldx, const rocblas_stride strideX, const rocblas_int batch_count)#
GEBLTTRS_NPVT_INTERLEAVED_BATCHED solves a batch of systems of linear equations given by block tridiagonal matrices in their factorized form (without partial pivoting).
Each linear system has the form
\[\begin{split} M_jX_j = \left[\begin{array}{ccccc} B_{j1} & C_{j1}\\ A_{j1} & B_{j2} & C_{j2}\\ & \ddots & \ddots & \ddots \\ & & A_{j(n-2)} & B_{j(n-1)} & C_{j(n-1)}\\ & & & A_{j(n-1)} & B_{jn} \end{array}\right]\left[\begin{array}{c} X_{j1}\\ X_{j2}\\ X_{j3}\\ \vdots\\ X_{jn} \end{array}\right]=\left[\begin{array}{c} R_{j1}\\ R_{j2}\\ R_{j3}\\ \vdots\\ R_{jn} \end{array}\right]=R_j \end{split}\]where matrix \(M_j\) has \(n = \mathrm{nblocks}\) diagonal blocks of size nb, and the right-hand-side blocks \(R_{ji}\) are general blocks of size nb-by-nrhs. The blocks of matrix \(M_j\) should be in the factorized form as returned by GEBLTTRF_NPVT_INTERLEAVED_BATCHED.
- Parameters:
handle – [in] rocblas_handle.
nb – [in]
rocblas_int. nb >= 0.
The number of rows and columns of each block.
nblocks – [in]
rocblas_int. nblocks >= 0.
The number of blocks along the diagonal of each matrix in the batch.
nrhs – [in]
rocblas_int. nrhs >= 0.
The number of right-hand sides, i.e., the number of columns of blocks R_{ji}.
A – [in]
pointer to type. Array on the GPU (the size depends on the value of strideA).
Contains the blocks A_{ji} as returned by GEBLTTRF_NPVT_INTERLEAVED_BATCHED.
inca – [in]
rocblas_int. inca > 0.
Stride from the start of one row of A_{ji} to the next. Normal use cases are inca = 1 (strided batched case) or inca = batch_count (interleaved batched case).
lda – [in]
rocblas_int. lda >= inca * nb.
Specifies the leading dimension of blocks A_{ji}, i.e. the stride from the start of one column of A_{ji} to the next.
strideA – [in]
rocblas_stride.
Stride from the start of one block A_{ji} to the same block in the next batch instance A_{(j+1)i}. There is no restriction for the value of strideA. Normal use cases are strideA >= lda*nb*nblocks (strided batched case) or strideA = 1 (interleaved batched case).
B – [in]
pointer to type. Array on the GPU (the size depends on the value of strideB).
Contains the blocks B_{ji} as returned by GEBLTTRF_NPVT_INTERLEAVED_BATCHED.
incb – [in]
rocblas_int. incb > 0.
Stride from the start of one row of B_{ji} to the next. Normal use cases are incb = 1 (strided batched case) or incb = batch_count (interleaved batched case).
ldb – [in]
rocblas_int. ldb >= incb * nb.
Specifies the leading dimension of blocks B_{ji}, i.e. the stride from the start of one column of B_{ji} to the next.
strideB – [in]
rocblas_stride.
Stride from the start of one block B_{ji} to the same block in the next batch instance B_{(j+1)i}. There is no restriction for the value of strideB. Normal use cases are strideB >= ldb*nb*nblocks (strided batched case) or strideB = 1 (interleaved batched case).
C – [in]
pointer to type. Array on the GPU (the size depends on the value of strideC).
Contains the blocks C_{ji} as returned by GEBLTTRF_NPVT_INTERLEAVED_BATCHED.
incc – [in]
rocblas_int. incc > 0.
Stride from the start of one row of C_{ji} to the next. Normal use cases are incc = 1 (strided batched case) or incc = batch_count (interleaved batched case).
ldc – [in]
rocblas_int. ldc >= incc * nb.
Specifies the leading dimension of blocks C_{ji}, i.e. the stride from the start of one column of C_{ji} to the next.
strideC – [in]
rocblas_stride.
Stride from the start of one block C_{ji} to the same block in the next batch instance C_{(j+1)i}. There is no restriction for the value of strideC. Normal use cases are strideC >= ldc*nb*nblocks (strided batched case) or strideC = 1 (interleaved batched case).
X – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideX).
On entry, X contains the right-hand-side blocks R_{ji}. It is overwritten by solution vectors X_{ji} on exit.
incx – [in]
rocblas_int. incx > 0.
Stride from the start of one row of X_{ji} to the next. Normal use cases are incx = 1 (strided batched case) or incx = batch_count (interleaved batched case).
ldx – [in]
rocblas_int. ldx >= incx * nb.
Specifies the leading dimension of blocks X_{ji}, i.e. the stride from the start of one column of X_{ji} to the next.
strideX – [in]
rocblas_stride.
Stride from the start of one block X_{ji} to the same block in the next batch instance X_{(j+1)i}. There is no restriction for the value of strideX. Normal use cases are strideX >= ldx*nrhs*nblocks (strided batched case) or strideX = 1 (interleaved batched case).
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
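The two "normal use cases" named in the parameter descriptions differ only in how the row increment, the leading dimension, and the batch stride are chosen. The sketch below computes an element offset from those three values so the layouts can be compared. It is a host-side illustration under assumptions: the helper name block_offset_inc is hypothetical, indices are 0-based, and consecutive blocks of one batch instance are assumed to be ld*ncols elements apart.

```c
#include <assert.h>
#include <stddef.h>

/* Element offset of entry (r, c) in block i of batch instance j (all 0-based).
 * inc    : stride between consecutive rows of a block
 * ld     : stride between consecutive columns of a block
 * stride : stride between the same block in consecutive batch instances
 * ASSUMPTION: consecutive blocks are ld*ncols elements apart
 * (ncols is nb for A, B and C, and nrhs for X). */
size_t block_offset_inc(size_t j, size_t i, size_t r, size_t c,
                        size_t ncols, size_t inc, size_t ld, size_t stride)
{
    assert(inc > 0 && c < ncols);
    return j * stride + i * ld * ncols + c * ld + r * inc;
}
```

With nb = 2 and batch_count = 4, the interleaved choice inc = 4, ld = inc*nb = 8, stride = 1 places the same entry of consecutive batch instances in adjacent memory locations, while inc = 1, ld = 2, stride = 12 reproduces the strided batched layout.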
3.4.3. Symmetric eigensolvers#
3.4.3.1. rocsolver_<type>syevj()#
-
rocblas_status rocsolver_dsyevj(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, rocblas_int *info)#
-
rocblas_status rocsolver_ssyevj(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, rocblas_int *info)#
SYEVJ computes the eigenvalues and optionally the eigenvectors of a real symmetric matrix A.
The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.
At the \(k\)-th iteration (or “sweep”), \(A\) is transformed by a product of Jacobi rotations \(V\) as
\[ A^{(k)} = V' A^{(k-1)} V \]such that \(off(A^{(k)}) < off(A^{(k-1)})\), where \(A^{(0)} = A\) and \(off(A^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A^{(k)}\). As \(off(A^{(k)}) \rightarrow 0\), the diagonal elements of \(A^{(k)}\) increasingly resemble the eigenvalues of \(A\).
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
esort – [in]
rocblas_esort.
Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
n – [in]
rocblas_int. n >= 0.
Number of rows and columns of matrix A.
A – [inout]
pointer to type. Array on the GPU of dimension lda*n.
On entry, the matrix A. On exit, the eigenvectors of A if they were computed and the algorithm converged; otherwise the contents of A are unchanged.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrix A.
abstol – [in]
type.
The absolute tolerance. The algorithm is considered to have converged once off(A) is <= norm(A) * abstol. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to type on the GPU.
The Frobenius norm of the off-diagonal elements of A (i.e. off(A)) at the final iteration.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to a rocblas_int on the GPU.
The actual number of sweeps (iterations) used by the algorithm.
W – [out]
pointer to type. Array on the GPU of dimension n.
The eigenvalues of A, sorted in ascending order if esort is rocblas_esort_ascending; otherwise their order is unspecified.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = 1, the algorithm did not converge.
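How abstol, max_sweeps, n_sweeps and info relate to each other can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper names are hypothetical, "machine precision" is taken to mean DBL_EPSILON for double precision, and the convergence test is assumed to run once after every sweep.

```c
#include <assert.h>
#include <float.h>

/* Convergence test from the abstol description: off <= norm * tolerance.
 * ASSUMPTION: a non-positive abstol is replaced by DBL_EPSILON. */
int jacobi_converged(double off, double norm, double abstol)
{
    double tol = (abstol > 0.0) ? abstol : DBL_EPSILON;
    return off <= norm * tol;
}

/* Given off(A) observed after each sweep, report the values that n_sweeps
 * and info would take: info = 0 on convergence, 1 otherwise. */
int jacobi_status(const double *off_after_sweep, int max_sweeps,
                  double norm, double abstol, int *n_sweeps)
{
    assert(max_sweeps > 0);
    for (int k = 0; k < max_sweeps; ++k) {
        if (jacobi_converged(off_after_sweep[k], norm, abstol)) {
            *n_sweeps = k + 1;
            return 0;
        }
    }
    *n_sweeps = max_sweeps;
    return 1;
}
```

Because the test is scaled by norm(A), the same abstol accepts a larger absolute residual for a matrix with larger entries; callers who need a fixed absolute bound can check the returned residual themselves.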
3.4.3.2. rocsolver_<type>syevj_batched()#
-
rocblas_status rocsolver_dsyevj_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_ssyevj_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
SYEVJ_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of real symmetric matrices A_j.
The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.
At the \(k\)-th iteration (or “sweep”), \(A_j\) is transformed by a product of Jacobi rotations \(V_j\) as
\[ A_j^{(k)} = V_j' A_j^{(k-1)} V_j \]such that \(off(A_j^{(k)}) < off(A_j^{(k-1)})\), where \(A_j^{(0)} = A_j\) and \(off(A_j^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A_j^{(k)}\). As \(off(A_j^{(k)}) \rightarrow 0\), the diagonal elements of \(A_j^{(k)}\) increasingly resemble the eigenvalues of \(A_j\).
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
esort – [in]
rocblas_esort.
Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower part of the symmetric matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.
n – [in]
rocblas_int. n >= 0.
Number of rows and columns of matrices A_j.
A – [inout]
Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are unchanged.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
abstol – [in]
type.
The absolute tolerance. The algorithm is considered to have converged once off(A_j) is <= norm(A_j) * abstol. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to type. Array of batch_count scalars on the GPU.
The Frobenius norm of the off-diagonal elements of A_j (i.e. off(A_j)) at the final iteration.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
W – [out]
pointer to type. Array on the GPU (the size depends on the value of strideW).
The eigenvalues of A_j, sorted in ascending order if esort is rocblas_esort_ascending; otherwise their order is unspecified.
strideW – [in]
rocblas_stride.
Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use case is strideW >= n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for matrix A_j. If info[j] = 1, the algorithm did not converge.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
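Once W and info have been copied back to the host, the per-instance results can be located as sketched below. The helper names batch_eigenvalues and count_not_converged are hypothetical, and the copy from GPU memory is assumed to have been done elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Start of the eigenvalue vector W_j inside the host copy of W. */
const double *batch_eigenvalues(const double *W, ptrdiff_t strideW, size_t j)
{
    assert(W != NULL);
    return W + (ptrdiff_t)j * strideW;
}

/* Number of batch instances whose info[j] reports non-convergence. */
size_t count_not_converged(const int *info, size_t batch_count)
{
    size_t failed = 0;
    for (size_t j = 0; j < batch_count; ++j)
        if (info[j] != 0)
            ++failed;
    return failed;
}
```

Choosing strideW larger than n simply leaves unused padding between consecutive eigenvalue vectors.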
3.4.3.3. rocsolver_<type>syevj_strided_batched()#
-
rocblas_status rocsolver_dsyevj_strided_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_ssyevj_strided_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
SYEVJ_STRIDED_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of real symmetric matrices A_j.
The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.
At the \(k\)-th iteration (or “sweep”), \(A_j\) is transformed by a product of Jacobi rotations \(V_j\) as
\[ A_j^{(k)} = V_j' A_j^{(k-1)} V_j \]such that \(off(A_j^{(k)}) < off(A_j^{(k-1)})\), where \(A_j^{(0)} = A_j\) and \(off(A_j^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A_j^{(k)}\). As \(off(A_j^{(k)}) \rightarrow 0\), the diagonal elements of \(A_j^{(k)}\) increasingly resemble the eigenvalues of \(A_j\).
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
esort – [in]
rocblas_esort.
Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower part of the symmetric matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.
n – [in]
rocblas_int. n >= 0.
Number of rows and columns of matrices A_j.
A – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideA).
On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are unchanged.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
abstol – [in]
type.
The absolute tolerance. The algorithm is considered to have converged once off(A_j) is <= norm(A_j) * abstol. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to type. Array of batch_count scalars on the GPU.
The Frobenius norm of the off-diagonal elements of A_j (i.e. off(A_j)) at the final iteration.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
W – [out]
pointer to type. Array on the GPU (the size depends on the value of strideW).
The eigenvalues of A_j, sorted in ascending order if esort is rocblas_esort_ascending; otherwise their order is unspecified.
strideW – [in]
rocblas_stride.
Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use case is strideW >= n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for matrix A_j. If info[j] = 1, the algorithm did not converge.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
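The roles of uplo, lda and strideA on input can be sketched with a host-side accessor that reads an entry of A_j from the stored triangle only. This is an illustration with a hypothetical helper name (sym_elem), 0-based indices, column-major storage, and an int flag standing in for the rocblas_fill value.

```c
#include <assert.h>
#include <stddef.h>

/* Entry (r, c) of the symmetric matrix A_j, read only from the stored triangle.
 * lower != 0 means the lower triangle holds the valid data; the other
 * triangle is never touched, mirroring the uplo description. */
double sym_elem(const double *A, size_t lda, ptrdiff_t strideA, size_t j,
                size_t r, size_t c, int lower)
{
    assert(r < lda && c < lda);
    const double *Aj = A + (ptrdiff_t)j * strideA;
    int stored = lower ? (r >= c) : (r <= c);
    /* Outside the stored triangle, use symmetry: A[r,c] = A[c,r]. */
    return stored ? Aj[c * lda + r] : Aj[r * lda + c];
}
```

For a batch of two 2-by-2 matrices with lda = 2 and strideA = 4, the unused triangle may hold arbitrary values without affecting what the accessor returns.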
3.4.3.4. rocsolver_<type>heevj()#
-
rocblas_status rocsolver_zheevj(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, rocblas_int *info)#
-
rocblas_status rocsolver_cheevj(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, rocblas_int *info)#
HEEVJ computes the eigenvalues and optionally the eigenvectors of a complex Hermitian matrix A.
The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.
At the \(k\)-th iteration (or “sweep”), \(A\) is transformed by a product of Jacobi rotations \(V\) as
\[ A^{(k)} = V' A^{(k-1)} V \]such that \(off(A^{(k)}) < off(A^{(k-1)})\), where \(A^{(0)} = A\) and \(off(A^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A^{(k)}\). As \(off(A^{(k)}) \rightarrow 0\), the diagonal elements of \(A^{(k)}\) increasingly resemble the eigenvalues of \(A\).
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
esort – [in]
rocblas_esort.
Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower part of the Hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.
n – [in]
rocblas_int. n >= 0.
Number of rows and columns of matrix A.
A – [inout]
pointer to type. Array on the GPU of dimension lda*n.
On entry, the matrix A. On exit, the eigenvectors of A if they were computed and the algorithm converged; otherwise the contents of A are unchanged.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrix A.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(A) is <= norm(A) * abstol. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type on the GPU.
The Frobenius norm of the off-diagonal elements of A (i.e. off(A)) at the final iteration.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to a rocblas_int on the GPU.
The actual number of sweeps (iterations) used by the algorithm.
W – [out]
pointer to real type. Array on the GPU of dimension n.
The eigenvalues of A, sorted in ascending order if esort is rocblas_esort_ascending; otherwise their order is unspecified.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = 1, the algorithm did not converge.
3.4.3.5. rocsolver_<type>heevj_batched()#
-
rocblas_status rocsolver_zheevj_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cheevj_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
HEEVJ_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of complex Hermitian matrices A_j.
The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.
At the \(k\)-th iteration (or “sweep”), \(A_j\) is transformed by a product of Jacobi rotations \(V_j\) as
\[ A_j^{(k)} = V_j' A_j^{(k-1)} V_j \]such that \(off(A_j^{(k)}) < off(A_j^{(k-1)})\), where \(A_j^{(0)} = A_j\) and \(off(A_j^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A_j^{(k)}\). As \(off(A_j^{(k)}) \rightarrow 0\), the diagonal elements of \(A_j^{(k)}\) increasingly resemble the eigenvalues of \(A_j\).
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
esort – [in]
rocblas_esort.
Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower part of the Hermitian matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.
n – [in]
rocblas_int. n >= 0.
Number of rows and columns of matrices A_j.
A – [inout]
Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are unchanged.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(A_j) is <= norm(A_j) * abstol. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type. Array of batch_count scalars on the GPU.
The Frobenius norm of the off-diagonal elements of A_j (i.e. off(A_j)) at the final iteration.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
W – [out]
pointer to real type. Array on the GPU (the size depends on the value of strideW).
The eigenvalues of A_j, sorted in ascending order if esort is rocblas_esort_ascending; otherwise their order is unspecified.
strideW – [in]
rocblas_stride.
Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use case is strideW >= n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for matrix A_j. If info[j] = 1, the algorithm did not converge.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
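The batched variant takes an array of pointers, one per matrix A_j. One common way to build such an array is to carve the matrices out of a single contiguous buffer, as sketched below. This host-only example just shows the pointer arithmetic: the cplx struct is a hypothetical stand-in for the complex element type, fill_batch_pointers is a hypothetical helper, and in an actual call the matrices reside in GPU memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a double-precision complex element. */
typedef struct { double re, im; } cplx;

/* Point ptrs[j] at the start of matrix A_j inside one contiguous buffer,
 * with each column-major matrix occupying lda*n elements. */
void fill_batch_pointers(cplx *base, size_t lda, size_t n,
                         size_t batch_count, cplx **ptrs)
{
    assert(lda >= n);
    for (size_t j = 0; j < batch_count; ++j)
        ptrs[j] = base + j * lda * n;
}
```

Unlike the strided batched variant, nothing forces the matrices to be evenly spaced; the pointers may refer to unrelated allocations as long as each one provides lda*n elements.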
3.4.3.6. rocsolver_<type>heevj_strided_batched()#
-
rocblas_status rocsolver_zheevj_strided_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cheevj_strided_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
HEEVJ_STRIDED_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of complex Hermitian matrices A_j.
The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.
At the \(k\)-th iteration (or “sweep”), \(A_j\) is transformed by a product of Jacobi rotations \(V_j\) as
\[ A_j^{(k)} = V_j' A_j^{(k-1)} V_j \]such that \(off(A_j^{(k)}) < off(A_j^{(k-1)})\), where \(A_j^{(0)} = A_j\) and \(off(A_j^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A_j^{(k)}\). As \(off(A_j^{(k)}) \rightarrow 0\), the diagonal elements of \(A_j^{(k)}\) increasingly resemble the eigenvalues of \(A_j\).
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
esort – [in]
rocblas_esort.
Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower part of the Hermitian matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.
n – [in]
rocblas_int. n >= 0.
Number of rows and columns of matrices A_j.
A – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideA).
On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are unchanged.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of matrices A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(A_j) is <= norm(A_j) * abstol. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type. Array of batch_count scalars on the GPU.
The Frobenius norm of the off-diagonal elements of A_j (i.e. off(A_j)) at the final iteration.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
W – [out]
pointer to real type. Array on the GPU (the size depends on the value of strideW).
The eigenvalues of A_j, in increasing order if esort is rocblas_esort_ascending; otherwise their order is unspecified.
strideW – [in]
rocblas_stride.
Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use case is strideW >= n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit for matrix A_j. If info[j] = 1, the algorithm did not converge.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
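The strided layout of W and the per-instance info array can be read back as sketched below. This is a host-side illustration that assumes the outputs have already been copied from the GPU. The struct syevj_batch_result and the helpers are hypothetical and not part of rocSOLVER, the stride is modeled as std::ptrdiff_t, and the batch index j is 0-based as in C arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side copy of the strided outputs of one batched call.
struct syevj_batch_result
{
    std::vector<double> W;  // eigenvalues; instance j starts at W[j * strideW]
    std::vector<int> info;  // one entry per batch instance
    std::ptrdiff_t strideW; // stride between consecutive vectors W_j
    int n;                  // order of each matrix A_j
};

// Pointer to the n eigenvalues of batch instance j.
const double* eigenvalues_of(const syevj_batch_result& r, int j)
{
    assert(j >= 0 && static_cast<std::size_t>(j) < r.info.size());
    const std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(j) * r.strideW;
    assert(offset >= 0 && static_cast<std::size_t>(offset + r.n) <= r.W.size());
    return r.W.data() + offset;
}

// True if the Jacobi iteration converged for batch instance j (info[j] = 0).
bool converged(const syevj_batch_result& r, int j)
{
    assert(j >= 0 && static_cast<std::size_t>(j) < r.info.size());
    return r.info[j] == 0;
}
```

With strideW larger than n, the elements between the end of W_j and the start of W_(j+1) are padding and should not be interpreted as eigenvalues.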
3.4.3.7. rocsolver_<type>sygvj()
-
rocblas_status rocsolver_dsygvj(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *B, const rocblas_int ldb, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, rocblas_int *info)
-
rocblas_status rocsolver_ssygvj(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *B, const rocblas_int ldb, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, rocblas_int *info)
SYGVJ computes the eigenvalues and (optionally) eigenvectors of a real generalized symmetric-definite eigenproblem.
The problem solved by this function is either of the form
\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]
depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.
When computed, the matrix Z of eigenvectors is normalized as follows:
\[\begin{split} \begin{array}{cl} Z^T B Z=I & \: \text{if 1st or 2nd form, or}\\ Z^T B^{-1} Z=I & \: \text{if 3rd form.} \end{array} \end{split}\]
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
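The "reduction to standard form" mentioned in the description of abstol and residual can be made concrete for the 1st form. The following is a sketch of the textbook reduction, assuming the lower Cholesky factorization \(B = L L^T\). The symbols L, T, Y and \(\Lambda\) (the diagonal matrix of eigenvalues) are introduced here for illustration only, and the steps performed internally by the library may differ.

```latex
\[
\begin{array}{rcl}
A X = B X \Lambda, \quad B = L L^T
  & \Longrightarrow & \left(L^{-1} A L^{-T}\right) \left(L^T X\right) = \left(L^T X\right) \Lambda \\
T = L^{-1} A L^{-T}, \quad T Y = Y \Lambda, \quad Y^T Y = I
  & \Longrightarrow & Z = L^{-T} Y \\
Z^T B Z & = & Y^T L^{-1} \left(L L^T\right) L^{-T} Y = Y^T Y = I
\end{array}
\]
```

Under these assumptions, T is symmetric with the same eigenvalues as the generalized problem, and back-transforming its orthonormal eigenvectors Y yields the normalization \(Z^T B Z = I\) stated above.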
- Parameters:
handle – [in] rocblas_handle.
itype – [in]
rocblas_eform.
Specifies the form of the generalized eigenproblem.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower parts of the matrices A and B are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.
n – [in]
rocblas_int. n >= 0.
The matrix dimensions.
A – [inout]
pointer to type. Array on the GPU of dimension lda*n.
On entry, the symmetric matrix A. On exit, if evect is original, the normalized matrix Z of eigenvectors. If evect is none, then the upper or lower triangular part of the matrix A (including the diagonal) is destroyed, depending on the value of uplo.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A.
B – [inout]
pointer to type. Array on the GPU of dimension ldb*n.
On entry, the symmetric positive definite matrix B. On exit, the triangular factor of B as returned by POTRF.
ldb – [in]
rocblas_int. ldb >= n.
Specifies the leading dimension of B.
abstol – [in]
type.
The absolute tolerance. The algorithm is considered to have converged once off(T) is <= norm(T) * abstol, where T is the matrix obtained by reduction to standard form. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to type on the GPU.
The Frobenius norm of the off-diagonal elements of T (i.e. off(T)) at the final iteration, where T is the matrix obtained by reduction to standard form.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to a rocblas_int on the GPU.
The actual number of sweeps (iterations) used by the algorithm.
W – [out]
pointer to type. Array on the GPU of dimension n.
On exit, the eigenvalues in increasing order.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = 1, the algorithm did not converge. If info = n + i, the leading minor of order i of B is not positive definite.
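The three cases encoded in info can be separated as in the following sketch. The helper describe_sygvj_info is hypothetical and not part of rocSOLVER; it assumes the value of info has already been copied from the GPU to the host.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns the info value written by SYGVJ into a message.
// n is the matrix dimension that was passed to the call.
std::string describe_sygvj_info(int info, int n)
{
    assert(n >= 0);
    if(info == 0)
        return "success";
    if(info == 1)
        return "Jacobi iteration did not converge";
    if(info > n)
        // info = n + i: the leading minor of order i = info - n failed.
        return "leading minor of order " + std::to_string(info - n)
               + " of B is not positive definite";
    return "unexpected info value";
}
```

For example, with n = 4 a returned value of 6 points at the leading minor of order 2 of B.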
3.4.3.8. rocsolver_<type>sygvj_batched()
-
rocblas_status rocsolver_dsygvj_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)
-
rocblas_status rocsolver_ssygvj_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)
SYGVJ_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of real generalized symmetric-definite eigenproblems.
For each instance in the batch, the problem solved by this function is either of the form
\[\begin{split} \begin{array}{cl} A_j X_j = \lambda B_j X_j & \: \text{1st form,}\\ A_j B_j X_j = \lambda X_j & \: \text{2nd form, or}\\ B_j A_j X_j = \lambda X_j & \: \text{3rd form,} \end{array} \end{split}\]
depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.
When computed, the matrix \(Z_j\) of eigenvectors is normalized as follows:
\[\begin{split} \begin{array}{cl} Z_j^T B_j Z_j=I & \: \text{if 1st or 2nd form, or}\\ Z_j^T B_j^{-1} Z_j=I & \: \text{if 3rd form.} \end{array} \end{split}\]
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
itype – [in]
rocblas_eform.
Specifies the form of the generalized eigenproblems.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower parts of the matrices A_j and B_j are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_j and B_j are not used.
n – [in]
rocblas_int. n >= 0.
The matrix dimensions.
A – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
On entry, the symmetric matrices A_j. On exit, if evect is original, the normalized matrix Z_j of eigenvectors. If evect is none, then the upper or lower triangular part of each matrix A_j (including the diagonal) is destroyed, depending on the value of uplo.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A_j.
B – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.
On entry, the symmetric positive definite matrices B_j. On exit, the triangular factor of B_j as returned by POTRF_BATCHED.
ldb – [in]
rocblas_int. ldb >= n.
Specifies the leading dimension of B_j.
abstol – [in]
type.
The absolute tolerance. The algorithm is considered to have converged once off(T_j) is <= norm(T_j) * abstol, where T_j is the matrix obtained by reduction to standard form. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to type. Array of batch_count scalars on the GPU.
The Frobenius norm of the off-diagonal elements of T_j (i.e. off(T_j)) at the final iteration, where T_j is the matrix obtained by reduction to standard form.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
W – [out]
pointer to type. Array on the GPU (the size depends on the value of strideW).
On exit, the eigenvalues in increasing order.
strideW – [in]
rocblas_stride.
Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use is strideW >= n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit of batch instance j. If info[j] = 1, the algorithm did not converge. If info[j] = n + i, the leading minor of order i of B_j is not positive definite.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
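The stopping test described for abstol can be written out as follows. This is a host-side sketch under two assumptions that this section does not state: "machine precision" is taken to be std::numeric_limits<T>::epsilon(), and norm(T_j) is taken to be the Frobenius norm. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Effective tolerance: abstol if positive, otherwise machine epsilon
// (assumed meaning of "machine precision").
template <typename T>
T effective_tolerance(T abstol)
{
    return abstol > T(0) ? abstol : std::numeric_limits<T>::epsilon();
}

// True when off(T) <= norm(T) * tolerance for a column-major n-by-n matrix
// with leading dimension n. The Frobenius norm is assumed for norm(T).
template <typename T>
bool jacobi_converged(const std::vector<T>& t, std::size_t n, T abstol)
{
    assert(t.size() == n * n);
    T off2 = 0, all2 = 0;
    for(std::size_t j = 0; j < n; ++j)
    {
        for(std::size_t i = 0; i < n; ++i)
        {
            const T v = t[i + j * n];
            all2 += v * v;
            if(i != j)
                off2 += v * v;
        }
    }
    return std::sqrt(off2) <= std::sqrt(all2) * effective_tolerance(abstol);
}
```

Because the test is relative to norm(T_j), scaling a matrix by a constant does not change whether it is considered converged.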
3.4.3.9. rocsolver_<type>sygvj_strided_batched()
-
rocblas_status rocsolver_dsygvj_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)
-
rocblas_status rocsolver_ssygvj_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)
SYGVJ_STRIDED_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of real generalized symmetric-definite eigenproblems.
For each instance in the batch, the problem solved by this function is either of the form
\[\begin{split} \begin{array}{cl} A_j X_j = \lambda B_j X_j & \: \text{1st form,}\\ A_j B_j X_j = \lambda X_j & \: \text{2nd form, or}\\ B_j A_j X_j = \lambda X_j & \: \text{3rd form,} \end{array} \end{split}\]
depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.
When computed, the matrix \(Z_j\) of eigenvectors is normalized as follows:
\[\begin{split} \begin{array}{cl} Z_j^T B_j Z_j=I & \: \text{if 1st or 2nd form, or}\\ Z_j^T B_j^{-1} Z_j=I & \: \text{if 3rd form.} \end{array} \end{split}\]
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
itype – [in]
rocblas_eform.
Specifies the form of the generalized eigenproblems.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower parts of the matrices A_j and B_j are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_j and B_j are not used.
n – [in]
rocblas_int. n >= 0.
The matrix dimensions.
A – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideA).
On entry, the symmetric matrices A_j. On exit, if evect is original, the normalized matrix Z_j of eigenvectors. If evect is none, then the upper or lower triangular part of each matrix A_j (including the diagonal) is destroyed, depending on the value of uplo.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use is strideA >= lda*n.
B – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideB).
On entry, the symmetric positive definite matrices B_j. On exit, the triangular factor of B_j as returned by POTRF_STRIDED_BATCHED.
ldb – [in]
rocblas_int. ldb >= n.
Specifies the leading dimension of B_j.
strideB – [in]
rocblas_stride.
Stride from the start of one matrix B_j to the next one B_(j+1). There is no restriction for the value of strideB. Normal use is strideB >= ldb*n.
abstol – [in]
type.
The absolute tolerance. The algorithm is considered to have converged once off(T_j) is <= norm(T_j) * abstol, where T_j is the matrix obtained by reduction to standard form. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to type. Array of batch_count scalars on the GPU.
The Frobenius norm of the off-diagonal elements of T_j (i.e. off(T_j)) at the final iteration, where T_j is the matrix obtained by reduction to standard form.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
W – [out]
pointer to type. Array on the GPU (the size depends on the value of strideW).
On exit, the eigenvalues in increasing order.
strideW – [in]
rocblas_stride.
Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use is strideW >= n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit of batch instance j. If info[j] = 1, the algorithm did not converge. If info[j] = n + i, the leading minor of order i of B_j is not positive definite.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
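For the "normal use" strides mentioned above, the number of elements needed for a strided array follows directly from the stride and the size of one matrix. The helper below is a hypothetical host-side sketch, not a rocSOLVER function, and it only covers the case stride >= ld*n.

```cpp
#include <cassert>
#include <cstdint>

// Smallest element count that holds batch_count matrices of ld*n elements
// each, where consecutive matrices start stride elements apart.
// Only the "normal use" case stride >= ld * n is handled.
std::int64_t strided_elements(std::int64_t ld,
                              std::int64_t n,
                              std::int64_t stride,
                              std::int64_t batch_count)
{
    assert(n >= 0 && ld >= n && batch_count >= 0 && stride >= ld * n);
    if(batch_count == 0)
        return 0;
    // The last matrix needs only ld * n elements, not a full stride.
    return (batch_count - 1) * stride + ld * n;
}
```

The same expression applies to A (with lda and strideA) and to B (with ldb and strideB); for W, the size of one instance is n instead of ld*n.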
3.4.3.10. rocsolver_<type>hegvj()
-
rocblas_status rocsolver_zhegvj(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, rocblas_int *info)
-
rocblas_status rocsolver_chegvj(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, rocblas_int *info)
HEGVJ computes the eigenvalues and (optionally) eigenvectors of a complex generalized Hermitian-definite eigenproblem.
The problem solved by this function is either of the form
\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]
depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.
When computed, the matrix Z of eigenvectors is normalized as follows:
\[\begin{split} \begin{array}{cl} Z^H B Z=I & \: \text{if 1st or 2nd form, or}\\ Z^H B^{-1} Z=I & \: \text{if 3rd form.} \end{array} \end{split}\]
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
itype – [in]
rocblas_eform.
Specifies the form of the generalized eigenproblem.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower parts of the matrices A and B are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.
n – [in]
rocblas_int. n >= 0.
The matrix dimensions.
A – [inout]
pointer to type. Array on the GPU of dimension lda*n.
On entry, the Hermitian matrix A. On exit, if evect is original, the normalized matrix Z of eigenvectors. If evect is none, then the upper or lower triangular part of the matrix A (including the diagonal) is destroyed, depending on the value of uplo.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A.
B – [inout]
pointer to type. Array on the GPU of dimension ldb*n.
On entry, the Hermitian positive definite matrix B. On exit, the triangular factor of B as returned by POTRF.
ldb – [in]
rocblas_int. ldb >= n.
Specifies the leading dimension of B.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(T) is <= norm(T) * abstol, where T is the matrix obtained by reduction to standard form. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type on the GPU.
The Frobenius norm of the off-diagonal elements of T (i.e. off(T)) at the final iteration, where T is the matrix obtained by reduction to standard form.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to a rocblas_int on the GPU.
The actual number of sweeps (iterations) used by the algorithm.
W – [out]
pointer to real type. Array on the GPU of dimension n.
On exit, the eigenvalues in increasing order.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = 1, the algorithm did not converge. If info = n + i, the leading minor of order i of B is not positive definite.
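The size constraints listed for the parameters above can be verified on the host before the call is issued. The following sketch covers only the constraints documented in this section; the function names are hypothetical and do not belong to rocSOLVER.

```cpp
#include <cassert>

// Hypothetical pre-flight check mirroring the documented constraints of
// HEGVJ: n >= 0, lda >= n, ldb >= n and max_sweeps > 0.
bool hegvj_sizes_ok(int n, int lda, int ldb, int max_sweeps)
{
    return n >= 0 && lda >= n && ldb >= n && max_sweeps > 0;
}

// Debug-build guard intended to be placed right before the library call.
inline void assert_hegvj_sizes(int n, int lda, int ldb, int max_sweeps)
{
    assert(hegvj_sizes_ok(n, lda, ldb, max_sweeps));
}
```

A check of this kind catches the common mistake of passing a leading dimension smaller than n when the matrices are sub-blocks of larger arrays.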
3.4.3.11. rocsolver_<type>hegvj_batched()
-
rocblas_status rocsolver_zhegvj_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)
-
rocblas_status rocsolver_chegvj_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)
HEGVJ_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of complex generalized Hermitian-definite eigenproblems.
For each instance in the batch, the problem solved by this function is either of the form
\[\begin{split} \begin{array}{cl} A_j X_j = \lambda B_j X_j & \: \text{1st form,}\\ A_j B_j X_j = \lambda X_j & \: \text{2nd form, or}\\ B_j A_j X_j = \lambda X_j & \: \text{3rd form,} \end{array} \end{split}\]
depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.
When computed, the matrix \(Z_j\) of eigenvectors is normalized as follows:
\[\begin{split} \begin{array}{cl} Z_j^H B_j Z_j=I & \: \text{if 1st or 2nd form, or}\\ Z_j^H B_j^{-1} Z_j=I & \: \text{if 3rd form.} \end{array} \end{split}\]
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
itype – [in]
rocblas_eform.
Specifies the form of the generalized eigenproblems.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower parts of the matrices A_j and B_j are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_j and B_j are not used.
n – [in]
rocblas_int. n >= 0.
The matrix dimensions.
A – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
On entry, the Hermitian matrices A_j. On exit, if evect is original, the normalized matrix Z_j of eigenvectors. If evect is none, then the upper or lower triangular part of each matrix A_j (including the diagonal) is destroyed, depending on the value of uplo.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A_j.
B – [inout]
array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.
On entry, the Hermitian positive definite matrices B_j. On exit, the triangular factor of B_j as returned by POTRF_BATCHED.
ldb – [in]
rocblas_int. ldb >= n.
Specifies the leading dimension of B_j.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(T_j) is <= norm(T_j) * abstol, where T_j is the matrix obtained by reduction to standard form. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type. Array of batch_count scalars on the GPU.
The Frobenius norm of the off-diagonal elements of T_j (i.e. off(T_j)) at the final iteration, where T_j is the matrix obtained by reduction to standard form.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
W – [out]
pointer to real type. Array on the GPU (the size depends on the value of strideW).
On exit, the eigenvalues in increasing order.
strideW – [in]
rocblas_stride.
Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use is strideW >= n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit of batch instance j. If info[j] = 1, the algorithm did not converge. If info[j] = n + i, the leading minor of order i of B_j is not positive definite.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
3.4.3.12. rocsolver_<type>hegvj_strided_batched()
-
rocblas_status rocsolver_zhegvj_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)
-
rocblas_status rocsolver_chegvj_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)
HEGVJ_STRIDED_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of complex generalized Hermitian-definite eigenproblems.
For each instance in the batch, the problem solved by this function is either of the form
\[\begin{split} \begin{array}{cl} A_j X_j = \lambda B_j X_j & \: \text{1st form,}\\ A_j B_j X_j = \lambda X_j & \: \text{2nd form, or}\\ B_j A_j X_j = \lambda X_j & \: \text{3rd form,} \end{array} \end{split}\]
depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.
When computed, the matrix \(Z_j\) of eigenvectors is normalized as follows:
\[\begin{split} \begin{array}{cl} Z_j^H B_j Z_j=I & \: \text{if 1st or 2nd form, or}\\ Z_j^H B_j^{-1} Z_j=I & \: \text{if 3rd form.} \end{array} \end{split}\]
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
itype – [in]
rocblas_eform.
Specifies the form of the generalized eigenproblems.
evect – [in]
rocblas_evect.
Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.
uplo – [in]
rocblas_fill.
Specifies whether the upper or lower parts of the matrices A_j and B_j are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_j and B_j are not used.
n – [in]
rocblas_int. n >= 0.
The matrix dimensions.
A – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideA).
On entry, the Hermitian matrices A_j. On exit, if evect is original, the normalized matrix Z_j of eigenvectors. If evect is none, then the upper or lower triangular part of each matrix A_j (including the diagonal) is destroyed, depending on the value of uplo.
lda – [in]
rocblas_int. lda >= n.
Specifies the leading dimension of A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use is strideA >= lda*n.
B – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideB).
On entry, the Hermitian positive definite matrices B_j. On exit, the triangular factor of B_j as returned by POTRF_STRIDED_BATCHED.
ldb – [in]
rocblas_int. ldb >= n.
Specifies the leading dimension of B_j.
strideB – [in]
rocblas_stride.
Stride from the start of one matrix B_j to the next one B_(j+1). There is no restriction for the value of strideB. Normal use is strideB >= ldb*n.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(T_j) is <= norm(T_j) * abstol, where T_j is the matrix obtained by reduction to standard form. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type. Array of batch_count scalars on the GPU.
The Frobenius norm of the off-diagonal elements of T_j (i.e. off(T_j)) at the final iteration, where T_j is the matrix obtained by reduction to standard form.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
W – [out]
pointer to real type. Array on the GPU (the size depends on the value of strideW).
On exit, the eigenvalues in increasing order.
strideW – [in]
rocblas_stride.
Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use is strideW >= n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit of batch instance j. If info[j] = 1, the algorithm did not converge. If info[j] = n + i, the leading minor of order i of B_j is not positive definite.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
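After a batched call, the info array is usually scanned once to find the instances that need attention. The sketch below assumes info has been copied to the host into a std::vector; the type batch_failures and the function scan_info are hypothetical, and instance indices are 0-based.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of the instances of a batch that did not succeed.
struct batch_failures
{
    std::vector<int> not_converged;           // instances with info[j] = 1
    std::vector<int> b_not_positive_definite; // instances with info[j] = n + i
};

// Classifies every entry of info according to the documented encoding.
batch_failures scan_info(const std::vector<int>& info, int n)
{
    assert(n >= 0);
    batch_failures f;
    for(int j = 0; j < static_cast<int>(info.size()); ++j)
    {
        if(info[j] == 1)
            f.not_converged.push_back(j);
        else if(info[j] > n)
            f.b_not_positive_definite.push_back(j);
    }
    return f;
}
```

Instances reported in not_converged may be worth retrying with a larger max_sweeps or a looser abstol, while those in b_not_positive_definite indicate a problem with the input B_j itself.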
3.4.4. Singular value decomposition
3.4.4.1. rocsolver_<type>gesvdj()
-
rocblas_status rocsolver_zgesvdj(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *S, rocblas_double_complex *U, const rocblas_int ldu, rocblas_double_complex *V, const rocblas_int ldv, rocblas_int *info)
-
rocblas_status rocsolver_cgesvdj(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *S, rocblas_float_complex *U, const rocblas_int ldu, rocblas_float_complex *V, const rocblas_int ldv, rocblas_int *info)
-
rocblas_status rocsolver_dgesvdj(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *S, double *U, const rocblas_int ldu, double *V, const rocblas_int ldv, rocblas_int *info)
-
rocblas_status rocsolver_sgesvdj(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *S, float *U, const rocblas_int ldu, float *V, const rocblas_int ldv, rocblas_int *info)
GESVDJ computes the singular values and optionally the singular vectors of a general m-by-n matrix A (Singular Value Decomposition).
The SVD of matrix A is given by:
\[ A = U S V' \]
where the m-by-n matrix S is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of A. U and V are orthogonal (unitary) matrices. The first min(m,n) columns of U and V are the left and right singular vectors of A, respectively.
The computation of the singular vectors is optional and is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the transpose (or conjugate transpose) of the right singular vectors, i.e. the rows of V’.
left_svect and right_svect are rocblas_svect enums that can take the following values:
rocblas_svect_all: the entire matrix U (or V’) is computed,
rocblas_svect_singular: the singular vectors (first min(m,n) columns of U or rows of V’) are computed, or
rocblas_svect_none: no columns (or rows) of U (or V’) are computed, i.e. no singular vectors.
The singular values are computed by applying QR factorization to AV if m >= n (resp. LQ factorization to U’A if m < n), where V (resp. U) is found as the eigenvectors of A’A (resp. AA’) using the Jacobi eigenvalue algorithm.
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
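The link between the eigenproblem for A’A and the singular values can be written out for the case m >= n. This is a sketch of the reasoning in exact arithmetic, using the document's X’ notation; the symbols \(\Lambda\) (the diagonal matrix of eigenvalues of A’A), Q, R and \(\sigma_i\) are introduced here for illustration, and the actual implementation may organize the computation differently.

```latex
\[
\begin{array}{rcl}
A'A = V \Lambda V', \quad V'V = I & \Longrightarrow & (AV)'(AV) = \Lambda \\
AV = QR & \Longrightarrow & R'R = \Lambda, \quad \sigma_i = |R[i,i]| = \sqrt{\Lambda[i,i]} \\
A & = & Q R V'
\end{array}
\]
```

Since the columns of AV are mutually orthogonal, R is diagonal in exact arithmetic (when A has full column rank), so A = QRV’ already has the structure of the SVD up to the signs (phases) of the diagonal of R. The case m < n is analogous, with AA’ and an LQ factorization of U’A.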
- Parameters:
handle – [in] rocblas_handle.
left_svect – [in]
rocblas_svect.
Specifies how the left singular vectors are computed. rocblas_svect_overwrite is not supported.
right_svect – [in]
rocblas_svect.
Specifies how the right singular vectors are computed. rocblas_svect_overwrite is not supported.
m – [in]
rocblas_int. m >= 0.
The number of rows of matrix A.
n – [in]
rocblas_int. n >= 0.
The number of columns of matrix A.
A – [inout]
pointer to type. Array on the GPU of dimension lda*n.
On entry, the matrix A. On exit, the contents of A are destroyed.
lda – [in]
rocblas_int. lda >= m.
The leading dimension of A.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(A’A) is <= norm(A’A) * abstol [resp. off(AA’) <= norm(AA’) * abstol]. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type on the GPU.
The Frobenius norm of the off-diagonal elements of A’A (resp. AA’) at the final iteration.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to a rocblas_int on the GPU.
The actual number of sweeps (iterations) used by the algorithm.
S – [out]
pointer to real type. Array on the GPU of dimension min(m,n).
The singular values of A in decreasing order.
U – [out]
pointer to type. Array on the GPU of dimension ldu*min(m,n) if left_svect is set to singular, or ldu*m when left_svect is equal to all.
The matrix of left singular vectors stored as columns. Not referenced if left_svect is set to none.
ldu – [in]
rocblas_int. ldu >= m if left_svect is set to all or singular; ldu >= 1 otherwise.
The leading dimension of U.
V – [out]
pointer to type. Array on the GPU of dimension ldv*n.
The matrix of right singular vectors stored as rows (transposed / conjugate-transposed). Not referenced if right_svect is set to none.
ldv – [in]
rocblas_int. ldv >= n if right_svect is set to all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.
The leading dimension of V.
info – [out]
pointer to a rocblas_int on the GPU.
If info = 0, successful exit. If info = 1, the algorithm did not converge.
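The size rules for S, U, V, ldu and ldv listed above can be collected in a small host-side helper. The following C sketch is only an illustration: svect_mode is a hypothetical stand-in for rocblas_svect (so the code builds without the ROCm headers), gesvdj_lengths is not a rocSOLVER function, and reporting a length of 0 for an array that is not referenced is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rocblas_svect, used only in this sketch. */
typedef enum { SVECT_ALL, SVECT_SINGULAR, SVECT_NONE } svect_mode;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Checks ldu and ldv against the documented lower bounds and, if they hold,
   stores the number of elements needed for S, U and V.
   Returns 1 on success, or 0 if a leading dimension is too small. */
int gesvdj_lengths(svect_mode left, svect_mode right, int m, int n,
                   int ldu, int ldv,
                   size_t *s_len, size_t *u_len, size_t *v_len)
{
    assert(m >= 0 && n >= 0);
    const int k = min_int(m, n);

    /* ldu >= m for all/singular, ldu >= 1 otherwise. */
    const int ldu_min = (left == SVECT_NONE) ? 1 : m;
    /* ldv >= n for all, ldv >= min(m,n) for singular, ldv >= 1 otherwise. */
    const int ldv_min = (right == SVECT_ALL) ? n
                      : (right == SVECT_SINGULAR) ? k : 1;
    if (ldu < ldu_min || ldv < ldv_min)
        return 0;

    *s_len = (size_t)k;
    /* U has ldu*m elements for all and ldu*min(m,n) for singular. */
    *u_len = (left == SVECT_NONE)
                 ? 0
                 : (size_t)ldu * (size_t)(left == SVECT_ALL ? m : k);
    /* V has ldv*n elements whenever it is referenced. */
    *v_len = (right == SVECT_NONE) ? 0 : (size_t)ldv * (size_t)n;
    return 1;
}
```

For example, with m = 5, n = 3, left_svect set to singular and right_svect set to all, tight leading dimensions ldu = 5 and ldv = 3 give 3 elements for S, 15 for U and 9 for V.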
3.4.4.2. rocsolver_<type>gesvdj_batched()#
-
rocblas_status rocsolver_zgesvdj_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *S, const rocblas_stride strideS, rocblas_double_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_double_complex *V, const rocblas_int ldv, const rocblas_stride strideV, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgesvdj_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *S, const rocblas_stride strideS, rocblas_float_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_float_complex *V, const rocblas_int ldv, const rocblas_stride strideV, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgesvdj_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *S, const rocblas_stride strideS, double *U, const rocblas_int ldu, const rocblas_stride strideU, double *V, const rocblas_int ldv, const rocblas_stride strideV, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgesvdj_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *S, const rocblas_stride strideS, float *U, const rocblas_int ldu, const rocblas_stride strideU, float *V, const rocblas_int ldv, const rocblas_stride strideV, rocblas_int *info, const rocblas_int batch_count)#
GESVDJ_BATCHED computes the singular values and optionally the singular vectors of a batch of general m-by-n matrices A_j (Singular Value Decomposition).
The SVD of matrix A_j in the batch is given by:
\[ A_j = U_j S_j V_j' \]
where the m-by-n matrix \(S_j\) is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of \(A_j\). \(U_j\) and \(V_j\) are orthogonal (unitary) matrices. The first min(m,n) columns of \(U_j\) and \(V_j\) are the left and right singular vectors of \(A_j\), respectively.
The computation of the singular vectors is optional and it is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the transpose (or transpose conjugate) of the right singular vectors, i.e. the rows of \(V_j'\).
left_svect and right_svect are rocblas_svect enums that can take the following values:
rocblas_svect_all: the entire matrix \(U_j\) (or \(V_j'\)) is computed,
rocblas_svect_singular: the singular vectors (first min(m,n) columns of \(U_j\) or rows of \(V_j'\)) are computed, or
rocblas_svect_none: no columns (or rows) of \(U_j\) (or \(V_j'\)) are computed, i.e. no singular vectors.
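Because the right singular vectors are returned as the rows of \(V_j'\), reading one of them out of the V array means walking along a row of a column-major matrix. The following host-side C sketch shows this for real data that has already been copied back from the GPU. The name copy_right_vector is hypothetical, the batch index j is assumed to be 0-based, and the vector index i is 1-based as in the notation of this section. For complex types the stored row holds the conjugated components, which the sketch does not handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies the i-th right singular vector (1-based) of batch instance j
   (0-based, an assumption of this sketch) into out[0..n-1].
   V holds the matrices V_j' in column-major order with leading dimension ldv,
   and consecutive matrices are strideV elements apart. */
void copy_right_vector(const double *V, int ldv, int64_t strideV,
                       int n, int j, int i, double *out)
{
    assert(n >= 0 && j >= 0 && i >= 1 && i <= ldv);

    const double *Vj = V + (int64_t)j * strideV; /* start of V_j' */
    for (int k = 0; k < n; ++k)
        out[k] = Vj[(size_t)(i - 1) + (size_t)k * (size_t)ldv]; /* row i, column k+1 */
}
```

Consecutive components of one vector are ldv elements apart in memory, so a caller that needs the vectors as contiguous columns would transpose the result.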
The singular values are computed by applying QR factorization to \(A_jV_j\) if m >= n (resp. LQ factorization to \(U_j'A_j\) if m < n), where \(V_j\) (resp. \(U_j\)) is found as the eigenvectors of \(A_j'A_j\) (resp. \(A_jA_j'\)) using the Jacobi eigenvalue algorithm.
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
left_svect – [in]
rocblas_svect.
Specifies how the left singular vectors are computed. rocblas_svect_overwrite is not supported.
right_svect – [in]
rocblas_svect.
Specifies how the right singular vectors are computed. rocblas_svect_overwrite is not supported.
m – [in]
rocblas_int. m >= 0.
The number of rows of all matrices A_j in the batch.
n – [in]
rocblas_int. n >= 0.
The number of columns of all matrices A_j in the batch.
A – [inout]
Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.
On entry, the matrices A_j. On exit, the contents of A_j are destroyed.
lda – [in]
rocblas_int. lda >= m.
The leading dimension of A_j.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(A_j’A_j) is <= norm(A_j’A_j) * abstol [resp. off(A_jA_j’) <= norm(A_jA_j’) * abstol]. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type. Array of batch_count real values on the GPU.
The Frobenius norm of the off-diagonal elements of A_j’A_j (resp. A_jA_j’) at the final iteration, for each batch instance.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
S – [out]
pointer to real type. Array on the GPU (the size depends on the value of strideS).
The singular values of A_j in decreasing order.
strideS – [in]
rocblas_stride.
Stride from the start of one vector S_j to the next one S_(j+1). There is no restriction for the value of strideS. Normal use case is strideS >= min(m,n).
U – [out]
pointer to type. Array on the GPU (the size depends on the value of strideU).
The matrices U_j of left singular vectors stored as columns. Not referenced if left_svect is set to none.
ldu – [in]
rocblas_int. ldu >= m if left_svect is set to all or singular; ldu >= 1 otherwise.
The leading dimension of U_j.
strideU – [in]
rocblas_stride.
Stride from the start of one matrix U_j to the next one U_(j+1). There is no restriction for the value of strideU. Normal use case is strideU >= ldu*min(m,n) if left_svect is set to singular, or strideU >= ldu*m when left_svect is equal to all.
V – [out]
pointer to type. Array on the GPU (the size depends on the value of strideV).
The matrices V_j of right singular vectors stored as rows (transposed / conjugate-transposed). Not referenced if right_svect is set to none.
ldv – [in]
rocblas_int. ldv >= n if right_svect is set to all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.
The leading dimension of V_j.
strideV – [in]
rocblas_stride.
Stride from the start of one matrix V_j to the next one V_(j+1). There is no restriction for the value of strideV. Normal use case is strideV >= ldv*n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit. If info[j] = 1, the algorithm did not converge.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
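In the batched variant, A is passed as an array of pointers, while S, U and V are single arrays indexed through strides. The following C sketch shows both addressing schemes using plain pointer arithmetic. The names make_batch_pointers and strided_item are hypothetical, the batch index j is assumed to be 0-based, and int64_t is used as a stand-in for rocblas_stride. The sketch only computes addresses: allocating device memory and making the pointer array available to the GPU are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills A_ptrs[j] with the start of matrix A_j, assuming that all matrices
   are packed back to back in one block of lda*n elements each. */
void make_batch_pointers(double *block, int lda, int n, int batch_count,
                         double **A_ptrs)
{
    assert(lda >= 0 && n >= 0 && batch_count >= 0);

    const size_t per_matrix = (size_t)lda * (size_t)n;
    for (int j = 0; j < batch_count; ++j)
        A_ptrs[j] = block + (size_t)j * per_matrix;
}

/* Returns the start of item j (for example S_j, U_j or V_j) inside a
   strided array whose consecutive items are stride elements apart. */
double *strided_item(double *base, int64_t stride, int j)
{
    assert(j >= 0);
    return base + (int64_t)j * stride;
}
```

With lda = 2, n = 3 and four matrices, A_ptrs[3] points 18 elements past the start of the block, and with strideS = 2 the singular values of the fourth matrix start at S + 6.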
3.4.4.3. rocsolver_<type>gesvdj_strided_batched()#
-
rocblas_status rocsolver_zgesvdj_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *S, const rocblas_stride strideS, rocblas_double_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_double_complex *V, const rocblas_int ldv, const rocblas_stride strideV, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_cgesvdj_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *S, const rocblas_stride strideS, rocblas_float_complex *U, const rocblas_int ldu, const rocblas_stride strideU, rocblas_float_complex *V, const rocblas_int ldv, const rocblas_stride strideV, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_dgesvdj_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *S, const rocblas_stride strideS, double *U, const rocblas_int ldu, const rocblas_stride strideU, double *V, const rocblas_int ldv, const rocblas_stride strideV, rocblas_int *info, const rocblas_int batch_count)#
-
rocblas_status rocsolver_sgesvdj_strided_batched(rocblas_handle handle, const rocblas_svect left_svect, const rocblas_svect right_svect, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *S, const rocblas_stride strideS, float *U, const rocblas_int ldu, const rocblas_stride strideU, float *V, const rocblas_int ldv, const rocblas_stride strideV, rocblas_int *info, const rocblas_int batch_count)#
GESVDJ_STRIDED_BATCHED computes the singular values and optionally the singular vectors of a batch of general m-by-n matrices A_j (Singular Value Decomposition).
The SVD of matrix A_j in the batch is given by:
\[ A_j = U_j S_j V_j' \]
where the m-by-n matrix \(S_j\) is zero except, possibly, for its min(m,n) diagonal elements, which are the singular values of \(A_j\). \(U_j\) and \(V_j\) are orthogonal (unitary) matrices. The first min(m,n) columns of \(U_j\) and \(V_j\) are the left and right singular vectors of \(A_j\), respectively.
The computation of the singular vectors is optional and it is controlled by the function arguments left_svect and right_svect as described below. When computed, this function returns the transpose (or transpose conjugate) of the right singular vectors, i.e. the rows of \(V_j'\).
left_svect and right_svect are rocblas_svect enums that can take the following values:
rocblas_svect_all: the entire matrix \(U_j\) (or \(V_j'\)) is computed,
rocblas_svect_singular: the singular vectors (first min(m,n) columns of \(U_j\) or rows of \(V_j'\)) are computed, or
rocblas_svect_none: no columns (or rows) of \(U_j\) (or \(V_j'\)) are computed, i.e. no singular vectors.
The singular values are computed by applying QR factorization to \(A_jV_j\) if m >= n (resp. LQ factorization to \(U_j'A_j\) if m < n), where \(V_j\) (resp. \(U_j\)) is found as the eigenvectors of \(A_j'A_j\) (resp. \(A_jA_j'\)) using the Jacobi eigenvalue algorithm.
Note
In order to carry out calculations, this method may synchronize the stream contained within the rocblas_handle.
- Parameters:
handle – [in] rocblas_handle.
left_svect – [in]
rocblas_svect.
Specifies how the left singular vectors are computed. rocblas_svect_overwrite is not supported.
right_svect – [in]
rocblas_svect.
Specifies how the right singular vectors are computed. rocblas_svect_overwrite is not supported.
m – [in]
rocblas_int. m >= 0.
The number of rows of all matrices A_j in the batch.
n – [in]
rocblas_int. n >= 0.
The number of columns of all matrices A_j in the batch.
A – [inout]
pointer to type. Array on the GPU (the size depends on the value of strideA).
On entry, the matrices A_j. On exit, the contents of A_j are destroyed.
lda – [in]
rocblas_int. lda >= m.
The leading dimension of A_j.
strideA – [in]
rocblas_stride.
Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.
abstol – [in]
real type.
The absolute tolerance. The algorithm is considered to have converged once off(A_j’A_j) is <= norm(A_j’A_j) * abstol [resp. off(A_jA_j’) <= norm(A_jA_j’) * abstol]. If abstol <= 0, then the tolerance will be set to machine precision.
residual – [out]
pointer to real type. Array of batch_count real values on the GPU.
The Frobenius norm of the off-diagonal elements of A_j’A_j (resp. A_jA_j’) at the final iteration, for each batch instance.
max_sweeps – [in]
rocblas_int. max_sweeps > 0.
Maximum number of sweeps (iterations) to be used by the algorithm.
n_sweeps – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
The actual number of sweeps (iterations) used by the algorithm for each batch instance.
S – [out]
pointer to real type. Array on the GPU (the size depends on the value of strideS).
The singular values of A_j in decreasing order.
strideS – [in]
rocblas_stride.
Stride from the start of one vector S_j to the next one S_(j+1). There is no restriction for the value of strideS. Normal use case is strideS >= min(m,n).
U – [out]
pointer to type. Array on the GPU (the size depends on the value of strideU).
The matrices U_j of left singular vectors stored as columns. Not referenced if left_svect is set to none.
ldu – [in]
rocblas_int. ldu >= m if left_svect is set to all or singular; ldu >= 1 otherwise.
The leading dimension of U_j.
strideU – [in]
rocblas_stride.
Stride from the start of one matrix U_j to the next one U_(j+1). There is no restriction for the value of strideU. Normal use case is strideU >= ldu*min(m,n) if left_svect is set to singular, or strideU >= ldu*m when left_svect is equal to all.
V – [out]
pointer to type. Array on the GPU (the size depends on the value of strideV).
The matrices V_j of right singular vectors stored as rows (transposed / conjugate-transposed). Not referenced if right_svect is set to none.
ldv – [in]
rocblas_int. ldv >= n if right_svect is set to all; ldv >= min(m,n) if right_svect is set to singular; or ldv >= 1 otherwise.
The leading dimension of V_j.
strideV – [in]
rocblas_stride.
Stride from the start of one matrix V_j to the next one V_(j+1). There is no restriction for the value of strideV. Normal use case is strideV >= ldv*n.
info – [out]
pointer to rocblas_int. Array of batch_count integers on the GPU.
If info[j] = 0, successful exit. If info[j] = 1, the algorithm did not converge.
batch_count – [in]
rocblas_int. batch_count >= 0.
Number of matrices in the batch.
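In the strided-batched variant, every matrix lives in one array and is located through its stride. The following C sketch computes the offset of element A_j[i,k] and the number of elements a strided array has to span. Both function names are hypothetical, int64_t stands in for rocblas_stride, the batch index j is assumed to be 0-based, and i and k are 1-based as in the notation of this section. The span formula covers only the normal use case in which the stride is at least the size of one item.

```c
#include <assert.h>
#include <stdint.h>

/* Offset (in elements) of A_j[i,k] from the start of the array A, for
   column-major matrices with leading dimension lda that are strideA apart. */
int64_t strided_offset(int64_t strideA, int lda, int j, int i, int k)
{
    assert(j >= 0 && i >= 1 && k >= 1 && i <= lda);
    return (int64_t)j * strideA + (i - 1) + (int64_t)(k - 1) * lda;
}

/* Number of elements spanned by batch_count items of item_len elements each,
   placed stride elements apart (normal use case: stride >= item_len). */
int64_t strided_span(int64_t stride, int64_t item_len, int batch_count)
{
    assert(item_len >= 0 && stride >= item_len && batch_count >= 0);
    if (batch_count == 0)
        return 0;
    return (int64_t)(batch_count - 1) * stride + item_len;
}
```

For instance, with lda = 4, n = 3 and strideA = 12, the matrices are packed without gaps and five of them span 60 elements. Choosing strideA = 16 leaves four unused elements after each of the first four matrices and raises the span to 76.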