rocSOLVER LAPACK Functions

Contents

rocSOLVER LAPACK Functions#

LAPACK routines solve complex Numerical Linear Algebra problems. These functions are organized in the following categories:

Note

Throughout the APIs’ descriptions, we use the following notations:

  • i, j, and k are used as general purpose indices. In some legacy LAPACK APIs, k could be a parameter indicating some problem/matrix dimension.

  • Depending on the context, when it is necessary to index rows, columns and blocks or submatrices, i is assigned to rows, j to columns and k to blocks. \(l\) is always used to index matrices/problems in a batch.

  • x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.

  • To identify a block in a matrix or a matrix in the batch, k and \(l\) are used as sub-indices

  • x_i \(=x_i\); we sometimes use both notations, \(x_i\) when displaying mathematical equations, and x_i in the text describing the function parameters.

  • If X is a real vector or matrix, \(X^T\) indicates its transpose; if X is complex, then \(X^H\) represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.

  • When a matrix A is formed as the product of several matrices, the following notation is used: A=M(1)M(2)…M(t).

Triangular factorizations#

rocsolver_<type>potf2()#

rocblas_status rocsolver_zpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_spotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

POTF2 computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form:

\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]

U is an upper triangular matrix and L is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.

rocsolver_<type>potf2_batched()#

rocblas_status rocsolver_zpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

POTF2_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_l\) is an upper triangular matrix and \(L_l\) is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>potf2_strided_batched()#

rocblas_status rocsolver_zpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

POTF2_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_l\) is an upper triangular matrix and \(L_l\) is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>potrf()#

rocblas_status rocsolver_zpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_spotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

POTRF computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the blocked version of the algorithm).

The factorization has the form:

\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]

U is an upper triangular matrix and L is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.

rocsolver_<type>potrf_batched()#

rocblas_status rocsolver_zpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

POTRF_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_l\) is an upper triangular matrix and \(L_l\) is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>potrf_strided_batched()#

rocblas_status rocsolver_zpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

POTRF_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_l\) is an upper triangular matrix and \(L_l\) is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>getf2()#

rocblas_status rocsolver_zgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_cgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_dgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_sgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_zgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_cgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_sgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

GETF2 computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

\[ A = PLU \]

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension min(m,n). The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.

rocsolver_<type>getf2_batched()#

rocblas_status rocsolver_zgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETF2_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = P_lL_lU_l \]

where \(P_l\) is a permutation matrix, \(L_l\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_l\) is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorizations. The unit diagonal elements of L_l are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>getf2_strided_batched()#

rocblas_status rocsolver_zgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETF2_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = P_lL_lU_l \]

where \(P_l\) is a permutation matrix, \(L_l\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_l\) is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorization. The unit diagonal elements of L_l are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivots indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>getrf()#

rocblas_status rocsolver_zgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_cgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_dgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_sgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_zgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_cgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_sgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

GETRF computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

\[ A = PLU \]

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension min(m,n). The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.

rocsolver_<type>getrf_batched()#

rocblas_status rocsolver_zgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETRF_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = P_lL_lU_l \]

where \(P_l\) is a permutation matrix, \(L_l\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_l\) is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorizations. The unit diagonal elements of L_l are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>getrf_strided_batched()#

rocblas_status rocsolver_zgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETRF_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = P_lL_lU_l \]

where \(P_l\) is a permutation matrix, \(L_l\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_l\) is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorization. The unit diagonal elements of L_l are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivots indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>sytf2()#

rocblas_status rocsolver_zsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_csytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_ssytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

SYTF2 computes the factorization of a symmetric indefinite matrix \(A\) using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}\]

where \(U\) or \(L\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_k\).

Specifically, \(U\) and \(L\) are computed as

\[\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_k\), and \(P(k)\) is a permutation matrix defined by \(ipiv[k]\). If we let \(s\) denote the order of block \(D_k\), then \(U(k)\) and \(L(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_k\) is stored in \(A[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A\). If \(s = 2\) and uplo is upper, then \(D_k\) is stored in \(A[k-1,k-1]\), \(A[k-1,k]\), and \(A[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A\). If \(s = 2\) and uplo is lower, then \(D_k\) is stored in \(A[k,k]\), \(A[k+1,k]\), and \(A[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, D is singular. D[i,i] is the first diagonal zero.

rocsolver_<type>sytf2_batched()#

rocblas_status rocsolver_zsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_csytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

SYTF2_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}\]

where \(U_l\) or \(L_l\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_l\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_{kl}\).

Specifically, \(U_l\) and \(L_l\) are computed as

\[\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_{kl}\), and \(P_l(k)\) is a permutation matrix defined by \(ipiv_l[k]\). If we let \(s\) denote the order of block \(D_{kl}\), then \(U_l(k)\) and \(L_l(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_{kl}\) is stored in \(A_l[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_l\). If \(s = 2\) and uplo is upper, then \(D_{kl}\) is stored in \(A_l[k-1,k-1]\), \(A_l[k-1,k]\), and \(A_l[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_l\). If \(s = 2\) and uplo is lower, then \(D_{kl}\) is stored in \(A_l[k,k]\), \(A_l[k+1,k]\), and \(A_l[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_l\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>sytf2_strided_batched()#

rocblas_status rocsolver_zsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_csytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

SYTF2_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}\]

where \(U_l\) or \(L_l\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_l\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_{kl}\).

Specifically, \(U_l\) and \(L_l\) are computed as

\[\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_{kl}\), and \(P_l(k)\) is a permutation matrix defined by \(ipiv_l[k]\). If we let \(s\) denote the order of block \(D_{kl}\), then \(U_l(k)\) and \(L_l(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_{kl}\) is stored in \(A_l[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_l\). If \(s = 2\) and uplo is upper, then \(D_{kl}\) is stored in \(A_l[k-1,k-1]\), \(A_l[k-1,k]\), and \(A_l[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_l\). If \(s = 2\) and uplo is lower, then \(D_{kl}\) is stored in \(A_l[k,k]\), \(A_l[k+1,k]\), and \(A_l[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_l\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>sytrf()#

rocblas_status rocsolver_zsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_csytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_ssytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

SYTRF computes the factorization of a symmetric indefinite matrix \(A\) using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}\]

where \(U\) or \(L\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_k\).

Specifically, \(U\) and \(L\) are computed as

\[\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_k\), and \(P(k)\) is a permutation matrix defined by \(ipiv[k]\). If we let \(s\) denote the order of block \(D_k\), then \(U(k)\) and \(L(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_k\) is stored in \(A[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A\). If \(s = 2\) and uplo is upper, then \(D_k\) is stored in \(A[k-1,k-1]\), \(A[k-1,k]\), and \(A[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A\). If \(s = 2\) and uplo is lower, then \(D_k\) is stored in \(A[k,k]\), \(A[k+1,k]\), and \(A[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, D is singular. D[i,i] is the first diagonal zero.

rocsolver_<type>sytrf_batched()#

rocblas_status rocsolver_zsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_csytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

SYTRF_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}\]

where \(U_l\) or \(L_l\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_l\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_{kl}\).

Specifically, \(U_l\) and \(L_l\) are computed as

\[\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_{kl}\), and \(P_l(k)\) is a permutation matrix defined by \(ipiv_l[k]\). If we let \(s\) denote the order of block \(D_{kl}\), then \(U_l(k)\) and \(L_l(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_{kl}\) is stored in \(A_l[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_l\). If \(s = 2\) and uplo is upper, then \(D_{kl}\) is stored in \(A_l[k-1,k-1]\), \(A_l[k-1,k]\), and \(A_l[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_l\). If \(s = 2\) and uplo is lower, then \(D_{kl}\) is stored in \(A_l[k,k]\), \(A_l[k+1,k]\), and \(A_l[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_l\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>sytrf_strided_batched()#

rocblas_status rocsolver_zsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_csytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

SYTRF_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}\]

where \(U_l\) or \(L_l\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_l\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_{kl}\).

Specifically, \(U_l\) and \(L_l\) are computed as

\[\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_{kl}\), and \(P_l(k)\) is a permutation matrix defined by \(ipiv_l[k]\). If we let \(s\) denote the order of block \(D_{kl}\), then \(U_l(k)\) and \(L_l(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_{kl}\) is stored in \(A_l[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_l\). If \(s = 2\) and uplo is upper, then \(D_{kl}\) is stored in \(A_l[k-1,k-1]\), \(A_l[k-1,k]\), and \(A_l[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_l\). If \(s = 2\) and uplo is lower, then \(D_l(k)\) is stored in \(A_l[k,k]\), \(A_l[k+1,k]\), and \(A_l[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_l\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

Orthogonal factorizations#

rocsolver_<type>geqr2()#

rocblas_status rocsolver_zgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQR2 computes a QR factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}\]

where R is upper triangular (upper trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(1)H(2)\cdots H(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

rocsolver_<type>geqr2_batched()#

rocblas_status rocsolver_zgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQR2_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}\]

where \(R_l\) is upper triangular (upper trapezoidal if m < n), and \(Q_l\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>geqr2_strided_batched()#

rocblas_status rocsolver_zgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQR2_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}\]

where \(R_l\) is upper triangular (upper trapezoidal if m < n), and \(Q_l\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>geqrf()#

rocblas_status rocsolver_zgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQRF computes a QR factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}\]

where R is upper triangular (upper trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(1)H(2)\cdots H(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

rocsolver_<type>geqrf_batched()#

rocblas_status rocsolver_zgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQRF_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}\]

where \(R_l\) is upper triangular (upper trapezoidal if m < n), and \(Q_l\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>geqrf_strided_batched()#

rocblas_status rocsolver_zgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQRF_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}\]

where \(R_l\) is upper triangular (upper trapezoidal if m < n), and \(Q_l\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gerq2()#

rocblas_status rocsolver_zgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GERQ2 computes a RQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q \]

where R is upper triangular (upper trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(1)'H(2)' \cdots H(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

rocsolver_<type>gerq2_batched()#

rocblas_status rocsolver_zgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQ2_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l \]

where \(R_l\) is upper triangular (upper trapezoidal if m > n), and \(Q_l\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last n-i elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gerq2_strided_batched()#

rocblas_status rocsolver_zgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQ2_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l \]

where \(R_l\) is upper triangular (upper trapezoidal if m > n), and \(Q_l\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last n-i elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gerqf()#

rocblas_status rocsolver_zgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GERQF computes a RQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q \]

where R is upper triangular (upper trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(1)'H(2)' \cdots H(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

rocsolver_<type>gerqf_batched()#

rocblas_status rocsolver_zgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQF_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l \]

where \(R_l\) is upper triangular (upper trapezoidal if m > n), and \(Q_l\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last n-i elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gerqf_strided_batched()#

rocblas_status rocsolver_zgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQF_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l \]

where \(R_l\) is upper triangular (upper trapezoidal if m > n), and \(Q_l\) is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrices \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last n-i elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>geql2()#

rocblas_status rocsolver_zgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQL2 computes a QL factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}\]

where L is lower triangular (lower trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(k)H(k-1)\cdots H(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the last m-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

rocsolver_<type>geql2_batched()#

rocblas_status rocsolver_zgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQL2_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} 0\\ L_l \end{array}\right] \end{split}\]

where \(L_l\) is lower triangular (lower trapezoidal if m < n), and \(Q_l\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)H_l(k-1)\cdots H_l(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last m-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_l; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>geql2_strided_batched()#

rocblas_status rocsolver_zgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQL2_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} 0\\ L_l \end{array}\right] \end{split}\]

where \(L_l\) is lower triangular (lower trapezoidal if m < n), and \(Q_l\) is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)H_l(k-1)\cdots H_l(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last m-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_l; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>geqlf()#

rocblas_status rocsolver_zgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#