# rocSOLVER LAPACK Functions#

LAPACK routines solve complex Numerical Linear Algebra problems. These functions are organized in the following categories:

Note

Throughout the APIs’ descriptions, we use the following notations:

• i, j, and k are used as general purpose indices. In some legacy LAPACK APIs, k could be a parameter indicating some problem/matrix dimension.

• Depending on the context, when it is necessary to index rows, columns and blocks or submatrices, i is assigned to rows, j to columns and k to blocks. $$l$$ is always used to index matrices/problems in a batch.

• x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.

• To identify a block in a matrix or a matrix in the batch, k and $$l$$ are used as sub-indices

• x_i $$=x_i$$; we sometimes use both notations, $$x_i$$ when displaying mathematical equations, and x_i in the text describing the function parameters.

• If X is a real vector or matrix, $$X^T$$ indicates its transpose; if X is complex, then $$X^H$$ represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.

• When a matrix A is formed as the product of several matrices, the following notation is used: A=M(1)M(2)…M(t).

## Triangular factorizations#

### rocsolver_<type>potf2()#

rocblas_status rocsolver_zpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_spotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

POTF2 computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form:

$\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}$

U is an upper triangular matrix and L is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

• info[out] pointer to a rocblas_int on the GPU. If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.

### rocsolver_<type>potf2_batched()#

rocblas_status rocsolver_zpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

POTF2_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form:

$\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}$

$$U_l$$ is an upper triangular matrix and $$L_l$$ is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

• A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>potf2_strided_batched()#

rocblas_status rocsolver_zpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

POTF2_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form:

$\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}$

$$U_l$$ is an upper triangular matrix and $$L_l$$ is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>potrf()#

rocblas_status rocsolver_zpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_spotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

POTRF computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the blocked version of the algorithm).

The factorization has the form:

$\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}$

U is an upper triangular matrix and L is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

• info[out] pointer to a rocblas_int on the GPU. If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.

### rocsolver_<type>potrf_batched()#

rocblas_status rocsolver_zpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

POTRF_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form:

$\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}$

$$U_l$$ is an upper triangular matrix and $$L_l$$ is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

• A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>potrf_strided_batched()#

rocblas_status rocsolver_zpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

POTRF_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form:

$\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}$

$$U_l$$ is an upper triangular matrix and $$L_l$$ is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>getf2()#

rocblas_status rocsolver_zgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_cgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_dgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_sgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_zgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_cgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_sgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

GETF2 computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

$A = PLU$

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

• n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

• ipiv[out] pointer to rocblas_int. Array on the GPU of dimension min(m,n). The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

• info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.

### rocsolver_<type>getf2_batched()#

rocblas_status rocsolver_zgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETF2_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix $$A_l$$ in the batch has the form

$A_l = P_lL_lU_l$

where $$P_l$$ is a permutation matrix, $$L_l$$ is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and $$U_l$$ is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

• A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorizations. The unit diagonal elements of L_l are not stored.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>getf2_strided_batched()#

rocblas_status rocsolver_zgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETF2_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix $$A_l$$ in the batch has the form

$A_l = P_lL_lU_l$

where $$P_l$$ is a permutation matrix, $$L_l$$ is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and $$U_l$$ is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorization. The unit diagonal elements of L_l are not stored.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

• ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivots indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>getrf()#

rocblas_status rocsolver_zgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_cgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_dgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_sgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_zgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_cgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_sgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

GETRF computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

$A = PLU$

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

• n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

• ipiv[out] pointer to rocblas_int. Array on the GPU of dimension min(m,n). The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

• info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.

### rocsolver_<type>getrf_batched()#

rocblas_status rocsolver_zgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETRF_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix $$A_l$$ in the batch has the form

$A_l = P_lL_lU_l$

where $$P_l$$ is a permutation matrix, $$L_l$$ is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and $$U_l$$ is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

• A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorizations. The unit diagonal elements of L_l are not stored.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>getrf_strided_batched()#

rocblas_status rocsolver_zgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETRF_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix $$A_l$$ in the batch has the form

$A_l = P_lL_lU_l$

where $$P_l$$ is a permutation matrix, $$L_l$$ is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and $$U_l$$ is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorization. The unit diagonal elements of L_l are not stored.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

• ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivots indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>sytf2()#

rocblas_status rocsolver_zsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_csytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_ssytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

SYTF2 computes the factorization of a symmetric indefinite matrix $$A$$ using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}$

where $$U$$ or $$L$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_k$$.

Specifically, $$U$$ and $$L$$ are computed as

$\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_k$$, and $$P(k)$$ is a permutation matrix defined by $$ipiv[k]$$. If we let $$s$$ denote the order of block $$D_k$$, then $$U(k)$$ and $$L(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_k$$ is stored in $$A[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A$$. If $$s = 2$$ and uplo is upper, then $$D_k$$ is stored in $$A[k-1,k-1]$$, $$A[k-1,k]$$, and $$A[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A$$. If $$s = 2$$ and uplo is lower, then $$D_k$$ is stored in $$A[k,k]$$, $$A[k+1,k]$$, and $$A[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

• ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

• info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, D is singular. D[i,i] is the first diagonal zero.

### rocsolver_<type>sytf2_batched()#

rocblas_status rocsolver_zsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_csytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

SYTF2_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}$

where $$U_l$$ or $$L_l$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D_l$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_{kl}$$.

Specifically, $$U_l$$ and $$L_l$$ are computed as

$\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_{kl}$$, and $$P_l(k)$$ is a permutation matrix defined by $$ipiv_l[k]$$. If we let $$s$$ denote the order of block $$D_{kl}$$, then $$U_l(k)$$ and $$L_l(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_{kl}$$ is stored in $$A_l[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A_l$$. If $$s = 2$$ and uplo is upper, then $$D_{kl}$$ is stored in $$A_l[k-1,k-1]$$, $$A_l[k-1,k]$$, and $$A_l[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A_l$$. If $$s = 2$$ and uplo is lower, then $$D_{kl}$$ is stored in $$A_l[k,k]$$, $$A_l[k+1,k]$$, and $$A_l[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A_l$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

• A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>sytf2_strided_batched()#

rocblas_status rocsolver_zsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_csytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

SYTF2_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}$

where $$U_l$$ or $$L_l$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D_l$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_{kl}$$.

Specifically, $$U_l$$ and $$L_l$$ are computed as

$\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_{kl}$$, and $$P_l(k)$$ is a permutation matrix defined by $$ipiv_l[k]$$. If we let $$s$$ denote the order of block $$D_{kl}$$, then $$U_l(k)$$ and $$L_l(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_{kl}$$ is stored in $$A_l[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A_l$$. If $$s = 2$$ and uplo is upper, then $$D_{kl}$$ is stored in $$A_l[k-1,k-1]$$, $$A_l[k-1,k]$$, and $$A_l[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A_l$$. If $$s = 2$$ and uplo is lower, then $$D_{kl}$$ is stored in $$A_l[k,k]$$, $$A_l[k+1,k]$$, and $$A_l[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A_l$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

• ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>sytrf()#

rocblas_status rocsolver_zsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_csytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_ssytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

SYTRF computes the factorization of a symmetric indefinite matrix $$A$$ using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}$

where $$U$$ or $$L$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_k$$.

Specifically, $$U$$ and $$L$$ are computed as

$\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_k$$, and $$P(k)$$ is a permutation matrix defined by $$ipiv[k]$$. If we let $$s$$ denote the order of block $$D_k$$, then $$U(k)$$ and $$L(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_k$$ is stored in $$A[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A$$. If $$s = 2$$ and uplo is upper, then $$D_k$$ is stored in $$A[k-1,k-1]$$, $$A[k-1,k]$$, and $$A[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A$$. If $$s = 2$$ and uplo is lower, then $$D_k$$ is stored in $$A[k,k]$$, $$A[k+1,k]$$, and $$A[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

• ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

• info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, D is singular. D[i,i] is the first diagonal zero.

### rocsolver_<type>sytrf_batched()#

rocblas_status rocsolver_zsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_csytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

SYTRF_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}$

where $$U_l$$ or $$L_l$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D_l$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_{kl}$$.

Specifically, $$U_l$$ and $$L_l$$ are computed as

$\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_{kl}$$, and $$P_l(k)$$ is a permutation matrix defined by $$ipiv_l[k]$$. If we let $$s$$ denote the order of block $$D_{kl}$$, then $$U_l(k)$$ and $$L_l(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_{kl}$$ is stored in $$A_l[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A_l$$. If $$s = 2$$ and uplo is upper, then $$D_{kl}$$ is stored in $$A_l[k-1,k-1]$$, $$A_l[k-1,k]$$, and $$A_l[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A_l$$. If $$s = 2$$ and uplo is lower, then $$D_{kl}$$ is stored in $$A_l[k,k]$$, $$A_l[k+1,k]$$, and $$A_l[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A_l$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

• A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>sytrf_strided_batched()#

rocblas_status rocsolver_zsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_csytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

SYTRF_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}$

where $$U_l$$ or $$L_l$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D_l$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_{kl}$$.

Specifically, $$U_l$$ and $$L_l$$ are computed as

$\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_{kl}$$, and $$P_l(k)$$ is a permutation matrix defined by $$ipiv_l[k]$$. If we let $$s$$ denote the order of block $$D_{kl}$$, then $$U_l(k)$$ and $$L_l(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_{kl}$$ is stored in $$A_l[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A_l$$. If $$s = 2$$ and uplo is upper, then $$D_{kl}$$ is stored in $$A_l[k-1,k-1]$$, $$A_l[k-1,k]$$, and $$A_l[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A_l$$. If $$s = 2$$ and uplo is lower, then $$D_l(k)$$ is stored in $$A_l[k,k]$$, $$A_l[k+1,k]$$, and $$A_l[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A_l$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l are stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

• n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

• lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

• ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

• info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

## Orthogonal factorizations#

### rocsolver_<type>geqr2()#

rocblas_status rocsolver_zgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQR2 computes a QR factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}$

where R is upper triangular (upper trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H(1)H(2)\cdots H(k), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H(i)$$ is given by

$H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i'$

where the first i-1 elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

• n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

• ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

### rocsolver_<type>geqr2_batched()#

rocblas_status rocsolver_zgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQR2_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}$

where $$R_l$$ is upper triangular (upper trapezoidal if m < n), and $$Q_l$$ is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the first i-1 elements of Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>geqr2_strided_batched()#

rocblas_status rocsolver_zgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQR2_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}$

where $$R_l$$ is upper triangular (upper trapezoidal if m < n), and $$Q_l$$ is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the first i-1 elements of Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>geqrf()#

rocblas_status rocsolver_zgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQRF computes a QR factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}$

where R is upper triangular (upper trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H(1)H(2)\cdots H(k), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H(i)$$ is given by

$H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i'$

where the first i-1 elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

• n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

• ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

### rocsolver_<type>geqrf_batched()#

rocblas_status rocsolver_zgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQRF_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}$

where $$R_l$$ is upper triangular (upper trapezoidal if m < n), and $$Q_l$$ is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the first i-1 elements of Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>geqrf_strided_batched()#

rocblas_status rocsolver_zgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQRF_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}$

where $$R_l$$ is upper triangular (upper trapezoidal if m < n), and $$Q_l$$ is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the first i-1 elements of Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>gerq2()#

rocblas_status rocsolver_zgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GERQ2 computes a RQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

$A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q$

where R is upper triangular (upper trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H(1)'H(2)' \cdots H(k)', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H(i)$$ is given by

$H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i'$

where the last n-i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

• n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

• ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

### rocsolver_<type>gerq2_batched()#

rocblas_status rocsolver_zgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQ2_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l$

where $$R_l$$ is upper triangular (upper trapezoidal if m > n), and $$Q_l$$ is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrices $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the last n-i elements of Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>gerq2_strided_batched()#

rocblas_status rocsolver_zgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQ2_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l$

where $$R_l$$ is upper triangular (upper trapezoidal if m > n), and $$Q_l$$ is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrices $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the last n-i elements of Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>gerqf()#

rocblas_status rocsolver_zgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GERQF computes a RQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

$A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q$

where R is upper triangular (upper trapezoidal if m > n), and Q is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H(1)'H(2)' \cdots H(k)', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H(i)$$ is given by

$H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i'$

where the last n-i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

• n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

• ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

### rocsolver_<type>gerqf_batched()#

rocblas_status rocsolver_zgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQF_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l$

where $$R_l$$ is upper triangular (upper trapezoidal if m > n), and $$Q_l$$ is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrices $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the last n-i elements of Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>gerqf_strided_batched()#

rocblas_status rocsolver_zgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQF_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l$

where $$R_l$$ is upper triangular (upper trapezoidal if m > n), and $$Q_l$$ is a n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrices $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the last n-i elements of Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>geql2()#

rocblas_status rocsolver_zgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQL2 computes a QL factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}$

where L is lower triangular (lower trapezoidal if m < n), and Q is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H(k)H(k-1)\cdots H(1), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H(i)$$ is given by

$H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i'$

where the last m-i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

• n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

• A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

• ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.

### rocsolver_<type>geql2_batched()#

rocblas_status rocsolver_zgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQL2_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$\begin{split} A_l = Q_l\left[\begin{array}{c} 0\\ L_l \end{array}\right] \end{split}$

where $$L_l$$ is lower triangular (lower trapezoidal if m < n), and $$Q_l$$ is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(k)H_l(k-1)\cdots H_l(1), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the last m-i elements of the Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_l; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>geql2_strided_batched()#

rocblas_status rocsolver_zgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQL2_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_l$$ in the batch has the form

$\begin{split} A_l = Q_l\left[\begin{array}{c} 0\\ L_l \end{array}\right] \end{split}$

where $$L_l$$ is lower triangular (lower trapezoidal if m < n), and $$Q_l$$ is a m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_l = H_l(k)H_l(k-1)\cdots H_l(1), \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_l(i)$$ is given by

$H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}'$

where the last m-i elements of the Householder vector $$v_{l_i}$$ are zero, and $$v_{l_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

• n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

• A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_l; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

• lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

• strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

• strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

### rocsolver_<type>geqlf()#

rocblas_status rocsolver_zgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeqlf(rocblas_handle handle, const rocblas_int m, const