rocSOLVER LAPACK Functions#

LAPACK routines solve complex numerical linear algebra problems. These functions are organized in the following categories:

Note

Throughout the APIs’ descriptions, we use the following notations:

  • i, j, and k are used as general purpose indices. In some legacy LAPACK APIs, k could be a parameter indicating some problem/matrix dimension.

  • Depending on the context, when it is necessary to index rows, columns and blocks or submatrices, i is assigned to rows, j to columns and k to blocks. \(l\) is always used to index matrices/problems in a batch.

  • x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.

  • To identify a block in a matrix or a matrix in the batch, k and \(l\) are used as sub-indices.

  • x_i \(=x_i\); we sometimes use both notations, \(x_i\) when displaying mathematical equations, and x_i in the text describing the function parameters.

  • If X is a real vector or matrix, \(X^T\) indicates its transpose; if X is complex, then \(X^H\) represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.

  • When a matrix A is formed as the product of several matrices, the following notation is used: A=M(1)M(2)…M(t).
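The indexing conventions above can be made concrete with a small sketch. It assumes column-major storage, which is what the lda*n array dimensions in the parameter lists below imply. The helper names idx_colmajor and idx_vec are hypothetical and are not part of rocSOLVER.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of rocSOLVER): offset of the element written
// A[i,j] in this document (1-based) inside a 0-based, column-major array
// with leading dimension lda.
std::size_t idx_colmajor(std::size_t i, std::size_t j, std::size_t lda)
{
    assert(i >= 1 && i <= lda && j >= 1);
    return (i - 1) + (j - 1) * lda;
}

// Hypothetical helper: 0-based position of the vector element written x[i].
std::size_t idx_vec(std::size_t i)
{
    assert(i >= 1);
    return i - 1;
}
```

For example, with lda = 3 the element A[1,2] sits at offset 3 of the array, and x[1] sits at offset 0.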

Triangular factorizations#

rocsolver_<type>potf2()#

rocblas_status rocsolver_zpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_spotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

POTF2 computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form:

\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]

U is an upper triangular matrix and L is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.
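As an illustration of the semantics described above, here is a minimal CPU sketch of an unblocked Cholesky factorization for the lower case with real double-precision data. It is not rocSOLVER's implementation: the name ref_potf2_lower is hypothetical, the matrix lives in host memory, and the return value plays the role of info.

```cpp
#include <cassert>
#include <cmath>

// CPU reference sketch (hypothetical name, lower case only): overwrite the
// lower triangle of the column-major n-by-n matrix A with L so that A = L L'.
// The strictly upper triangle is neither read nor written.
// Returns 0 on success, or the 1-based order i of the first leading minor
// that is not positive definite.
int ref_potf2_lower(int n, double* A, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j) {
        // Diagonal entry: d = A[j,j] - sum_k L[j,k]^2.
        double d = A[j + j * lda];
        for (int k = 0; k < j; ++k)
            d -= A[j + k * lda] * A[j + k * lda];
        if (!(d > 0.0))
            return j + 1; // the factorization stops at this point
        d = std::sqrt(d);
        A[j + j * lda] = d;
        // Entries of column j below the diagonal.
        for (int i = j + 1; i < n; ++i) {
            double s = A[i + j * lda];
            for (int k = 0; k < j; ++k)
                s -= A[i + k * lda] * A[j + k * lda];
            A[i + j * lda] = s / d;
        }
    }
    return 0;
}
```

For the 2-by-2 matrix with rows (4, 2) and (2, 3), the sketch returns 0 and leaves L with rows (2, 0) and (1, sqrt(2)) in the lower triangle. For the indefinite matrix with rows (1, 2) and (2, 1), it returns 2.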

rocsolver_<type>potf2_batched()#

rocblas_status rocsolver_zpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

POTF2_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_l\) is an upper triangular matrix and \(L_l\) is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
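The array-of-pointers argument A can be sketched as follows, assuming all matrices of the batch were placed back to back in one allocation of lda*n elements each. The helper make_batch_pointers is hypothetical and only performs pointer arithmetic; with real device memory, base would be a GPU pointer, and how the pointer array itself is transferred is left out of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build the array of per-matrix pointers used by a
// *_batched interface from one contiguous allocation that holds batch_count
// column-major matrices of lda*n elements each.
std::vector<double*> make_batch_pointers(double* base, int n, int lda, int batch_count)
{
    assert(n >= 0 && lda >= n && batch_count >= 0);
    std::vector<double*> A(static_cast<std::size_t>(batch_count));
    const std::size_t per_matrix
        = static_cast<std::size_t>(lda) * static_cast<std::size_t>(n);
    for (std::size_t l = 0; l < A.size(); ++l)
        A[l] = base + l * per_matrix; // start of matrix l (counted from 0)
    return A;
}
```

Nothing requires the matrices to be contiguous in the batched interface; any set of valid pointers works, which is the main difference from the strided batched variant.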

rocsolver_<type>potf2_strided_batched()#

rocblas_status rocsolver_zpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

POTF2_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_l\) is an upper triangular matrix and \(L_l\) is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
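The remark that the size of A depends on strideA can be made explicit for the normal use case strideA >= lda*n. The sketch below uses hypothetical helper names and 64-bit integers in place of rocblas_stride, and it counts matrices from 0.

```cpp
#include <cassert>
#include <cstdint>

// Minimum number of elements the array A must hold when strideA >= lda*n:
// the last matrix starts at (batch_count-1)*strideA and occupies lda*n
// elements.
std::int64_t strided_batch_elems(std::int64_t n, std::int64_t lda,
                                 std::int64_t strideA, std::int64_t batch_count)
{
    assert(n >= 0 && lda >= n && batch_count >= 0 && strideA >= lda * n);
    if (batch_count == 0)
        return 0;
    return (batch_count - 1) * strideA + lda * n;
}

// 0-based offset of the first element of matrix A_l (l counted from 0 here).
std::int64_t matrix_offset(std::int64_t l, std::int64_t strideA)
{
    assert(l >= 0);
    return l * strideA;
}
```

With n = 3, lda = 4, strideA = 16 and batch_count = 5, this gives 4*16 + 12 = 76 elements, and the third matrix starts at offset 32.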

rocsolver_<type>potrf()#

rocblas_status rocsolver_zpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_spotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

POTRF computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the blocked version of the algorithm).

The factorization has the form:

\[\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}\]

U is an upper triangular matrix and L is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.
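To show what "blocked" means here, the following CPU sketch factors the lower case in panels of nb columns using a right-looking scheme: factor the diagonal block, solve for the panel below it, then update the trailing submatrix. This is one common formulation and is not claimed to be rocSOLVER's implementation; the function name and the block-size parameter nb are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// CPU sketch of a blocked, right-looking Cholesky factorization (lower case,
// real data, column-major). Returns 0 on success or the 1-based order of the
// first leading minor that is not positive definite.
int ref_potrf_lower(int n, double* A, int lda, int nb)
{
    assert(n >= 0 && lda >= n && nb >= 1);
    auto a = [&](int i, int j) -> double& { return A[i + j * lda]; };
    for (int k = 0; k < n; k += nb) {
        const int kb = std::min(nb, n - k);
        // 1) Unblocked factorization of the kb-by-kb diagonal block.
        for (int j = k; j < k + kb; ++j) {
            double d = a(j, j);
            for (int p = k; p < j; ++p)
                d -= a(j, p) * a(j, p);
            if (!(d > 0.0))
                return j + 1;
            d = std::sqrt(d);
            a(j, j) = d;
            for (int i = j + 1; i < k + kb; ++i) {
                double s = a(i, j);
                for (int p = k; p < j; ++p)
                    s -= a(i, p) * a(j, p);
                a(i, j) = s / d;
            }
        }
        // 2) Triangular solve for the panel below the diagonal block:
        //    L21 * L11' = A21.
        for (int j = k; j < k + kb; ++j)
            for (int i = k + kb; i < n; ++i) {
                double s = a(i, j);
                for (int p = k; p < j; ++p)
                    s -= a(i, p) * a(j, p);
                a(i, j) = s / a(j, j);
            }
        // 3) Update of the trailing lower triangle: A22 -= L21 * L21'.
        for (int j = k + kb; j < n; ++j)
            for (int i = j; i < n; ++i) {
                double s = 0.0;
                for (int p = k; p < k + kb; ++p)
                    s += a(i, p) * a(j, p);
                a(i, j) -= s;
            }
    }
    return 0;
}
```

Steps 2 and 3 are matrix-matrix operations, which is why blocked variants are usually preferred for larger matrices. With nb = 1 the sketch degenerates to an unblocked factorization, and the computed factor does not depend on nb beyond rounding.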

rocsolver_<type>potrf_batched()#

rocblas_status rocsolver_zpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

POTRF_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_l\) is an upper triangular matrix and \(L_l\) is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
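Because each matrix in the batch reports its own status, a caller typically inspects the whole info array after the call. The sketch below assumes the array has already been copied back to host memory (for example with hipMemcpy); the helper failed_instances is hypothetical and counts batch positions from 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper: given a host copy of the info array written
// by a batched factorization, list the 0-based batch positions whose
// factorization did not complete (info value > 0).
std::vector<int> failed_instances(const std::vector<int>& info_host)
{
    std::vector<int> failed;
    for (std::size_t l = 0; l < info_host.size(); ++l) {
        assert(info_host[l] >= 0); // documented values are 0 or i > 0
        if (info_host[l] > 0)
            failed.push_back(static_cast<int>(l));
    }
    return failed;
}
```

A failure in one matrix does not say anything about the others, so the remaining factors can still be used.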

rocsolver_<type>potrf_strided_batched()#

rocblas_status rocsolver_zpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

POTRF_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form:

\[\begin{split} \begin{array}{cl} A_l^{} = U_l'U_l^{} & \: \text{if uplo is upper, or}\\ A_l^{} = L_l^{}L_l' & \: \text{if uplo is lower.} \end{array} \end{split}\]

\(U_l\) is an upper triangular matrix and \(L_l\) is lower triangular.

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of matrix A_l.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, the upper or lower triangular factors.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful factorization of matrix A_l. If info[l] = i > 0, the leading minor of order i of A_l is not positive definite. The l-th factorization stopped at this point.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
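For the upper case, the relation A_l = U_l'U_l can be checked on the host for one matrix of the batch. The sketch below is a hypothetical verification helper for real data (complex data would need a conjugate); it reads only the upper triangle, matching the statement that the lower part is not used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical host-side check: largest |A[i,j] - (U'U)[i,j]| over the upper
// triangle (i <= j). A_orig is the matrix before the call and U the array
// after it, both column-major with leading dimension lda. Entries below the
// diagonal are never read.
double max_residual_upper(int n, const double* A_orig, const double* U, int lda)
{
    assert(n >= 0 && lda >= n);
    double worst = 0.0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i <= j; ++i) {
            // (U'U)[i,j] = sum_k U[k,i] * U[k,j], with U[k,i] = 0 for k > i.
            double s = 0.0;
            for (int k = 0; k <= i; ++k)
                s += U[k + i * lda] * U[k + j * lda];
            worst = std::max(worst, std::fabs(A_orig[i + j * lda] - s));
        }
    return worst;
}
```

In a strided batch, the same check applies to matrix l by offsetting both pointers by l*strideA elements.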

rocsolver_<type>getf2()#

rocblas_status rocsolver_zgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_cgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_dgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_sgetf2_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_zgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_cgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_sgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

GETF2 computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

\[ A = PLU \]

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension min(m,n). The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.
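The conventions for ipiv and info can be illustrated with a CPU sketch of unblocked LU factorization with partial pivoting. It follows the classical LAPACK-style algorithm on host memory with real double-precision data; the name ref_getf2 is hypothetical and the sketch is not rocSOLVER's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// CPU reference sketch (hypothetical name): in-place LU factorization with
// partial pivoting of the column-major m-by-n matrix A. ipiv receives
// min(m,n) 1-based row indices. Returns 0, or the 1-based index i of the
// first exactly-zero pivot U[i,i].
int ref_getf2(int m, int n, double* A, int lda, int* ipiv)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    const int mn = m < n ? m : n;
    int info = 0;
    for (int j = 0; j < mn; ++j) {
        // Pivot search: largest magnitude in column j, rows j..m-1.
        int p = j;
        for (int i = j + 1; i < m; ++i)
            if (std::fabs(A[i + j * lda]) > std::fabs(A[p + j * lda]))
                p = i;
        ipiv[j] = p + 1;
        if (A[p + j * lda] == 0.0) {
            if (info == 0)
                info = j + 1; // remember the first zero pivot and keep going
            continue;         // the rest of column j is zero: nothing to eliminate
        }
        if (p != j)
            for (int c = 0; c < n; ++c)
                std::swap(A[j + c * lda], A[p + c * lda]);
        // Multipliers (column j of L), then update of the trailing block.
        for (int i = j + 1; i < m; ++i)
            A[i + j * lda] /= A[j + j * lda];
        for (int c = j + 1; c < n; ++c)
            for (int i = j + 1; i < m; ++i)
                A[i + c * lda] -= A[i + j * lda] * A[j + c * lda];
    }
    return info;
}
```

For the matrix with rows (1, 2) and (3, 4), the sketch swaps the two rows, so ipiv = (2, 2), and stores U with rows (3, 4) and (0, 2/3) together with the multiplier 1/3 below the diagonal. For the singular matrix with rows (1, 2) and (2, 4), it returns 2.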

rocsolver_<type>getf2_batched()#

rocblas_status rocsolver_zgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetf2_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETF2_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = P_lL_lU_l \]

where \(P_l\) is a permutation matrix, \(L_l\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_l\) is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorizations. The unit diagonal elements of L_l are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
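The layout of ipiv and the role of strideP can be sketched as follows for a host copy of the array. The helper names are hypothetical, 64-bit integers stand in for rocblas_stride, and matrices are counted from 0 while the pivot position i stays 1-based as in the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accessor: the pivot written ipiv_l[i] in this section, read
// from a host copy of the ipiv array. ipiv_0 starts at offset 0.
int pivot_at(const int* ipiv_host, std::int64_t strideP, std::int64_t l,
             int i, int m, int n)
{
    const int mn = m < n ? m : n;
    assert(l >= 0 && i >= 1 && i <= mn);
    return ipiv_host[l * strideP + (i - 1)];
}

// Minimum element count of ipiv in the normal use case strideP >= min(m,n).
std::int64_t ipiv_elems(int m, int n, std::int64_t strideP, std::int64_t batch_count)
{
    const std::int64_t mn = m < n ? m : n;
    assert(strideP >= mn && batch_count >= 0);
    return batch_count == 0 ? 0 : (batch_count - 1) * strideP + mn;
}
```

With m = 2, n = 3, strideP = 4 and two matrices, ipiv needs at least 4 + 2 = 6 elements, and the entries between the pivot vectors are never touched.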

rocsolver_<type>getf2_strided_batched()#

rocblas_status rocsolver_zgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetf2_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETF2_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = P_lL_lU_l \]

where \(P_l\) is a permutation matrix, \(L_l\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_l\) is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorizations. The unit diagonal elements of L_l are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>getrf()#

rocblas_status rocsolver_zgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_cgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_dgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_sgetrf_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, int64_t *ipiv, int64_t *info)#
rocblas_status rocsolver_zgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_cgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_sgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

GETRF computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

\[ A = PLU \]

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension min(m,n). The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.
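The statement that P can be derived from ipiv can be sketched by replaying the row interchanges on an identity permutation. The helper below is hypothetical and works on a host copy of ipiv. It assumes the interchanges were applied in order i = 1, 2, ..., min(m,n), as in the classical LAPACK algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical host-side helper: turn the 1-based pivot vector ipiv (length
// min(m,n)) into a 0-based row permutation perm of length m such that row r
// of the product L*U equals row perm[r] of the original matrix A.
// Equivalently, P in A = P*L*U has its ones at positions (perm[r], r).
std::vector<int> rows_from_ipiv(int m, const std::vector<int>& ipiv)
{
    assert(m >= 0 && ipiv.size() <= static_cast<std::size_t>(m));
    std::vector<int> perm(static_cast<std::size_t>(m));
    std::iota(perm.begin(), perm.end(), 0);
    // Replay the interchanges in the order in which they were applied.
    for (std::size_t i = 0; i < ipiv.size(); ++i) {
        assert(ipiv[i] >= 1 && ipiv[i] <= m);
        std::swap(perm[i], perm[static_cast<std::size_t>(ipiv[i] - 1)]);
    }
    return perm;
}
```

Note that ipiv is a sequence of swaps, not a permutation vector: ipiv = (3, 3, 3) for a 3-by-3 matrix corresponds to the row order (2, 0, 1) in 0-based terms.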

rocsolver_<type>getrf_batched()#

rocblas_status rocsolver_zgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_cgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_dgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_sgetrf_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *const A[], const int64_t lda, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)#
rocblas_status rocsolver_zgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETRF_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = P_lL_lU_l \]

where \(P_l\) is a permutation matrix, \(L_l\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_l\) is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorizations. The unit diagonal elements of L_l are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
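
As an illustration of how P_l can be derived from ipiv_l, the following host-side C++ sketch converts the interchange sequence into an explicit row permutation. It assumes that ipiv_l has already been copied to host memory and that rocblas_int is a 32-bit int; the function name pivots_to_permutation is hypothetical and not part of rocSOLVER.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Convert the 1-based interchange sequence ipiv_l (length min(m,n)) into an
// explicit 0-based row permutation of length m.
// On return, perm[i] is the index of the original row of A_l that ends up in
// row i after all interchanges, i.e. in row i of the product L_l * U_l.
// (Hypothetical helper for illustration only.)
std::vector<int> pivots_to_permutation(const std::vector<int>& ipiv, int m)
{
    std::vector<int> perm(m);
    std::iota(perm.begin(), perm.end(), 0); // start from the identity

    // Apply the interchanges in order: row i <-> row ipiv[i] (1-based value).
    for (std::size_t i = 0; i < ipiv.size(); ++i)
        std::swap(perm[i], perm[ipiv[i] - 1]);

    return perm;
}
```

For example, ipiv_l = {3, 3, 3} with m = 3 gives the permutation {2, 0, 1}: the first row of L_l U_l is the original third row of A_l, the second is the original first row, and the third is the original second row.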

rocsolver_<type>getrf_strided_batched()

rocblas_status rocsolver_zgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_double_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)
rocblas_status rocsolver_cgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, rocblas_float_complex *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)
rocblas_status rocsolver_dgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, double *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)
rocblas_status rocsolver_sgetrf_strided_batched_64(rocblas_handle handle, const int64_t m, const int64_t n, float *A, const int64_t lda, const rocblas_stride strideA, int64_t *ipiv, const rocblas_stride strideP, int64_t *info, const int64_t batch_count)
rocblas_status rocsolver_zgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_cgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_sgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

GETRF_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. When optimizations are enabled (the default option), an optimized internal implementation that makes no rocBLAS calls may be executed for mid-size matrices. For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide.)

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = P_lL_lU_l \]

where \(P_l\) is a permutation matrix, \(L_l\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_l\) is upper triangular (upper trapezoidal if m < n).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the factors L_l and U_l from the factorization. The unit diagonal elements of L_l are not stored.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is min(m,n). Elements of ipiv_l are 1-based indices. For each instance A_l in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_l was interchanged with row ipiv_l[i]. Matrix P_l of the factorization can be derived from ipiv_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, U_l is singular. U_l[i,i] is the first zero pivot.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
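
The roles of lda, strideA and strideP can be sketched with a few host-side index helpers. This is a minimal illustration that assumes column-major storage (the LAPACK convention), a 0-based batch index l in the code, 1-based i and j as in the text, and the normal use case strideA >= lda*n. The struct and function names are hypothetical and not part of rocSOLVER.

```cpp
#include <cstdint>

// Layout of a strided batch. All values are counted in elements, not bytes.
// (Hypothetical helper type for illustration only.)
struct StridedBatchLayout
{
    std::int64_t m, n, lda, strideA, strideP;
};

// 0-based linear index of element A_l[i,j] in the A buffer
// (l is 0-based; i and j are 1-based; column-major storage is assumed).
inline std::int64_t a_index(const StridedBatchLayout& s, std::int64_t l, std::int64_t i, std::int64_t j)
{
    return l * s.strideA + (j - 1) * s.lda + (i - 1);
}

// 0-based linear index of ipiv_l[i] in the ipiv buffer (l 0-based, i 1-based).
inline std::int64_t ipiv_index(const StridedBatchLayout& s, std::int64_t l, std::int64_t i)
{
    return l * s.strideP + (i - 1);
}

// Smallest number of elements the A buffer must hold for batch_count matrices,
// assuming strideA >= lda*n.
inline std::int64_t a_elements(const StridedBatchLayout& s, std::int64_t batch_count)
{
    return batch_count == 0 ? 0 : (batch_count - 1) * s.strideA + s.lda * s.n;
}
```

With m = 3, n = 2, lda = 4, strideA = 10 and strideP = 5, element A_l[2,2] of the second matrix (l = 1 in the code) sits at linear index 15, and a batch of three matrices needs at least 28 elements in the A buffer.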

rocsolver_<type>sytf2()

rocblas_status rocsolver_zsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_csytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_ssytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

SYTF2 computes the factorization of a symmetric indefinite matrix \(A\) using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}\]

where \(U\) or \(L\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_k\).

Specifically, \(U\) and \(L\) are computed as

\[\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_k\), and \(P(k)\) is a permutation matrix defined by \(ipiv[k]\). If we let \(s\) denote the order of block \(D_k\), then \(U(k)\) and \(L(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_k\) is stored in \(A[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A\). If \(s = 2\) and uplo is upper, then \(D_k\) is stored in \(A[k-1,k-1]\), \(A[k-1,k]\), and \(A[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A\). If \(s = 2\) and uplo is lower, then \(D_k\) is stored in \(A[k,k]\), \(A[k+1,k]\), and \(A[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, D is singular. D[i,i] is the first diagonal zero.
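
The encoding of ipiv described above can be decoded on the host to recover the block structure of D. The sketch below assumes ipiv has been copied to host memory and that rocblas_int is a 32-bit int; DiagonalBlock and diagonal_blocks are hypothetical names, not part of rocSOLVER. Because the two entries of a 2-by-2 block carry the same negative value for both settings of uplo, a single forward scan is enough to find the blocks.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One diagonal block of D. (Hypothetical type for illustration only.)
struct DiagonalBlock
{
    int first; // 1-based index of the first row/column of the block
    int order; // 1 for a 1-by-1 block, 2 for a 2-by-2 block
};

// Scan ipiv (1-based pivot values, length n) and list the diagonal blocks of D.
// A positive entry marks a 1-by-1 block; two consecutive equal negative
// entries mark a 2-by-2 block.
std::vector<DiagonalBlock> diagonal_blocks(const std::vector<int>& ipiv)
{
    std::vector<DiagonalBlock> blocks;
    const std::size_t n = ipiv.size();
    std::size_t k = 0;
    while (k < n)
    {
        if (ipiv[k] > 0)
        {
            blocks.push_back({static_cast<int>(k) + 1, 1});
            k += 1;
        }
        else if (ipiv[k] < 0 && k + 1 < n && ipiv[k + 1] == ipiv[k])
        {
            blocks.push_back({static_cast<int>(k) + 1, 2});
            k += 2;
        }
        else
        {
            // Not a pattern that the description above allows.
            throw std::invalid_argument("unexpected ipiv pattern");
        }
    }
    return blocks;
}
```

For instance, ipiv = {1, -1, -1, 4} decodes to a 1-by-1 block at position 1, a 2-by-2 block covering positions 2 and 3, and a 1-by-1 block at position 4.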

rocsolver_<type>sytf2_batched()

rocblas_status rocsolver_zsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTF2_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}\]

where \(U_l\) or \(L_l\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_l\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_{kl}\).

Specifically, \(U_l\) and \(L_l\) are computed as

\[\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_{kl}\), and \(P_l(k)\) is a permutation matrix defined by \(ipiv_l[k]\). If we let \(s\) denote the order of block \(D_{kl}\), then \(U_l(k)\) and \(L_l(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_{kl}\) is stored in \(A_l[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_l\). If \(s = 2\) and uplo is upper, then \(D_{kl}\) is stored in \(A_l[k-1,k-1]\), \(A_l[k-1,k]\), and \(A_l[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_l\). If \(s = 2\) and uplo is lower, then \(D_{kl}\) is stored in \(A_l[k,k]\), \(A_l[k+1,k]\), and \(A_l[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_l\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is n. Elements of ipiv_l are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
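
Since info is an array with one entry per matrix, a caller typically inspects every entry after the batched call. The following host-side sketch assumes the info array has already been copied from the GPU into a std::vector<int> and uses a 0-based batch index; SingularReport and find_singular are hypothetical names, not part of rocSOLVER.

```cpp
#include <cstddef>
#include <vector>

// One entry per factorization that reported a zero on the diagonal of D_l.
// (Hypothetical type for illustration only.)
struct SingularReport
{
    std::size_t batch_index; // 0-based position l in the info array
    int position;            // 1-based i such that D_l[i,i] is the first diagonal zero
};

// Collect all batch instances whose info value is positive.
std::vector<SingularReport> find_singular(const std::vector<int>& info_host)
{
    std::vector<SingularReport> reports;
    for (std::size_t l = 0; l < info_host.size(); ++l)
    {
        if (info_host[l] > 0)
            reports.push_back({l, info_host[l]});
    }
    return reports;
}
```

With info = {0, 2, 0, 1}, the second and fourth matrices of the batch are reported, with first diagonal zeros at positions 2 and 1 respectively.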

rocsolver_<type>sytf2_strided_batched()

rocblas_status rocsolver_zsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTF2_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}\]

where \(U_l\) or \(L_l\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_l\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_{kl}\).

Specifically, \(U_l\) and \(L_l\) are computed as

\[\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_{kl}\), and \(P_l(k)\) is a permutation matrix defined by \(ipiv_l[k]\). If we let \(s\) denote the order of block \(D_{kl}\), then \(U_l(k)\) and \(L_l(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_{kl}\) is stored in \(A_l[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_l\). If \(s = 2\) and uplo is upper, then \(D_{kl}\) is stored in \(A_l[k-1,k-1]\), \(A_l[k-1,k]\), and \(A_l[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_l\). If \(s = 2\) and uplo is lower, then \(D_{kl}\) is stored in \(A_l[k,k]\), \(A_l[k+1,k]\), and \(A_l[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_l\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is n. Elements of ipiv_l are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>sytrf()

rocblas_status rocsolver_zsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_csytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_ssytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

SYTRF computes the factorization of a symmetric indefinite matrix \(A\) using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}\]

where \(U\) or \(L\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_k\).

Specifically, \(U\) and \(L\) are computed as

\[\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_k\), and \(P(k)\) is a permutation matrix defined by \(ipiv[k]\). If we let \(s\) denote the order of block \(D_k\), then \(U(k)\) and \(L(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_k\) is stored in \(A[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A\). If \(s = 2\) and uplo is upper, then \(D_k\) is stored in \(A[k-1,k-1]\), \(A[k-1,k]\), and \(A[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A\). If \(s = 2\) and uplo is lower, then \(D_k\) is stored in \(A[k,k]\), \(A[k+1,k]\), and \(A[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of A.

  • ipiv[out] pointer to rocblas_int. Array on the GPU of dimension n. The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

  • info[out] pointer to a rocblas_int on the GPU. If info = 0, successful exit. If info = i > 0, D is singular. D[i,i] is the first diagonal zero.
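
As a concrete illustration of the definition of \(L(k)\) given above (a sketch with assumed sizes, chosen only for this example), take \(n = 4\), uplo lower and \(k = 2\). If \(D_k\) is a 1-by-1 block (\(s = 1\)), then \(v\) is a column with two entries \(v_1, v_2\); if \(D_k\) is a 2-by-2 block (\(s = 2\)), then \(v\) is a row with two entries:

\[\begin{split} L(2)\big|_{s=1} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & v_1 & 1 & 0 \\ 0 & v_2 & 0 & 1 \end{array} \right], \qquad L(2)\big|_{s=2} = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & v_1 & v_2 & 1 \end{array} \right]. \end{split}\]

In both cases the entries of \(v\) lie below the diagonal block, in the lower part of column \(k\) (and of column \(k+1\) when \(s = 2\)), which matches where they are stored in \(A\).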

rocsolver_<type>sytrf_batched()

rocblas_status rocsolver_zsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTRF_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}\]

where \(U_l\) or \(L_l\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_l\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_{kl}\).

Specifically, \(U_l\) and \(L_l\) are computed as

\[\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_{kl}\), and \(P_l(k)\) is a permutation matrix defined by \(ipiv_l[k]\). If we let \(s\) denote the order of block \(D_{kl}\), then \(U_l(k)\) and \(L_l(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_{kl}\) is stored in \(A_l[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_l\). If \(s = 2\) and uplo is upper, then \(D_{kl}\) is stored in \(A_l[k-1,k-1]\), \(A_l[k-1,k]\), and \(A_l[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_l\). If \(s = 2\) and uplo is lower, then \(D_{kl}\) is stored in \(A_l[k,k]\), \(A_l[k+1,k]\), and \(A_l[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_l\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is n. Elements of ipiv_l are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular. D_l[i,i] is the first diagonal zero.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
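
The batched interface takes A as an array of pointers, one per matrix. A common way to build such an array is to carve equally sized slots of lda*n elements out of one contiguous allocation. The sketch below shows only the pointer arithmetic and runs on plain host memory; in real use the base address would be GPU memory, and the pointer array itself would most likely also have to be made accessible to the GPU, a step this sketch omits. The function name make_batch_pointers is hypothetical and not part of rocSOLVER.

```cpp
#include <cstddef>
#include <vector>

// Build the array of per-matrix pointers for a batch stored back to back in
// one allocation, each matrix occupying lda*n elements.
// (Hypothetical helper for illustration only.)
template <typename T>
std::vector<T*> make_batch_pointers(T* base, std::size_t lda, std::size_t n, std::size_t batch_count)
{
    std::vector<T*> pointers(batch_count);
    for (std::size_t l = 0; l < batch_count; ++l)
        pointers[l] = base + l * lda * n; // start of matrix A_l (l is 0-based here)
    return pointers;
}
```

With lda = 4, n = 3 and batch_count = 2, the second pointer is 12 elements past the first.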

rocsolver_<type>sytrf_strided_batched()

rocblas_status rocsolver_zsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTRF_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} \begin{array}{cl} A_l^{} = U_l^{} D_l^{} U_l^T & \: \text{or}\\ A_l^{} = L_l^{} D_l^{} L_l^T & \end{array} \end{split}\]

where \(U_l\) or \(L_l\) is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and \(D_l\) is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks \(D_{kl}\).

Specifically, \(U_l\) and \(L_l\) are computed as

\[\begin{split} \begin{array}{cl} U_l = P_l(n) U_l(n) \cdots P_l(k) U_l(k) \cdots & \: \text{and}\\ L_l = P_l(1) L_l(1) \cdots P_l(k) L_l(k) \cdots & \end{array} \end{split}\]

where \(k\) decreases from \(n\) to 1 (increases from 1 to \(n\)) in steps of 1 or 2, depending on the order of block \(D_{kl}\), and \(P_l(k)\) is a permutation matrix defined by \(ipiv_l[k]\). If we let \(s\) denote the order of block \(D_{kl}\), then \(U_l(k)\) and \(L_l(k)\) are unit upper/lower triangular matrices defined as

\[\begin{split} U_l(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}\]

and

\[\begin{split} L_l(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}\]

If \(s = 1\), then \(D_{kl}\) is stored in \(A_l[k,k]\) and \(v\) is stored in the upper/lower part of column \(k\) of \(A_l\). If \(s = 2\) and uplo is upper, then \(D_{kl}\) is stored in \(A_l[k-1,k-1]\), \(A_l[k-1,k]\), and \(A_l[k,k]\), and \(v\) is stored in the upper parts of columns \(k-1\) and \(k\) of \(A_l\). If \(s = 2\) and uplo is lower, then \(D_{kl}\) is stored in \(A_l[k,k]\), \(A_l[k+1,k]\), and \(A_l[k+1,k+1]\), and \(v\) is stored in the lower parts of columns \(k\) and \(k+1\) of \(A_l\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the matrices A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of all matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the symmetric matrices A_l to be factored. On exit, the block diagonal matrices D_l and the multipliers needed to compute U_l or L_l.

  • lda[in] rocblas_int. lda >= n. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP). Contains the vectors of pivot indices ipiv_l (corresponding to A_l). Dimension of ipiv_l is n. Elements of ipiv_l are 1-based indices. For 1 <= k <= n, if ipiv_l[k] > 0 then rows and columns k and ipiv_l[k] were interchanged and D_l[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_l[k] = ipiv_l[k-1] < 0 and uplo is upper (or ipiv_l[k] = ipiv_l[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_l[k] (or rows and columns k+1 and -ipiv_l[k]) were interchanged and D_l[k-1,k-1] to D_l[k,k] (or D_l[k,k] to D_l[k+1,k+1]) is a 2-by-2 diagonal block.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • info[out] pointer to rocblas_int. Array of batch_count integers on the GPU. If info[l] = 0, successful exit for factorization of A_l. If info[l] = i > 0, D_l is singular; D_l[i,i] is the first zero element on its diagonal.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
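The ipiv encoding described above can be decoded on the host once a pivot vector has been copied back from the GPU. The following is a minimal sketch, not part of rocSOLVER: the helper name `count_diagonal_blocks` is hypothetical, plain `int` stands in for rocblas_int, and the array is indexed from 0 as usual in C, whereas the text above uses 1-based k. It relies only on the documented rule that a 2-by-2 block is flagged by two consecutive, equal, negative entries, which holds for both upper and lower storage.

```c
#include <stddef.h>

/* Hypothetical host-side helper (not a rocSOLVER function).
 * Given one pivot vector ipiv_l of length n, write the order (1 or 2) of
 * each diagonal block of D_l into block_order and return the number of
 * blocks. Returns -1 if the vector does not follow the documented encoding. */
int count_diagonal_blocks(const int *ipiv, int n, int *block_order)
{
    int count = 0;
    int k = 0; /* 0-based position; the documentation uses 1-based k */

    while (k < n) {
        if (ipiv[k] > 0) {
            /* Positive entry: 1-by-1 block. */
            block_order[count++] = 1;
            k += 1;
        } else if (ipiv[k] < 0 && k + 1 < n && ipiv[k + 1] == ipiv[k]) {
            /* Two equal negative entries: 2-by-2 block covering k and k+1. */
            block_order[count++] = 2;
            k += 2;
        } else {
            return -1; /* zero entry or unpaired negative entry */
        }
    }
    return count;
}
```

For a strided batch, the same helper could be applied to each vector starting at offset l*strideP of the host copy of ipiv.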

Orthogonal factorizations#

rocsolver_<type>geqr2()#

rocblas_status rocsolver_zgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQR2 computes a QR factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}\]

where R is upper triangular (upper trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(1)H(2)\cdots H(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
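To make the packed output format concrete, the sketch below rebuilds one Householder vector from the factored matrix and applies the corresponding H(i) to a vector on the host. It is an illustration under stated assumptions rather than library code: the function names are hypothetical, only the real double-precision case is handled, storage is column-major with leading dimension lda, and the column index i is 0-based in the code (the text above is 1-based).

```c
#include <stddef.h>

/* Hypothetical helper: rebuild v_i from the packed QR output.
 * Entries above position i are zero, entry i is one, and the remaining
 * entries are read from below the diagonal of column i of A. */
void unpack_householder_vector(int m, int i, const double *A, int lda, double *v)
{
    for (int r = 0; r < m; ++r) {
        if (r < i)
            v[r] = 0.0;
        else if (r == i)
            v[r] = 1.0;
        else
            v[r] = A[r + (size_t)i * lda];
    }
}

/* Hypothetical helper: x <- H x with H = I - tau * v * v^T (real case),
 * computed as x - tau * (v^T x) * v without forming H explicitly. */
void apply_householder(int m, double tau, const double *v, double *x)
{
    double dot = 0.0;
    for (int r = 0; r < m; ++r)
        dot += v[r] * x[r];
    for (int r = 0; r < m; ++r)
        x[r] -= tau * dot * v[r];
}
```

Here tau would be the entry of the host copy of ipiv that belongs to column i. Applying the reflector as a rank-one update costs O(m) work per vector, which is why H(i) is normally never stored as a full matrix.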

rocsolver_<type>geqr2_batched()#

rocblas_status rocsolver_zgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQR2_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}\]

where \(R_l\) is upper triangular (upper trapezoidal if m < n), and \(Q_l\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
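The batched layout combines an array of matrix pointers with a single strided array of Householder scalars. The sketch below shows only the address arithmetic, using ordinary host memory; in a real call the stored pointers would have to refer to GPU arrays. The function names are hypothetical, `int64_t` stands in for rocblas_stride, and the batch index l is 0-based in the code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: carve batch_count matrices out of one contiguous
 * buffer and store the start of each one in ptrs[l]. */
void fill_matrix_pointers(double *base, int64_t matrix_stride,
                          int batch_count, double **ptrs)
{
    for (int l = 0; l < batch_count; ++l)
        ptrs[l] = base + (int64_t)l * matrix_stride;
}

/* Hypothetical helper: start of the vector ipiv_l inside the strided
 * array of Householder scalars. */
double *householder_scalars(double *ipiv, int64_t strideP, int l)
{
    return ipiv + (int64_t)l * strideP;
}
```

With strideP >= min(m,n), the vectors ipiv_l do not overlap, which matches the normal use case stated above.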

rocsolver_<type>geqr2_strided_batched()#

rocblas_status rocsolver_zgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQR2_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}\]

where \(R_l\) is upper triangular (upper trapezoidal if m < n), and \(Q_l\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
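For the strided-batched layout, every element is addressed through lda and strideA. The following sketch spells out that arithmetic, assuming column-major storage; the function names are hypothetical, `int64_t` stands in for rocblas_stride, the batch index l is 0-based, and the row and column indices i and j are 1-based as in the text. The size estimate additionally assumes the normal use case strideA >= lda*n.

```c
#include <stdint.h>

/* Hypothetical helper: 0-based offset of element A_l[i,j] inside the
 * strided array A (l is 0-based, i and j are 1-based). */
int64_t strided_offset(int l, int i, int j, int lda, int64_t strideA)
{
    return (int64_t)l * strideA + (int64_t)(j - 1) * lda + (i - 1);
}

/* Hypothetical helper: smallest number of elements that holds the whole
 * batch, assuming strideA >= lda*n so that matrices do not overlap. */
int64_t strided_min_elements(int n, int lda, int64_t strideA, int batch_count)
{
    if (batch_count <= 0)
        return 0;
    return (int64_t)(batch_count - 1) * strideA + (int64_t)lda * n;
}
```

The last matrix needs only lda*n elements, so the total can be smaller than batch_count*strideA when strideA includes padding.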

rocsolver_<type>geqrf()#

rocblas_status rocsolver_zgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQRF computes a QR factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}\]

where R is upper triangular (upper trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(1)H(2)\cdots H(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).
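As a small worked illustration (the numbers are chosen for this example and are not taken from the library), let m = 2, n = 1 and \(A = [3, 4]^T\). One pair consistent with the definitions above is \(v_1 = [1, 1/2]^T\) with \(\text{ipiv}[1] = 8/5\), which gives

\[ H(1) = I - \frac{8}{5} \left[\begin{array}{cc} 1 & 1/2 \\ 1/2 & 1/4 \end{array}\right] = \left[\begin{array}{cc} -3/5 & -4/5 \\ -4/5 & 3/5 \end{array}\right], \qquad H(1) A = \left[\begin{array}{c} -5 \\ 0 \end{array}\right] \]

so that \(Q = H(1)\) and \(R = -5\). The sign of R depends on how the reflector is built; this example assumes the usual LAPACK choice, in which R[1,1] takes the sign opposite to A[1,1].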

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
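Because R shares storage with the Householder vectors on exit, a common follow-up step is to copy R into its own array and zero the entries below the diagonal. The sketch below does this on the host for the real double-precision case. The function name is hypothetical, storage is assumed column-major, indices in the code are 0-based, and R is written as a min(m,n)-by-n array with leading dimension ldr >= min(m,n).

```c
#include <stddef.h>

/* Hypothetical helper: copy the factor R out of the packed QR output.
 * A is m-by-n with leading dimension lda; R is k-by-n with k = min(m,n)
 * and leading dimension ldr. Entries below the diagonal, which hold the
 * Householder vectors in A, are set to zero in R. */
void extract_r(int m, int n, const double *A, int lda, double *R, int ldr)
{
    int k = m < n ? m : n;

    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < k; ++i) {
            R[i + (size_t)j * ldr] = (i <= j) ? A[i + (size_t)j * lda] : 0.0;
        }
    }
}
```

When m < n the result is upper trapezoidal, in line with the description of R above.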

rocsolver_<type>geqrf_batched()#

rocblas_status rocsolver_zgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQRF_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}\]

where \(R_l\) is upper triangular (upper trapezoidal if m < n), and \(Q_l\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>geqrf_strided_batched()#

rocblas_status rocsolver_zgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQRF_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} R_l\\ 0 \end{array}\right] \end{split}\]

where \(R_l\) is upper triangular (upper trapezoidal if m < n), and \(Q_l\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)H_l(2)\cdots H_l(k), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the diagonal contain the factor R_l. The elements below the diagonal are the last m - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gerq2()#

rocblas_status rocsolver_zgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GERQ2 computes an RQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q \]

where R is upper triangular (upper trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(1)'H(2)' \cdots H(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
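The two cases in the description of A (subdiagonal when m >= n, superdiagonal when n > m) reduce to a single index test. The sketch below expresses it as a predicate; the function name is hypothetical, and i and j are 1-based row and column indices as in the text.

```c
/* Hypothetical helper: returns 1 if A[i,j] (1-based indices) holds an
 * entry of R after an RQ factorization, and 0 if it holds part of a
 * Householder vector.
 * "On and above the (m-n)-th subdiagonal" means i - j <= m - n, and
 * "on and above the (n-m)-th superdiagonal" means j - i >= n - m;
 * both are the same inequality. */
int rq_entry_belongs_to_r(int m, int n, int i, int j)
{
    return (j - i) >= (n - m);
}
```

For example, with m = 2 and n = 3, the entries A[1,2], A[1,3], and A[2,3] satisfy the test, so R occupies the upper triangle of the last two columns.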

rocsolver_<type>gerq2_batched()#

rocblas_status rocsolver_zgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQ2_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l \]

where \(R_l\) is upper triangular (upper trapezoidal if m > n), and \(Q_l\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last n-i elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gerq2_strided_batched()#

rocblas_status rocsolver_zgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQ2_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l \]

where \(R_l\) is upper triangular (upper trapezoidal if m > n), and \(Q_l\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last n-i elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
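The constraints listed for the parameters can be checked on the host before the batch is submitted. The following sketch is a hypothetical pre-check, not the library's own validation: the names are invented for the example, `int64_t` stands in for rocblas_stride, and the return codes are an assumption of the sketch. It separates hard constraints (m >= 0, n >= 0, lda >= m, batch_count >= 0) from the "normal use" stride recommendations, which the documentation does not enforce.

```c
#include <stdint.h>

enum batch_args_check {
    BATCH_ARGS_INVALID = 0,       /* a documented constraint is violated  */
    BATCH_ARGS_OK = 1,            /* constraints and normal-use strides   */
    BATCH_ARGS_UNUSUAL_STRIDE = 2 /* valid, but strides below normal use  */
};

/* Hypothetical helper: check the documented argument constraints of a
 * strided-batched factorization. */
enum batch_args_check check_strided_batched_args(int m, int n, int lda,
                                                 int64_t strideA,
                                                 int64_t strideP,
                                                 int batch_count)
{
    if (m < 0 || n < 0 || lda < m || batch_count < 0)
        return BATCH_ARGS_INVALID;

    int k = m < n ? m : n; /* number of Householder scalars per matrix */
    if (strideA < (int64_t)lda * n || strideP < k)
        return BATCH_ARGS_UNUSUAL_STRIDE;

    return BATCH_ARGS_OK;
}
```

A caller might treat the third result as a warning, since strides below the normal-use thresholds mean that consecutive matrices or scalar vectors can overlap in memory.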

rocsolver_<type>gerqf()#

rocblas_status rocsolver_zgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GERQF computes an RQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q \]

where R is upper triangular (upper trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(1)'H(2)' \cdots H(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
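In the complex case, the factors of Q are the conjugate transposes H(i)', and since H(i) = I - ipiv[i] v_i v_i', its conjugate transpose is I - conj(ipiv[i]) v_i v_i'. The sketch below applies such a factor to a vector on the host. It is illustrative only: the function name is hypothetical, and the C99 type `double complex` is used in place of rocblas_double_complex.

```c
#include <complex.h>

/* Hypothetical helper: x <- H' x, where H = I - tau * v * v^H and H' is
 * the conjugate transpose of H, that is H' = I - conj(tau) * v * v^H.
 * The update is computed as x - conj(tau) * (v^H x) * v. */
void apply_householder_conj_transpose(int n, double complex tau,
                                      const double complex *v,
                                      double complex *x)
{
    double complex dot = 0.0;

    for (int r = 0; r < n; ++r)
        dot += conj(v[r]) * x[r]; /* v^H x */
    for (int r = 0; r < n; ++r)
        x[r] -= conj(tau) * dot * v[r];
}
```

For real data the conjugations have no effect and the routine reduces to applying H(i) itself, which is consistent with the X' notation used throughout this page.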

rocsolver_<type>gerqf_batched()#

rocblas_status rocsolver_zgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQF_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l \]

where \(R_l\) is upper triangular (upper trapezoidal if m > n), and \(Q_l\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last n-i elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gerqf_strided_batched()#

rocblas_status rocsolver_zgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQF_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} 0 & R_l \end{array}\right] Q_l \]

where \(R_l\) is upper triangular (upper trapezoidal if m > n), and \(Q_l\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(1)'H_l(2)' \cdots H_l(k)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last n-i elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_l; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
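
The roles of lda, strideA and strideP can be illustrated with a small host-side index calculation. This is only a sketch: the helper names are hypothetical (they are not part of rocSOLVER), column-major storage is assumed, and indices are 0-based as is usual in C++, whereas this document uses 1-based indices.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not rocSOLVER API). All indices are 0-based.

// Offset, in elements, of entry (i, j) of matrix A_l inside the strided buffer A.
// Assumes column-major storage with leading dimension lda.
std::int64_t strided_elem_offset(std::int64_t l, std::int64_t i, std::int64_t j,
                                 std::int64_t lda, std::int64_t strideA)
{
    assert(l >= 0 && i >= 0 && j >= 0 && i < lda);
    return l * strideA + j * lda + i;
}

// Offset, in elements, of the i-th Householder scalar of vector ipiv_l.
std::int64_t strided_scalar_offset(std::int64_t l, std::int64_t i, std::int64_t strideP)
{
    assert(l >= 0 && i >= 0);
    return l * strideP + i;
}
```

For example, with lda = 4 and strideA = 12, entry (1, 2) of the third matrix (l = 2) sits at offset 2*12 + 2*4 + 1 = 33 from the start of A.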

rocsolver_<type>geql2()

rocblas_status rocsolver_zgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQL2 computes a QL factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}\]

where L is lower triangular (lower trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(k)H(k-1)\cdots H(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the last m-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
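
The description of where L ends up in A can be condensed into a single inequality. The sketch below uses a hypothetical helper (not part of rocSOLVER) with 1-based indices, matching the notation of this document, to test whether entry (i, j) of the output belongs to the factor L.

```cpp
#include <cassert>

// Hypothetical helper (not rocSOLVER API). i and j are 1-based.
// Returns true when entry (i, j) of the m-by-n output array holds part of L.
bool holds_ql_factor(int i, int j, int m, int n)
{
    assert(1 <= i && i <= m && 1 <= j && j <= n);
    // "On and below" the (m-n)-th subdiagonal (when m >= n) and "on and below"
    // the (n-m)-th superdiagonal (when n > m) reduce to the same inequality.
    return i - j >= m - n;
}
```

For m = 4 and n = 3, the entries (2,1), (3,2) and (4,3) lie on the boundary subdiagonal and are the diagonal of L; for m = 2 and n = 3, the boundary is the first superdiagonal, so (1,2) and (2,3) belong to L while (1,3) does not.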

rocsolver_<type>geql2_batched()

rocblas_status rocsolver_zgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} 0\\ L_l \end{array}\right] \end{split}\]

where \(L_l\) is lower triangular (lower trapezoidal if m < n), and \(Q_l\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)H_l(k-1)\cdots H_l(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last m-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_l; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
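
The "array of pointers" form of A differs from the strided form only in how each A_l is located. The following host-only sketch shows that relationship; the helper name is hypothetical (not part of rocSOLVER), and in a real program the matrices would be GPU allocations made through HIP, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only helper (not rocSOLVER API): carve batch_count
// column-major lda-by-n slabs out of one allocation and collect their
// starting addresses, mimicking the "array of pointers" argument A.
std::vector<double*> make_pointer_array(std::vector<double>& storage,
                                        int n, int lda, int batch_count)
{
    assert(n >= 0 && lda >= 1 && batch_count >= 0);
    const std::size_t slab = static_cast<std::size_t>(lda) * n;
    storage.assign(slab * batch_count, 0.0);
    std::vector<double*> ptrs(batch_count);
    for (int l = 0; l < batch_count; ++l)
        ptrs[l] = storage.data() + slab * l;  // start of matrix A_l
    return ptrs;
}
```

The slabs are contiguous here only for convenience; with the pointer-array form each A_l could equally live in a separate allocation, while ipiv still uses the single strided buffer governed by strideP.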

rocsolver_<type>geql2_strided_batched()

rocblas_status rocsolver_zgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} 0\\ L_l \end{array}\right] \end{split}\]

where \(L_l\) is lower triangular (lower trapezoidal if m < n), and \(Q_l\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)H_l(k-1)\cdots H_l(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last m-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_l; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
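
The statement that the array sizes depend on strideA and strideP can be made concrete for the normal use case. The sketch below assumes strideA >= lda*n and strideP >= min(m,n); the helper names are hypothetical (not part of rocSOLVER) and the results are element counts, not bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not rocSOLVER API), valid for the normal use case only.

// Number of elements A must hold: the last matrix starts at
// (batch_count - 1) * strideA and occupies lda * n elements.
std::int64_t strided_matrix_buffer_len(int n, int lda, std::int64_t strideA, int batch_count)
{
    assert(n >= 0 && lda >= 1 && batch_count >= 0);
    assert(strideA >= static_cast<std::int64_t>(lda) * n);
    if (batch_count == 0) return 0;
    return (batch_count - 1) * strideA + static_cast<std::int64_t>(lda) * n;
}

// Number of elements ipiv must hold: the last vector starts at
// (batch_count - 1) * strideP and has min(m, n) Householder scalars.
std::int64_t strided_scalar_buffer_len(int m, int n, std::int64_t strideP, int batch_count)
{
    const int k = std::min(m, n);
    assert(m >= 0 && n >= 0 && batch_count >= 0 && strideP >= k);
    if (batch_count == 0) return 0;
    return (batch_count - 1) * strideP + k;
}
```

With m = 4, n = 3, lda = 4, strideA = 12, strideP = 3 and batch_count = 5, these give 60 elements for A and 15 for ipiv.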

rocsolver_<type>geqlf()

rocblas_status rocsolver_zgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQLF computes a QL factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}\]

where L is lower triangular (lower trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(k)H(k-1)\cdots H(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i^{} v_i' \]

where the last m-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
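
To see why a matrix of the form H(i) = I - ipiv[i] * v_i * v_i' can serve as a factor of an orthogonal Q, note that for real data such a matrix is orthogonal exactly when the scalar is 0 or equals 2/(v_i' v_i). The CPU-only sketch below checks this numerically for a small vector; the function names are hypothetical (not part of rocSOLVER) and only the real case is covered.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU helper (not rocSOLVER API), real case only.
// Builds the dense n-by-n matrix H = I - tau * v * v^T in row-major order.
std::vector<double> householder_matrix(const std::vector<double>& v, double tau)
{
    const std::size_t n = v.size();
    std::vector<double> H(n * n, 0.0);
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            H[r * n + c] = (r == c ? 1.0 : 0.0) - tau * v[r] * v[c];
    return H;
}

// Largest absolute entry of H * H^T - I; close to zero when H is orthogonal.
double orthogonality_error(const std::vector<double>& H, std::size_t n)
{
    assert(H.size() == n * n);
    double worst = 0.0;
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c) {
            double dot = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                dot += H[r * n + k] * H[c * n + k];
            const double diff = std::fabs(dot - (r == c ? 1.0 : 0.0));
            if (diff > worst) worst = diff;
        }
    return worst;
}
```

Taking v = (0.5, 1, 0), which has a unit entry followed by a zero as in the description above, the scalar 2/(v'v) = 1.6 yields an orthogonal H, whereas an arbitrary scalar such as 1.0 does not.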

rocsolver_<type>geqlf_batched()

rocblas_status rocsolver_zgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} 0\\ L_l \end{array}\right] \end{split}\]

where \(L_l\) is lower triangular (lower trapezoidal if m < n), and \(Q_l\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)H_l(k-1)\cdots H_l(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last m-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_l; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>geqlf_strided_batched()

rocblas_status rocsolver_zgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[\begin{split} A_l = Q_l\left[\begin{array}{c} 0\\ L_l \end{array}\right] \end{split}\]

where \(L_l\) is lower triangular (lower trapezoidal if m < n), and \(Q_l\) is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)H_l(k-1)\cdots H_l(1), \quad \text{with} \: k = \text{min}(m,n) \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where the last m-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_l; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gelq2()

rocblas_status rocsolver_zgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQ2 computes an LQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} L & 0 \end{array}\right] Q \]

where L is lower triangular (lower trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(k)'H(k-1)' \cdots H(1)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i' v_i^{} \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the last n - i elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
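
A single step of an LQ factorization can be illustrated on a 1-by-n row a: find a scalar and a vector v with v[1] = 1 such that a * H = [beta 0 ... 0]. The CPU-only sketch below follows the construction commonly used by reference LAPACK for real data; it is not rocSOLVER's implementation, the type and function names are hypothetical, and the code uses 0-based indices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for the sketch (not rocSOLVER API).
struct RowReflector {
    double beta;            // leading entry of a * H, i.e. the 1-by-1 factor L
    double tau;             // Householder scalar
    std::vector<double> v;  // Householder vector with v[0] == 1
};

// Hypothetical CPU reference (not rocSOLVER code), real case only.
// Computes tau and v such that a * (I - tau * v^T * v) = [beta, 0, ..., 0].
RowReflector reflect_row(const std::vector<double>& a)
{
    assert(!a.empty());
    double tail = 0.0;  // squared norm of a[1:]
    for (std::size_t j = 1; j < a.size(); ++j)
        tail += a[j] * a[j];

    RowReflector r{a[0], 0.0, std::vector<double>(a.size(), 0.0)};
    r.v[0] = 1.0;
    if (tail == 0.0)
        return r;  // nothing to annihilate: tau = 0 and H = I

    const double norm = std::sqrt(a[0] * a[0] + tail);
    r.beta = (a[0] >= 0.0) ? -norm : norm;  // sign chosen to avoid cancellation
    r.tau = (r.beta - a[0]) / r.beta;
    for (std::size_t j = 1; j < a.size(); ++j)
        r.v[j] = a[j] / (a[0] - r.beta);
    return r;
}
```

For a = [3, 4] this construction gives beta = -5, tau = 1.6 and v = [1, 0.5]. In the storage scheme described above, the diagonal entry would then hold -5 and the entry above the diagonal would hold 0.5, with 1.6 as the Householder scalar.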

rocsolver_<type>gelq2_batched()

rocblas_status rocsolver_zgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} L_l & 0 \end{array}\right] Q_l \]

where \(L_l\) is lower triangular (lower trapezoidal if m > n), and \(Q_l\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)'H_l(k-1)' \cdots H_l(1)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}' v_{l_i}^{} \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the diagonal contain the factor L_l. The elements above the diagonal are the last n - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gelq2_strided_batched()

rocblas_status rocsolver_zgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} L_l & 0 \end{array}\right] Q_l \]

where \(L_l\) is lower triangular (lower trapezoidal if m > n), and \(Q_l\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)'H_l(k-1)' \cdots H_l(1)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}' v_{l_i}^{} \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the diagonal contain the factor L_l. The elements above the diagonal are the last n - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gelqf()

rocblas_status rocsolver_zgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQF computes an LQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

\[ A = \left[\begin{array}{cc} L & 0 \end{array}\right] Q \]

where L is lower triangular (lower trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q = H(k)'H(k-1)' \cdots H(1)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{ipiv}[i] \cdot v_i' v_i^{} \]

where the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the last n - i elements of Householder vector v_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • ipiv[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars.
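
As an illustration of the output layout described for A, consider m = 3 and n = 4, so that k = 3. Following the parameter description above, on exit the array would be laid out as

\[\begin{split} A = \left[\begin{array}{cccc} l_{11} & v_1[2] & v_1[3] & v_1[4] \\ l_{21} & l_{22} & v_2[3] & v_2[4] \\ l_{31} & l_{32} & l_{33} & v_3[4] \end{array}\right] \end{split}\]

where \(l_{ij}\) are the entries of L and row i holds the last n - i elements of \(v_i\). The leading zeros of each \(v_i\) and the unit entry \(v_i[i] = 1\) are implicit and are not stored.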

rocsolver_<type>gelqf_batched()

rocblas_status rocsolver_zgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} L_l & 0 \end{array}\right] Q_l \]

where \(L_l\) is lower triangular (lower trapezoidal if m > n), and \(Q_l\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)'H_l(k-1)' \cdots H_l(1)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}' v_{l_i}^{} \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the diagonal contain the factor L_l. The elements above the diagonal are the last n - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>gelqf_strided_batched()

rocblas_status rocsolver_zgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix \(A_l\) in the batch has the form

\[ A_l = \left[\begin{array}{cc} L_l & 0 \end{array}\right] Q_l \]

where \(L_l\) is lower triangular (lower trapezoidal if m > n), and \(Q_l\) is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

\[ Q_l = H_l(k)'H_l(k-1)' \cdots H_l(1)', \quad \text{with} \: k = \text{min}(m,n). \]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{ipiv}_l^{}[i] \cdot v_{l_i}' v_{l_i}^{} \]

where the first i-1 elements of Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on and below the diagonal contain the factor L_l. The elements above the diagonal are the last n - i elements of Householder vector v_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • ipiv[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors ipiv_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector ipiv_l to the next one ipiv_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
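
The sizes of A and ipiv are only described as depending on strideA and strideP. As a minimal sketch, assuming non-overlapping instances (stride at least as large as the per-instance footprint, the normal use case above), the element counts can be computed as follows. The helper names strided_batch_elems and gelqf_strided_batched_sizes are hypothetical and not part of rocSOLVER.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of rocSOLVER): number of elements spanned by
// a strided batch in which each instance uses `footprint` elements and
// consecutive instances start `stride` elements apart.
// Assumption: stride >= footprint, so instances do not overlap.
std::int64_t strided_batch_elems(std::int64_t footprint, std::int64_t stride,
                                 std::int64_t batch_count)
{
    assert(footprint >= 0 && stride >= footprint && batch_count >= 0);
    if(batch_count == 0 || footprint == 0)
        return 0;
    // The last instance only needs its own footprint, not a full stride.
    return (batch_count - 1) * stride + footprint;
}

// Element counts for the A and ipiv buffers of one GELQF_STRIDED_BATCHED call.
struct gelqf_batch_sizes
{
    std::int64_t a_elems;
    std::int64_t ipiv_elems;
};

gelqf_batch_sizes gelqf_strided_batched_sizes(std::int64_t m, std::int64_t n,
                                              std::int64_t lda, std::int64_t strideA,
                                              std::int64_t strideP,
                                              std::int64_t batch_count)
{
    assert(lda >= m);
    // Each A_l spans lda*n elements; each ipiv_l holds min(m,n) scalars.
    return {strided_batch_elems(lda * n, strideA, batch_count),
            strided_batch_elems(std::min(m, n), strideP, batch_count)};
}
```

Multiplying the returned counts by the element size gives a lower bound on the number of bytes to allocate for each buffer under these assumptions.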

Problem and matrix reductions#

rocsolver_<type>gebd2()#

rocblas_status rocsolver_zgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)#
rocblas_status rocsolver_cgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)#
rocblas_status rocsolver_dgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)#
rocblas_status rocsolver_sgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)#

GEBD2 computes the bidiagonal form of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

\[ B = Q' A P \]

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H(1)H(2)\cdots H(n)\: \text{and} \: P = G(1)G(2)\cdots G(n-1), & \: \text{if}\: m >= n, \:\text{or}\\ Q = H(1)H(2)\cdots H(m-1)\: \text{and} \: P = G(1)G(2)\cdots G(m), & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H(i)\) and \(G(i)\) is given by

\[\begin{split} \begin{array}{cl} H(i) = I - \text{tauq}[i] \cdot v_i^{} v_i', & \: \text{and}\\ G(i) = I - \text{taup}[i] \cdot u_i' u_i^{}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\); while the first i elements of the Householder vector \(u_i\) are zero, and \(u_i[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_i\) are zero, and \(u_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_i, and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_i. If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_i, and the elements above the diagonal are the last n - i elements of Householder vector u_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • D[out] pointer to real type. Array on the GPU of dimension min(m,n). The diagonal elements of B.

  • E[out] pointer to real type. Array on the GPU of dimension min(m,n)-1. The off-diagonal elements of B.

  • tauq[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars associated with matrix Q.

  • taup[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars associated with matrix P.
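
As an illustration of the storage scheme described for A, the contents of A on exit can be sketched for m = 4, n = 3 (left) and for m = 3, n = 4 (right). The sizes are chosen for illustration only. Here \(d_i\) and \(e_i\) denote the entries of D and E, while \(v_i\) and \(u_i\) mark positions holding stored entries of the corresponding Householder vectors; the leading zeros and the unit entry of each vector are not stored.

\[ \left[\begin{array}{ccc} d_1 & e_1 & u_1 \\ v_1 & d_2 & e_2 \\ v_1 & v_2 & d_3 \\ v_1 & v_2 & v_3 \end{array}\right] \qquad \left[\begin{array}{cccc} d_1 & u_1 & u_1 & u_1 \\ e_1 & d_2 & u_2 & u_2 \\ v_1 & e_2 & d_3 & u_3 \end{array}\right] \]

In the left case, \(u_2\) has n - i - 1 = 0 stored entries; in the right case, \(v_2\) has m - i - 1 = 0 stored entries.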

rocsolver_<type>gebd2_batched()#

rocblas_status rocsolver_zgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)#

GEBD2_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

\[ B_l^{} = Q_l' A_l^{} P_l^{} \]

where \(B_l\) is upper bidiagonal if m >= n and lower bidiagonal if m < n, and \(Q_l\) and \(P_l\) are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n)\: \text{and} \: P_l = G_l(1)G_l(2)\cdots G_l(n-1), & \: \text{if}\: m >= n, \:\text{or}\\ Q_l = H_l(1)H_l(2)\cdots H_l(m-1)\: \text{and} \: P_l = G_l(1)G_l(2)\cdots G_l(m), & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) and \(G_l(i)\) is given by

\[\begin{split} \begin{array}{cl} H_l^{}(i) = I - \text{tauq}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}', & \: \text{and}\\ G_l^{}(i) = I - \text{taup}_l^{}[i] \cdot u_{l_i}' u_{l_i}^{}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\); while the first i elements of the Householder vector \(u_{l_i}\) are zero, and \(u_{l_i}[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_{l_i}\) are zero, and \(u_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_l. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_(l_i), and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_(l_i). If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_(l_i), and the elements above the diagonal are the last n - i elements of Householder vector u_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • D[out] pointer to real type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of B_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • E[out] pointer to real type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of B_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • tauq[out] pointer to type. Array on the GPU (the size depends on the value of strideQ). Contains the vectors tauq_l of Householder scalars associated with matrices Q_l.

  • strideQ[in] rocblas_stride. Stride from the start of one vector tauq_l to the next one tauq_(l+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • taup[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors taup_l of Householder scalars associated with matrices P_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector taup_l to the next one taup_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
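
The batched interface takes A as an array of pointers, one per matrix A_l. A minimal sketch of the pointer arithmetic, assuming all matrices live back to back in one contiguous allocation, is given below. The helper make_batch_pointers is hypothetical and not part of rocSOLVER; it runs on the host, and in a real program the base address would be a GPU allocation and the resulting pointer array would typically have to be copied to GPU memory before the call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rocSOLVER): given the start of one
// contiguous allocation holding batch_count matrices back to back, return
// one pointer per matrix A_l. Each matrix occupies lda*n elements, matching
// the per-matrix dimension stated for the batched interface.
template <typename T>
std::vector<T*> make_batch_pointers(T* base, int n, int lda, int batch_count)
{
    assert(n >= 0 && lda >= 0 && batch_count >= 0);
    std::vector<T*> ptrs(static_cast<std::size_t>(batch_count));
    const std::size_t per_matrix
        = static_cast<std::size_t>(lda) * static_cast<std::size_t>(n);
    for(std::size_t l = 0; l < ptrs.size(); ++l)
        ptrs[l] = base + l * per_matrix; // start of matrix l (0-based here)
    return ptrs;
}
```

The matrices do not need to be contiguous for the batched interface; this layout is only one convenient choice.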

rocsolver_<type>gebd2_strided_batched()#

rocblas_status rocsolver_zgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)#

GEBD2_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

\[ B_l^{} = Q_l' A_l^{} P_l^{} \]

where \(B_l\) is upper bidiagonal if m >= n and lower bidiagonal if m < n, and \(Q_l\) and \(P_l\) are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n)\: \text{and} \: P_l = G_l(1)G_l(2)\cdots G_l(n-1), & \: \text{if}\: m >= n, \:\text{or}\\ Q_l = H_l(1)H_l(2)\cdots H_l(m-1)\: \text{and} \: P_l = G_l(1)G_l(2)\cdots G_l(m), & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) and \(G_l(i)\) is given by

\[\begin{split} \begin{array}{cl} H_l^{}(i) = I - \text{tauq}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}', & \: \text{and}\\ G_l^{}(i) = I - \text{taup}_l^{}[i] \cdot u_{l_i}' u_{l_i}^{}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\); while the first i elements of the Householder vector \(u_{l_i}\) are zero, and \(u_{l_i}[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_{l_i}\) are zero, and \(u_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_l. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_(l_i), and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_(l_i). If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_(l_i), and the elements above the diagonal are the last n - i elements of Householder vector u_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out] pointer to real type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of B_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • E[out] pointer to real type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of B_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • tauq[out] pointer to type. Array on the GPU (the size depends on the value of strideQ). Contains the vectors tauq_l of Householder scalars associated with matrices Q_l.

  • strideQ[in] rocblas_stride. Stride from the start of one vector tauq_l to the next one tauq_(l+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • taup[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors taup_l of Householder scalars associated with matrices P_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector taup_l to the next one taup_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
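
Once D and E have been copied back to the host, the bidiagonal matrix of a given instance can be rebuilt from them. The following is a minimal sketch for real double-precision output; the helper bidiagonal_of_instance is hypothetical and not part of rocSOLVER, and it takes the batch index l to be 0-based.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rocSOLVER): rebuild the dense m-by-n
// bidiagonal matrix B_l (column-major, leading dimension m) for batch
// instance l from host copies of D and E. l is 0-based in this sketch.
std::vector<double> bidiagonal_of_instance(const std::vector<double>& D,
                                           std::ptrdiff_t strideD,
                                           const std::vector<double>& E,
                                           std::ptrdiff_t strideE,
                                           int m, int n, int l)
{
    assert(m >= 0 && n >= 0 && l >= 0);
    const int k = (m < n) ? m : n; // min(m,n) diagonal entries
    std::vector<double> B(static_cast<std::size_t>(m) * n, 0.0);
    const double* d = D.data() + l * strideD; // start of D_l
    const double* e = E.data() + l * strideE; // start of E_l
    for(int i = 0; i < k; ++i)
        B[i + static_cast<std::size_t>(i) * m] = d[i];
    for(int i = 0; i + 1 < k; ++i)
    {
        if(m >= n)
            B[i + static_cast<std::size_t>(i + 1) * m] = e[i]; // superdiagonal
        else
            B[(i + 1) + static_cast<std::size_t>(i) * m] = e[i]; // subdiagonal
    }
    return B;
}
```

Because D and E are real for all four precisions, the same helper applies to the complex variants after the copy to the host.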

rocsolver_<type>gebrd()#

rocblas_status rocsolver_zgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)#
rocblas_status rocsolver_cgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)#
rocblas_status rocsolver_dgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)#
rocblas_status rocsolver_sgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)#

GEBRD computes the bidiagonal form of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The bidiagonal form is given by:

\[ B = Q' A P \]

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H(1)H(2)\cdots H(n)\: \text{and} \: P = G(1)G(2)\cdots G(n-1), & \: \text{if}\: m >= n, \:\text{or}\\ Q = H(1)H(2)\cdots H(m-1)\: \text{and} \: P = G(1)G(2)\cdots G(m), & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H(i)\) and \(G(i)\) is given by

\[\begin{split} \begin{array}{cl} H(i) = I - \text{tauq}[i] \cdot v_i^{} v_i', & \: \text{and}\\ G(i) = I - \text{taup}[i] \cdot u_i' u_i^{}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\); while the first i elements of the Householder vector \(u_i\) are zero, and \(u_i[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_i\) are zero, and \(u_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of the matrix A.

  • n[in] rocblas_int. n >= 0. The number of columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_i, and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_i. If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_i, and the elements above the diagonal are the last n - i elements of Householder vector u_i.

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of A.

  • D[out] pointer to real type. Array on the GPU of dimension min(m,n). The diagonal elements of B.

  • E[out] pointer to real type. Array on the GPU of dimension min(m,n)-1. The off-diagonal elements of B.

  • tauq[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars associated with matrix Q.

  • taup[out] pointer to type. Array on the GPU of dimension min(m,n). The Householder scalars associated with matrix P.
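
The effect of a single Householder matrix of the form \(H(i) = I - \text{tauq}[i] \cdot v_i^{} v_i'\) can be checked on the host without forming the matrix. The following is a minimal CPU sketch for the real case only; the function apply_householder is hypothetical and not part of rocSOLVER, and it expects v in full, including its leading zeros and unit entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of rocSOLVER): overwrite x with H*x,
// where H = I - tau * v * v' and v is a real vector given in full, i.e.
// including its leading zeros and its unit entry.
void apply_householder(double tau, const std::vector<double>& v,
                       std::vector<double>& x)
{
    assert(v.size() == x.size());
    double dot = 0.0; // v' * x
    for(std::size_t i = 0; i < v.size(); ++i)
        dot += v[i] * x[i];
    // x <- x - tau * (v' * x) * v, which avoids building the full matrix H.
    for(std::size_t i = 0; i < v.size(); ++i)
        x[i] -= tau * dot * v[i];
}
```

For a real reflector with tau = 2/(v'v), applying the function twice returns the original vector, which is a simple sanity check on a (tau, v) pair. For real data, \(G(i)\) has the same structure with \(u_i\) taken as a row vector.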

rocsolver_<type>gebrd_batched()#

rocblas_status rocsolver_zgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)#

GEBRD_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

\[ B_l^{} = Q_l' A_l^{} P_l^{} \]

where \(B_l\) is upper bidiagonal if m >= n and lower bidiagonal if m < n, and \(Q_l\) and \(P_l\) are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n)\: \text{and} \: P_l = G_l(1)G_l(2)\cdots G_l(n-1), & \: \text{if}\: m >= n, \:\text{or}\\ Q_l = H_l(1)H_l(2)\cdots H_l(m-1)\: \text{and} \: P_l = G_l(1)G_l(2)\cdots G_l(m), & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) and \(G_l(i)\) is given by

\[\begin{split} \begin{array}{cl} H_l^{}(i) = I - \text{tauq}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}', & \: \text{and}\\ G_l^{}(i) = I - \text{taup}_l^{}[i] \cdot u_{l_i}' u_{l_i}^{}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\); while the first i elements of the Householder vector \(u_{l_i}\) are zero, and \(u_{l_i}[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_{l_i}\) are zero, and \(u_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the m-by-n matrices A_l to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_l. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_(l_i), and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_(l_i). If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_(l_i), and the elements above the diagonal are the last n - i elements of Householder vector u_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • D[out] pointer to real type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of B_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • E[out] pointer to real type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of B_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • tauq[out] pointer to type. Array on the GPU (the size depends on the value of strideQ). Contains the vectors tauq_l of Householder scalars associated with matrices Q_l.

  • strideQ[in] rocblas_stride. Stride from the start of one vector tauq_l to the next one tauq_(l+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • taup[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors taup_l of Householder scalars associated with matrices P_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector taup_l to the next one taup_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
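
Since only the trailing entries of each Householder vector are stored in A_l, the full vector has to be expanded before it can be used explicitly. A minimal sketch for the case m >= n and real double-precision data follows; the helper expand_v is hypothetical and not part of rocSOLVER, and it works on a host copy of a single factored matrix A_l.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rocSOLVER): for the case m >= n, expand
// the Householder vector v_i of one factored matrix A_l (column-major host
// copy, leading dimension lda) to its full length m. i is 1-based, as in
// the description above.
std::vector<double> expand_v(const std::vector<double>& A, int lda, int m,
                             int n, int i)
{
    assert(m >= n && lda >= m && i >= 1 && i <= n);
    assert(A.size() >= static_cast<std::size_t>(lda) * n);
    std::vector<double> v(static_cast<std::size_t>(m), 0.0); // first i-1 entries are zero
    v[i - 1] = 1.0; // implicit unit entry v_i[i]
    const std::size_t col = static_cast<std::size_t>(i - 1) * lda;
    for(int r = i; r < m; ++r) // last m-i entries, stored below the diagonal
        v[r] = A[col + r];
    return v;
}
```

The vectors u_(l_i) can be expanded analogously by reading along row i, starting to the right of the superdiagonal.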

rocsolver_<type>gebrd_strided_batched()#

rocblas_status rocsolver_zgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgebrd_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)#

GEBRD_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

\[ B_l^{} = Q_l' A_l^{} P_l^{} \]

where \(B_l\) is upper bidiagonal if m >= n and lower bidiagonal if m < n, and \(Q_l\) and \(P_l\) are orthogonal/unitary matrices represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n)\: \text{and} \: P_l = G_l(1)G_l(2)\cdots G_l(n-1), & \: \text{if}\: m >= n, \:\text{or}\\ Q_l = H_l(1)H_l(2)\cdots H_l(m-1)\: \text{and} \: P_l = G_l(1)G_l(2)\cdots G_l(m), & \: \text{if}\: m < n. \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) and \(G_l(i)\) is given by

\[\begin{split} \begin{array}{cl} H_l^{}(i) = I - \text{tauq}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}', & \: \text{and}\\ G_l^{}(i) = I - \text{taup}_l^{}[i] \cdot u_{l_i}' u_{l_i}^{}. \end{array} \end{split}\]

If m >= n, the first i-1 elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\); while the first i elements of the Householder vector \(u_{l_i}\) are zero, and \(u_{l_i}[i+1] = 1\). If m < n, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\); while the first i-1 elements of the Householder vector \(u_{l_i}\) are zero, and \(u_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • m[in] rocblas_int. m >= 0. The number of rows of all the matrices A_l in the batch.

  • n[in] rocblas_int. n >= 0. The number of columns of all the matrices A_l in the batch.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the m-by-n matrices A_l to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B_l. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_(l_i), and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_(l_i). If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_(l_i), and the elements above the diagonal are the last n - i elements of Householder vector u_(l_i).

  • lda[in] rocblas_int. lda >= m. Specifies the leading dimension of matrices A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out] pointer to real type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of B_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= min(m,n).

  • E[out] pointer to real type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of B_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= min(m,n)-1.

  • tauq[out] pointer to type. Array on the GPU (the size depends on the value of strideQ). Contains the vectors tauq_l of Householder scalars associated with matrices Q_l.

  • strideQ[in] rocblas_stride. Stride from the start of one vector tauq_l to the next one tauq_(l+1). There is no restriction for the value of strideQ. Normal use is strideQ >= min(m,n).

  • taup[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors taup_l of Householder scalars associated with matrices P_l.

  • strideP[in] rocblas_stride. Stride from the start of one vector taup_l to the next one taup_(l+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
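
As an illustration of how the strides locate individual entries, and assuming column-major storage with the batch index \(l\) taken to be 1-based like the other indices (an assumption of this sketch), the offsets in elements from the start of each array are

\[ \text{offset}(A_l[i,j]) = (l-1)\cdot\text{strideA} + (j-1)\cdot\text{lda} + (i-1), \qquad \text{offset}(D_l[i]) = (l-1)\cdot\text{strideD} + (i-1). \]

The same pattern applies to E, tauq and taup with strideE, strideQ and strideP, respectively.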

rocsolver_<type>sytd2()#

rocblas_status rocsolver_dsytd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tau)#
rocblas_status rocsolver_ssytd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tau)#

SYTD2 computes the tridiagonal form of a real symmetric matrix A.

(This is the unblocked version of the algorithm).

The tridiagonal form is given by:

\[ T = Q' A Q \]

where T is symmetric tridiagonal and Q is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H(1)H(2)\cdots H(n-1) & \: \text{if uplo indicates lower, or}\\ Q = H(n-1)H(n-2)\cdots H(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{tau}[i] \cdot v_i^{} v_i' \]

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_i stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_i stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A.

  • D[out] pointer to type. Array on the GPU of dimension n. The diagonal elements of T.

  • E[out] pointer to type. Array on the GPU of dimension n-1. The off-diagonal elements of T.

  • tau[out] pointer to type. Array on the GPU of dimension n-1. The Householder scalars.
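
As an illustration of the storage scheme described for A, the contents of A on exit can be sketched for n = 4, with uplo indicating upper (left) and lower (right). Here \(d_i\) and \(e_i\) denote the entries of D and E, \(v_i\) marks positions holding stored entries of the corresponding Householder vector, and \(\ast\) marks the part of A that is not used. The sketch assumes the LAPACK convention that, in the upper case, the stored entries of \(v_i\) occupy column i+1.

\[ \left[\begin{array}{cccc} d_1 & e_1 & v_2 & v_3 \\ \ast & d_2 & e_2 & v_3 \\ \ast & \ast & d_3 & e_3 \\ \ast & \ast & \ast & d_4 \end{array}\right] \qquad \left[\begin{array}{cccc} d_1 & \ast & \ast & \ast \\ e_1 & d_2 & \ast & \ast \\ v_1 & e_2 & d_3 & \ast \\ v_1 & v_2 & e_3 & d_4 \end{array}\right] \]

In the upper case \(v_1\) has i - 1 = 0 stored entries, and in the lower case \(v_3\) has n - i - 1 = 0 stored entries.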

rocsolver_<type>sytd2_batched()#

rocblas_status rocsolver_dsytd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)#

SYTD2_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_l.

(This is the unblocked version of the algorithm).

The tridiagonal form of \(A_l\) is given by:

\[ T_l^{} = Q_l' A_l^{} Q_l^{} \]

where \(T_l\) is symmetric tridiagonal and \(Q_l\) is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n-1) & \: \text{if uplo indicates lower, or}\\ Q_l = H_l(n-1)H_l(n-2)\cdots H_l(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{tau}_l^{}[i] \cdot v_{l_i}^{} v_{l_i}' \]

where \(\text{tau}_l[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the symmetric matrix A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrices A_l.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_l; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(l_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_l; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(l_i) stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A_l.

  • D[out] pointer to type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of T_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out] pointer to type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of T_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors tau_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector tau_l to the next one tau_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
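
To make the role of strideD, strideE and strideP concrete, the following sketch shows how the vector belonging to one batch entry can be located inside a strided buffer, and how many elements such a buffer needs in the normal use case. Both helper names are hypothetical, the batch index is 0-based in the code, and the size formula assumes the stride is at least the vector length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (not rocSOLVER functions). */

/* Start of the vector for batch entry `batch` (0-based) in a buffer whose
 * consecutive vectors are `stride` elements apart. */
double *batch_vector(double *base, int64_t stride, int64_t batch)
{
    return base + batch * stride;
}

/* Smallest element count that holds batch_count vectors of length len
 * placed stride elements apart, assuming stride >= len. */
size_t strided_buffer_len(size_t len, size_t stride, size_t batch_count)
{
    assert(stride >= len);
    if (batch_count == 0 || len == 0)
        return 0;
    return (batch_count - 1) * stride + len;
}
```

For example, with n = 4 and strideD = 5, a batch of three D vectors fits in 14 elements, and the third vector starts at offset 10.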

rocsolver_<type>sytd2_strided_batched()#

rocblas_status rocsolver_dsytd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)#

SYTD2_STRIDED_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_l.

(This is the unblocked version of the algorithm).

The tridiagonal form of \(A_l\) is given by:

\[ T_l^{} = Q_l' A_l^{} Q_l^{} \]

where \(T_l\) is symmetric tridiagonal and \(Q_l\) is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n-1) & \: \text{if uplo indicates lower, or}\\ Q_l = H_l(n-1)H_l(n-2)\cdots H_l(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{tau}_l[i] \cdot v_{l_i}^{} v_{l_i}' \]

where \(\text{tau}_l[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the symmetric matrix A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrices A_l.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_l; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(l_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_l; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(l_i) stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out] pointer to type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of T_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out] pointer to type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of T_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors tau_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector tau_l to the next one tau_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
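
In the strided batched layout every matrix lives in one buffer, so an element is addressed by combining strideA with the usual column-major offset. The sketch below computes that offset. The helper name is hypothetical, i and j are 1-based as in the text, and the batch index is 0-based in the code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a rocSOLVER function).
 * Offset, in elements, of entry [i,j] of the matrix with 0-based batch
 * index `batch` inside a strided, column-major buffer. i and j are 1-based. */
int64_t strided_elem_offset(int64_t batch, int64_t strideA, int lda, int i, int j)
{
    assert(batch >= 0 && i >= 1 && j >= 1 && i <= lda);
    return batch * strideA + (int64_t)(j - 1) * lda + (i - 1);
}
```

With lda = 4 and strideA = 12, entry [2,3] of the second matrix (batch index 1) sits at offset 21.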

rocsolver_<type>hetd2()#

rocblas_status rocsolver_zhetd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tau)#
rocblas_status rocsolver_chetd2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tau)#

HETD2 computes the tridiagonal form of a complex Hermitian matrix A.

(This is the unblocked version of the algorithm).

The tridiagonal form is given by:

\[ T = Q' A Q \]

where T is Hermitian tridiagonal and Q is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H(1)H(2)\cdots H(n-1) & \: \text{if uplo indicates lower, or}\\ Q = H(n-1)H(n-2)\cdots H(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{tau}[i] \cdot v_i^{} v_i' \]

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the Hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_i stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_i stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A.

  • D[out] pointer to real type. Array on the GPU of dimension n. The diagonal elements of T.

  • E[out] pointer to real type. Array on the GPU of dimension n-1. The off-diagonal elements of T.

  • tau[out] pointer to type. Array on the GPU of dimension n-1. The Householder scalars.
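
In the complex case X' denotes the conjugate transpose, so applying a single Householder matrix to a vector x amounts to x - tau (v^H x) v. The sketch below performs that update on the host with C99 complex arithmetic. The helper name is hypothetical, and double complex is used here only as a host-side stand-in for rocblas_double_complex.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical host-side helper (not a rocSOLVER function).
 * Overwrites x with H x, where H = I - tau * v * v^H and
 * v, x have n entries each. */
void apply_householder_c(int n, double complex tau,
                         const double complex *v, double complex *x)
{
    assert(n >= 0);

    double complex dot = 0.0;
    for (int k = 0; k < n; ++k)
        dot += conj(v[k]) * x[k]; /* accumulates v^H x */

    for (int k = 0; k < n; ++k)
        x[k] -= tau * dot * v[k];
}
```

Applying the n-1 reflectors in the order given for Q above yields the product Q x without ever forming Q explicitly.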

rocsolver_<type>hetd2_batched()#

rocblas_status rocsolver_zhetd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_chetd2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)#

HETD2_BATCHED computes the tridiagonal form of a batch of complex Hermitian matrices A_l.

(This is the unblocked version of the algorithm).

The tridiagonal form of \(A_l\) is given by:

\[ T_l^{} = Q_l' A_l^{} Q_l^{} \]

where \(T_l\) is Hermitian tridiagonal and \(Q_l\) is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n-1) & \: \text{if uplo indicates lower, or}\\ Q_l = H_l(n-1)H_l(n-2)\cdots H_l(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{tau}_l[i] \cdot v_{l_i}^{} v_{l_i}' \]

where \(\text{tau}_l[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the Hermitian matrix A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrices A_l.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_l; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(l_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_l; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(l_i) stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A_l.

  • D[out] pointer to real type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of T_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out] pointer to real type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of T_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors tau_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector tau_l to the next one tau_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>hetd2_strided_batched()#

rocblas_status rocsolver_zhetd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_chetd2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)#

HETD2_STRIDED_BATCHED computes the tridiagonal form of a batch of complex Hermitian matrices A_l.

(This is the unblocked version of the algorithm).

The tridiagonal form of \(A_l\) is given by:

\[ T_l^{} = Q_l' A_l^{} Q_l^{} \]

where \(T_l\) is Hermitian tridiagonal and \(Q_l\) is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n-1) & \: \text{if uplo indicates lower, or}\\ Q_l = H_l(n-1)H_l(n-2)\cdots H_l(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{tau}_l[i] \cdot v_{l_i}^{} v_{l_i}' \]

where \(\text{tau}_l[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the Hermitian matrix A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrices A_l.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_l; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(l_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_l; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(l_i) stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out] pointer to real type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of T_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out] pointer to real type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of T_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors tau_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector tau_l to the next one tau_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>sytrd()#

rocblas_status rocsolver_dsytrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tau)#
rocblas_status rocsolver_ssytrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tau)#

SYTRD computes the tridiagonal form of a real symmetric matrix A.

(This is the blocked version of the algorithm).

The tridiagonal form is given by:

\[ T = Q' A Q \]

where T is symmetric tridiagonal and Q is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H(1)H(2)\cdots H(n-1) & \: \text{if uplo indicates lower, or}\\ Q = H(n-1)H(n-2)\cdots H(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{tau}[i] \cdot v_i^{} v_i' \]

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_i stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_i stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A.

  • D[out] pointer to type. Array on the GPU of dimension n. The diagonal elements of T.

  • E[out] pointer to type. Array on the GPU of dimension n-1. The off-diagonal elements of T.

  • tau[out] pointer to type. Array on the GPU of dimension n-1. The Householder scalars.
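
Since T is symmetric tridiagonal, the outputs D and E describe it completely. The sketch below expands host copies of D and E into a dense matrix, which can be convenient when checking results against a reference. The helper name and the ldt parameter are hypothetical, and column-major storage is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper (not a rocSOLVER function).
 * Expands D (n entries) and E (n-1 entries) into a dense, column-major
 * n-by-n symmetric tridiagonal matrix T with leading dimension ldt. */
void tridiag_to_dense(int n, const double *D, const double *E, double *T, int ldt)
{
    assert(n >= 0 && ldt >= n);

    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            T[i + (size_t)j * ldt] = 0.0;

    for (int i = 0; i < n; ++i) {
        T[i + (size_t)i * ldt] = D[i];
        if (i + 1 < n) {
            T[(i + 1) + (size_t)i * ldt] = E[i]; /* subdiagonal entry */
            T[i + (size_t)(i + 1) * ldt] = E[i]; /* superdiagonal entry */
        }
    }
}
```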

rocsolver_<type>sytrd_batched()#

rocblas_status rocsolver_dsytrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)#

SYTRD_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_l.

(This is the blocked version of the algorithm).

The tridiagonal form of \(A_l\) is given by:

\[ T_l^{} = Q_l' A_l^{} Q_l^{} \]

where \(T_l\) is symmetric tridiagonal and \(Q_l\) is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n-1) & \: \text{if uplo indicates lower, or}\\ Q_l = H_l(n-1)H_l(n-2)\cdots H_l(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{tau}_l[i] \cdot v_{l_i}^{} v_{l_i}' \]

where \(\text{tau}_l[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the symmetric matrix A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrices A_l.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_l; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(l_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_l; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(l_i) stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A_l.

  • D[out] pointer to type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of T_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out] pointer to type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of T_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors tau_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector tau_l to the next one tau_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.
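
The batched variant takes A as an array of pointers, one per matrix. A common way to prepare such an array is to carve the matrices out of one contiguous allocation, as in the sketch below. The helper name is hypothetical and the code runs entirely on the host; in a real program base would be a GPU allocation, and the pointer array would typically also have to be made accessible to the GPU, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a rocSOLVER function).
 * Fills ptrs[l] so that matrix l (0-based) starts at base + l * lda * n,
 * i.e. the matrices are packed back to back in one buffer. */
void fill_batch_pointers(double *base, int lda, int n, int batch_count, double **ptrs)
{
    assert(n >= 0 && lda >= n && batch_count >= 0);

    const size_t matrix_len = (size_t)lda * (size_t)n;
    for (int l = 0; l < batch_count; ++l)
        ptrs[l] = base + (size_t)l * matrix_len;
}
```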

rocsolver_<type>sytrd_strided_batched()#

rocblas_status rocsolver_dsytrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tau, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_ssytrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tau, const rocblas_stride strideP, const rocblas_int batch_count)#

SYTRD_STRIDED_BATCHED computes the tridiagonal form of a batch of real symmetric matrices A_l.

(This is the blocked version of the algorithm).

The tridiagonal form of \(A_l\) is given by:

\[ T_l^{} = Q_l' A_l^{} Q_l^{} \]

where \(T_l\) is symmetric tridiagonal and \(Q_l\) is an orthogonal matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n-1) & \: \text{if uplo indicates lower, or}\\ Q_l = H_l(n-1)H_l(n-2)\cdots H_l(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{tau}_l[i] \cdot v_{l_i}^{} v_{l_i}' \]

where \(\text{tau}_l[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the symmetric matrix A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrices A_l.

  • A[inout] pointer to type. Array on the GPU (the size depends on the value of strideA). On entry, the matrices A_l to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_l; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(l_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_l; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(l_i) stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A_l.

  • strideA[in] rocblas_stride. Stride from the start of one matrix A_l to the next one A_(l+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • D[out] pointer to type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of T_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out] pointer to type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of T_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors tau_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector tau_l to the next one tau_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>hetrd()#

rocblas_status rocsolver_zhetrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tau)#
rocblas_status rocsolver_chetrd(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tau)#

HETRD computes the tridiagonal form of a complex Hermitian matrix A.

(This is the blocked version of the algorithm).

The tridiagonal form is given by:

\[ T = Q' A Q \]

where T is Hermitian tridiagonal and Q is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q = H(1)H(2)\cdots H(n-1) & \: \text{if uplo indicates lower, or}\\ Q = H(n-1)H(n-2)\cdots H(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H(i)\) is given by

\[ H(i) = I - \text{tau}[i] \cdot v_i^{} v_i' \]

where tau[i] is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_i\) are zero, and \(v_i[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_i\) are zero, and \(v_i[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the Hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrix A.

  • A[inout] pointer to type. Array on the GPU of dimension lda*n. On entry, the matrix to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_i stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_i stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A.

  • D[out] pointer to real type. Array on the GPU of dimension n. The diagonal elements of T.

  • E[out] pointer to real type. Array on the GPU of dimension n-1. The off-diagonal elements of T.

  • tau[out] pointer to type. Array on the GPU of dimension n-1. The Householder scalars.
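
As an illustration of the output layout described for A, consider n = 4 with uplo indicating lower. Writing \(d_i\) for D[i] and \(e_i\) for E[i], and assuming the LAPACK storage convention in which v_i occupies column i, the matrix on exit has the form

\[ A = \left[ \begin{array}{cccc} d_1 & \cdot & \cdot & \cdot \\ e_1 & d_2 & \cdot & \cdot \\ v_1[3] & e_2 & d_3 & \cdot \\ v_1[4] & v_2[4] & e_3 & d_4 \end{array} \right] \]

where the entries marked \(\cdot\) belong to the upper part, which is not used. The corresponding Householder vectors are

\[ v_1 = (0,\, 1,\, v_1[3],\, v_1[4])^T, \quad v_2 = (0,\, 0,\, 1,\, v_2[4])^T, \quad v_3 = (0,\, 0,\, 0,\, 1)^T \]

so that column i stores the last n-i-1 elements of v_i below the subdiagonal, while the leading zeros and the unit entry are implicit.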

rocsolver_<type>hetrd_batched()#

rocblas_status rocsolver_zhetrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_chetrd_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)#

HETRD_BATCHED computes the tridiagonal form of a batch of complex Hermitian matrices A_l.

(This is the blocked version of the algorithm).

The tridiagonal form of \(A_l\) is given by:

\[ T_l^{} = Q_l' A_l^{} Q_l^{} \]

where \(T_l\) is Hermitian tridiagonal and \(Q_l\) is a unitary matrix represented as the product of Householder matrices

\[\begin{split} \begin{array}{cl} Q_l = H_l(1)H_l(2)\cdots H_l(n-1) & \: \text{if uplo indicates lower, or}\\ Q_l = H_l(n-1)H_l(n-2)\cdots H_l(1) & \: \text{if uplo indicates upper.} \end{array} \end{split}\]

Each Householder matrix \(H_l(i)\) is given by

\[ H_l^{}(i) = I - \text{tau}_l[i] \cdot v_{l_i}^{} v_{l_i}' \]

where \(\text{tau}_l[i]\) is the corresponding Householder scalar. When uplo indicates lower, the first i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i+1] = 1\). If uplo indicates upper, the last n-i elements of the Householder vector \(v_{l_i}\) are zero, and \(v_{l_i}[i] = 1\).

Parameters:
  • handle[in] rocblas_handle.

  • uplo[in] rocblas_fill. Specifies whether the upper or lower part of the Hermitian matrix A_l is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_l is not used.

  • n[in] rocblas_int. n >= 0. The number of rows and columns of the matrices A_l.

  • A[inout] array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n. On entry, the matrices A_l to be factored. On exit, if upper, then the elements on the diagonal and superdiagonal contain the tridiagonal form T_l; the elements above the superdiagonal contain the first i-1 elements of the Householder vectors v_(l_i) stored as columns. If lower, then the elements on the diagonal and subdiagonal contain the tridiagonal form T_l; the elements below the subdiagonal contain the last n-i-1 elements of the Householder vectors v_(l_i) stored as columns.

  • lda[in] rocblas_int. lda >= n. The leading dimension of A_l.

  • D[out] pointer to real type. Array on the GPU (the size depends on the value of strideD). The diagonal elements of T_l.

  • strideD[in] rocblas_stride. Stride from the start of one vector D_l to the next one D_(l+1). There is no restriction for the value of strideD. Normal use case is strideD >= n.

  • E[out] pointer to real type. Array on the GPU (the size depends on the value of strideE). The off-diagonal elements of T_l.

  • strideE[in] rocblas_stride. Stride from the start of one vector E_l to the next one E_(l+1). There is no restriction for the value of strideE. Normal use case is strideE >= n-1.

  • tau[out] pointer to type. Array on the GPU (the size depends on the value of strideP). Contains the vectors tau_l of corresponding Householder scalars.

  • strideP[in] rocblas_stride. Stride from the start of one vector tau_l to the next one tau_(l+1). There is no restriction for the value of strideP. Normal use case is strideP >= n-1.

  • batch_count[in] rocblas_int. batch_count >= 0. Number of matrices in the batch.

rocsolver_<type>hetrd_strided_batched()#

rocblas_status rocsolver_zhetrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tau, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_chetrd_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tau,