Lapack-like Functions

Contents

Lapack-like Functions#

Other Lapack-like routines provided by rocSOLVER. These are divided into the following subcategories:

Note

Throughout the APIs’ descriptions, we use the following notations:

  • x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.

  • If X is a real vector or matrix, \(X^T\) indicates its transpose; if X is complex, then \(X^H\) represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.

  • x_i \(=x_i\); we sometimes use both notations, \(x_i\) when displaying mathematical equations, and x_i in the text describing the function parameters.

Triangular factorizations#

rocsolver_<type>getf2_npvt()#

rocblas_status rocsolver_zgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_sgetf2_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

GETF2_NPVT computes the LU factorization of a general m-by-n matrix A without partial pivoting.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

\[ A = LU \]

where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2 routines instead.

Parameters:
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.

rocsolver_<type>getf2_npvt_batched()#

rocblas_status rocsolver_zgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

GETF2_NPVT_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = L_jU_j \]

where \(L_j\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_j\) is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2_BATCHED routines instead.

Parameters:
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all matrices A_j in the batch.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorizations. The unit diagonal elements of L_j are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getf2_npvt_strided_batched()#

rocblas_status rocsolver_zgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

GETF2_NPVT_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = L_jU_j \]

where \(L_j\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_j\) is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETF2_STRIDED_BATCHED routines instead.

Parameters:
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorization. The unit diagonal elements of L_j are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf_npvt()#

rocblas_status rocsolver_zgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_sgetrf_npvt(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

GETRF_NPVT computes the LU factorization of a general m-by-n matrix A without partial pivoting.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

\[ A = LU \]

where L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF routines instead.

Parameters:
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of the matrix A.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of A.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.

rocsolver_<type>getrf_npvt_batched()#

rocblas_status rocsolver_zgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_npvt_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

GETRF_NPVT_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = L_jU_j \]

where \(L_j\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_j\) is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF_BATCHED routines instead.

Parameters:
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all matrices A_j in the batch.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorizations. The unit diagonal elements of L_j are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getrf_npvt_strided_batched()#

rocblas_status rocsolver_zgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_npvt_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

GETRF_NPVT_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices without partial pivoting.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix \(A_j\) in the batch has the form

\[ A_j = L_jU_j \]

where \(L_j\) is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and \(U_j\) is upper triangular (upper trapezoidal if m < n).

Note: Although this routine can offer better performance, Gaussian elimination without pivoting is not backward stable. If numerical accuracy is compromised, use the legacy-LAPACK-like API GETRF_STRIDED_BATCHED routines instead.

Parameters:
  • handle[in] rocblas_handle.

  • m[in]

    rocblas_int. m >= 0.

    The number of rows of all matrices A_j in the batch.

  • n[in]

    rocblas_int. n >= 0.

    The number of columns of all matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorization. The unit diagonal elements of L_j are not stored.

  • lda[in]

    rocblas_int. lda >= m.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero element in the diagonal. The factorization from this point might be incomplete.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

Linear-systems solvers#

rocsolver_<type>getri_npvt()#

rocblas_status rocsolver_zgetri_npvt(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cgetri_npvt(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dgetri_npvt(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_sgetri_npvt(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

GETRI_NPVT inverts a general n-by-n matrix A using the LU factorization computed by GETRF_NPVT.

The inverse is computed by solving the linear system

\[ A^{-1}L = U^{-1} \]

where L is the lower triangular factor of A with unit diagonal elements, and U is the upper triangular factor.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the factors L and U of the factorization A = L*U returned by

    GETRF_NPVT. On exit, the inverse of A if info = 0; otherwise undefined.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.

rocsolver_<type>getri_npvt_batched()#

rocblas_status rocsolver_zgetri_npvt_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetri_npvt_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetri_npvt_batched(rocblas_handle handle, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetri_npvt_batched(rocblas_handle handle, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

GETRI_NPVT_BATCHED inverts a batch of general n-by-n matrices using the LU factorization computed by GETRF_NPVT_BATCHED.

The inverse of matrix \(A_j\) in the batch is computed by solving the linear system

\[ A_j^{-1} L_j = U_j^{-1} \]

where \(L_j\) is the lower triangular factor of \(A_j\) with unit diagonal elements, and \(U_j\) is the upper triangular factor.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the factors L_j and U_j of the factorization A = L_j*U_j returned by

    GETRF_NPVT_BATCHED. On exit, the inverses of A_j if info[j] = 0; otherwise undefined.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getri_npvt_strided_batched()#

rocblas_status rocsolver_zgetri_npvt_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetri_npvt_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetri_npvt_strided_batched(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetri_npvt_strided_batched(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

GETRI_NPVT_STRIDED_BATCHED inverts a batch of general n-by-n matrices using the LU factorization computed by GETRF_NPVT_STRIDED_BATCHED.

The inverse of matrix \(A_j\) in the batch is computed by solving the linear system

\[ A_j^{-1} L_j = U_j^{-1} \]

where \(L_j\) is the lower triangular factor of \(A_j\) with unit diagonal elements, and \(U_j\) is the upper triangular factor.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the factors L_j and U_j of the factorization A_j = L_j*U_j returned by

    GETRF_NPVT_STRIDED_BATCHED. On exit, the inverses of A_j if info[j] = 0; otherwise undefined.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getri_outofplace()#

rocblas_status rocsolver_zgetri_outofplace(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)#
rocblas_status rocsolver_cgetri_outofplace(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)#
rocblas_status rocsolver_dgetri_outofplace(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, double *C, const rocblas_int ldc, rocblas_int *info)#
rocblas_status rocsolver_sgetri_outofplace(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, float *C, const rocblas_int ldc, rocblas_int *info)#

GETRI_OUTOFPLACE computes the inverse \(C = A^{-1}\) of a general n-by-n matrix A.

The inverse is computed by solving the linear system

\[ AC = I \]

where I is the identity matrix, and A is factorized as \(A = PLU\) as given by GETRF.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • A[in]

    pointer to type. Array on the GPU of dimension lda*n.

    The factors L and U of the factorization A = P*L*U returned by

    GETRF.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • ipiv[in]

    pointer to rocblas_int. Array on the GPU of dimension n.

    The pivot indices returned by

    GETRF.

  • C[out]

    pointer to type. Array on the GPU of dimension ldc*n.

    If info = 0, the inverse of A. Otherwise, undefined.

  • ldc[in]

    rocblas_int. ldc >= n.

    Specifies the leading dimension of C.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.

rocsolver_<type>getri_outofplace_batched()#

rocblas_status rocsolver_zgetri_outofplace_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetri_outofplace_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetri_outofplace_batched(rocblas_handle handle, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, double *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetri_outofplace_batched(rocblas_handle handle, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, float *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#

GETRI_OUTOFPLACE_BATCHED computes the inverse \(C_j = A_j^{-1}\) of a batch of general n-by-n matrices \(A_j\).

The inverse is computed by solving the linear system

\[ A_j C_j = I \]

where I is the identity matrix, and \(A_j\) is factorized as \(A_j = P_j L_j U_j\) as given by GETRF_BATCHED.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • A[in]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    The factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by

    GETRF_BATCHED.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • ipiv[in]

    pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    The pivot indices returned by

    GETRF_BATCHED.

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(i+j). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • C[out]

    array of pointers to type. Each pointer points to an array on the GPU of dimension ldc*n.

    If info[j] = 0, the inverse of matrices A_j. Otherwise, undefined.

  • ldc[in]

    rocblas_int. ldc >= n.

    Specifies the leading dimension of C_j.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getri_outofplace_strided_batched()#

rocblas_status rocsolver_zgetri_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_double_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetri_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_float_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetri_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, double *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetri_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, float *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#

GETRI_OUTOFPLACE_STRIDED_BATCHED computes the inverse \(C_j = A_j^{-1}\) of a batch of general n-by-n matrices \(A_j\).

The inverse is computed by solving the linear system

\[ A_j C_j = I \]

where I is the identity matrix, and \(A_j\) is factorized as \(A_j = P_j L_j U_j\) as given by GETRF_STRIDED_BATCHED.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • A[in]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    The factors L_j and U_j of the factorization A_j = P_j*L_j*U_j returned by

    GETRF_STRIDED_BATCHED.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • ipiv[in]

    pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

    The pivot indices returned by

    GETRF_STRIDED_BATCHED.

  • strideP[in]

    rocblas_stride.

    Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

  • C[out]

    pointer to type. Array on the GPU (the size depends on the value of strideC).

    If info[j] = 0, the inverse of matrices A_j. Otherwise, undefined.

  • ldc[in]

    rocblas_int. ldc >= n.

    Specifies the leading dimension of C_j.

  • strideC[in]

    rocblas_stride.

    Stride from the start of one matrix C_j to the next one C_(j+1). There is no restriction for the value of strideC. Normal use case is strideC >= ldc*n

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getri_npvt_outofplace()#

rocblas_status rocsolver_zgetri_npvt_outofplace(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *C, const rocblas_int ldc, rocblas_int *info)#
rocblas_status rocsolver_cgetri_npvt_outofplace(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *C, const rocblas_int ldc, rocblas_int *info)#
rocblas_status rocsolver_dgetri_npvt_outofplace(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, double *C, const rocblas_int ldc, rocblas_int *info)#
rocblas_status rocsolver_sgetri_npvt_outofplace(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, float *C, const rocblas_int ldc, rocblas_int *info)#

GETRI_NPVT_OUTOFPLACE computes the inverse \(C = A^{-1}\) of a general n-by-n matrix A without partial pivoting.

The inverse is computed by solving the linear system

\[ AC = I \]

where I is the identity matrix, and A is factorized as \(A = LU\) as given by GETRF_NPVT.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of the matrix A.

  • A[in]

    pointer to type. Array on the GPU of dimension lda*n.

    The factors L and U of the factorization A = L*U returned by

    GETRF_NPVT.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • C[out]

    pointer to type. Array on the GPU of dimension ldc*n.

    If info = 0, the inverse of A. Otherwise, undefined.

  • ldc[in]

    rocblas_int. ldc >= n.

    Specifies the leading dimension of C.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.

rocsolver_<type>getri_npvt_outofplace_batched()#

rocblas_status rocsolver_zgetri_npvt_outofplace_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetri_npvt_outofplace_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetri_npvt_outofplace_batched(rocblas_handle handle, const rocblas_int n, double *const A[], const rocblas_int lda, double *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetri_npvt_outofplace_batched(rocblas_handle handle, const rocblas_int n, float *const A[], const rocblas_int lda, float *const C[], const rocblas_int ldc, rocblas_int *info, const rocblas_int batch_count)#

GETRI_NPVT_OUTOFPLACE_BATCHED computes the inverse \(C_j = A_j^{-1}\) of a batch of general n-by-n matrices \(A_j\) without partial pivoting.

The inverse is computed by solving the linear system

\[ A_j C_j = I \]

where I is the identity matrix, and \(A_j\) is factorized as \(A_j = L_j U_j\) as given by GETRF_NPVT_BATCHED.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • A[in]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    The factors L_j and U_j of the factorization A_j = L_j*U_j returned by

    GETRF_NPVT_BATCHED.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • C[out]

    array of pointers to type. Each pointer points to an array on the GPU of dimension ldc*n.

    If info[j] = 0, the inverse of matrices A_j. Otherwise, undefined.

  • ldc[in]

    rocblas_int. ldc >= n.

    Specifies the leading dimension of C_j.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>getri_npvt_outofplace_strided_batched()#

rocblas_status rocsolver_zgetri_npvt_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetri_npvt_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetri_npvt_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetri_npvt_outofplace_strided_batched(rocblas_handle handle, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *C, const rocblas_int ldc, const rocblas_stride strideC, rocblas_int *info, const rocblas_int batch_count)#

GETRI_NPVT_OUTOFPLACE_STRIDED_BATCHED computes the inverse \(C_j = A_j^{-1}\) of a batch of general n-by-n matrices \(A_j\) without partial pivoting.

The inverse is computed by solving the linear system

\[ A_j C_j = I \]

where I is the identity matrix, and \(A_j\) is factorized as \(A_j = L_j U_j\) as given by GETRF_NPVT_STRIDED_BATCHED.

Parameters:
  • handle[in] rocblas_handle.

  • n[in]

    rocblas_int. n >= 0.

    The number of rows and columns of all matrices A_j in the batch.

  • A[in]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    The factors L_j and U_j of the factorization A_j = L_j*U_j returned by

    GETRF_NPVT_STRIDED_BATCHED.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n

  • C[out]

    pointer to type. Array on the GPU (the size depends on the value of strideC).

    If info[j] = 0, the inverse of matrices A_j. Otherwise, undefined.

  • ldc[in]

    rocblas_int. ldc >= n.

    Specifies the leading dimension of C_j.

  • strideC[in]

    rocblas_stride.

    Stride from the start of one matrix C_j to the next one C_(j+1). There is no restriction for the value of strideC. Normal use case is strideC >= ldc*n

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for inversion of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

Symmetric eigensolvers#

rocsolver_<type>syevj()#

rocblas_status rocsolver_dsyevj(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, rocblas_int *info)#
rocblas_status rocsolver_ssyevj(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, rocblas_int *info)#

SYEVJ computes the eigenvalues and optionally the eigenvectors of a real symmetric matrix A.

The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

At the \(k\)-th iteration (or “sweep”), \(A\) is transformed by a product of Jacobi rotations \(V\) as

\[ A^{(k)} = V' A^{(k-1)} V \]

such that \(off(A^{(k)}) < off(A^{(k-1)})\), where \(A^{(0)} = A\) and \(off(A^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A^{(k)}\). As \(off(A^{(k)}) \rightarrow 0\), the diagonal elements of \(A^{(k)}\) increasingly resemble the eigenvalues of \(A\).

Parameters:
  • handle[in] rocblas_handle.

  • esort[in] rocblas_esort

    .

    Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    Number of rows and columns of matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the eigenvectors of A if they were computed and the algorithm converged; otherwise the contents of A are unchanged.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrix A.

  • abstol[in]

    type.

    The absolute tolerance. The algorithm is considered to have converged once off(A) is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to type on the GPU.

    The Frobenius norm of the off-diagonal elements of A (i.e. off(A)) at the final iteration.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to a rocblas_int on the GPU.

    The actual number of sweeps (iterations) used by the algorithm.

  • W[out]

    pointer to type. Array on the GPU of dimension n.

    The eigenvalues of A in increasing order.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = 1, the algorithm did not converge.

rocsolver_<type>syevj_batched()#

rocblas_status rocsolver_dsyevj_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssyevj_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#

SYEVJ_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of real symmetric matrices A_j.

The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

At the \(k\)-th iteration (or “sweep”), \(A_j\) is transformed by a product of Jacobi rotations \(V_j\) as

\[ A_j^{(k)} = V_j' A_j^{(k-1)} V_j \]

such that \(off(A_j^{(k)}) < off(A_j^{(k-1)})\), where \(A_j^{(0)} = A_j\) and \(off(A_j^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A_j^{(k)}\). As \(off(A_j^{(k)}) \rightarrow 0\), the diagonal elements of \(A_j^{(k)}\) increasingly resemble the eigenvalues of \(A_j\).

Parameters:
  • handle[in] rocblas_handle.

  • esort[in] rocblas_esort

    .

    Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are unchanged.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • abstol[in]

    type.

    The absolute tolerance. The algorithm is considered to have converged once off(A) is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to type. Array of batch_count scalars on the GPU.

    The Frobenius norm of the off-diagonal elements of A_j (i.e. off(A_j)) at the final iteration.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    The actual number of sweeps (iterations) used by the algorithm for each batch instance.

  • W[out]

    pointer to type. Array on the GPU (the size depends on the value of strideW).

    The eigenvalues of A_j in increasing order.

  • strideW[in]

    rocblas_stride.

    Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use case is strideW >= n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = 1, the algorithm did not converge.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>syevj_strided_batched()#

rocblas_status rocsolver_dsyevj_strided_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssyevj_strided_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#

SYEVJ_STRIDED_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of real symmetric matrices A_j.

The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

At the \(k\)-th iteration (or “sweep”), \(A_j\) is transformed by a product of Jacobi rotations \(V_j\) as

\[ A_j^{(k)} = V_j' A_j^{(k-1)} V_j \]

such that \(off(A_j^{(k)}) < off(A_j^{(k-1)})\), where \(A_j^{(0)} = A_j\) and \(off(A_j^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A_j^{(k)}\). As \(off(A_j^{(k)}) \rightarrow 0\), the diagonal elements of \(A_j^{(k)}\) increasingly resemble the eigenvalues of \(A_j\).

Parameters:
  • handle[in] rocblas_handle.

  • esort[in] rocblas_esort

    .

    Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the symmetric matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are unchanged.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • abstol[in]

    type.

    The absolute tolerance. The algorithm is considered to have converged once off(A) is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to type. Array of batch_count scalars on the GPU.

    The Frobenius norm of the off-diagonal elements of A_j (i.e. off(A_j)) at the final iteration.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    The actual number of sweeps (iterations) used by the algorithm for each batch instance.

  • W[out]

    pointer to type. Array on the GPU (the size depends on the value of strideW).

    The eigenvalues of A_j in increasing order.

  • strideW[in]

    rocblas_stride.

    Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use case is strideW >= n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = 1, the algorithm did not converge.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>heevj()#

rocblas_status rocsolver_zheevj(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, rocblas_int *info)#
rocblas_status rocsolver_cheevj(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, rocblas_int *info)#

HEEVJ computes the eigenvalues and optionally the eigenvectors of a complex Hermitian matrix A.

The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

At the \(k\)-th iteration (or “sweep”), \(A\) is transformed by a product of Jacobi rotations \(V\) as

\[ A^{(k)} = V' A^{(k-1)} V \]

such that \(off(A^{(k)}) < off(A^{(k-1)})\), where \(A^{(0)} = A\) and \(off(A^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A^{(k)}\). As \(off(A^{(k)}) \rightarrow 0\), the diagonal elements of \(A^{(k)}\) increasingly resemble the eigenvalues of \(A\).

Parameters:
  • handle[in] rocblas_handle.

  • esort[in] rocblas_esort

    .

    Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

  • n[in]

    rocblas_int. n >= 0.

    Number of rows and columns of matrix A.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the matrix A. On exit, the eigenvectors of A if they were computed and the algorithm converged; otherwise the contents of A are unchanged.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrix A.

  • abstol[in]

    real type.

    The absolute tolerance. The algorithm is considered to have converged once off(A) is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to real type on the GPU.

    The Frobenius norm of the off-diagonal elements of A (i.e. off(A)) at the final iteration.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to a rocblas_int on the GPU.

    The actual number of sweeps (iterations) used by the algorithm.

  • W[out]

    pointer to real type. Array on the GPU of dimension n.

    The eigenvalues of A in increasing order.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = 1, the algorithm did not converge.

rocsolver_<type>heevj_batched()#

rocblas_status rocsolver_zheevj_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cheevj_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#

HEEVJ_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of complex Hermitian matrices A_j.

The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

At the \(k\)-th iteration (or “sweep”), \(A_j\) is transformed by a product of Jacobi rotations \(V_j\) as

\[ A_j^{(k)} = V_j' A_j^{(k-1)} V_j \]

such that \(off(A_j^{(k)}) < off(A_j^{(k-1)})\), where \(A_j^{(0)} = A_j\) and \(off(A_j^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A_j^{(k)}\). As \(off(A_j^{(k)}) \rightarrow 0\), the diagonal elements of \(A_j^{(k)}\) increasingly resemble the eigenvalues of \(A_j\).

Parameters:
  • handle[in] rocblas_handle.

  • esort[in] rocblas_esort

    .

    Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • A[inout]

    Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are unchanged.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • abstol[in]

    real type.

    The absolute tolerance. The algorithm is considered to have converged once off(A) is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to real type. Array of batch_count scalars on the GPU.

    The Frobenius norm of the off-diagonal elements of A_j (i.e. off(A_j)) at the final iteration.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    The actual number of sweeps (iterations) used by the algorithm for each batch instance.

  • W[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideW).

    The eigenvalues of A_j in increasing order.

  • strideW[in]

    rocblas_stride.

    Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use case is strideW >= n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = 1, the algorithm did not converge.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>heevj_strided_batched()#

rocblas_status rocsolver_zheevj_strided_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cheevj_strided_batched(rocblas_handle handle, const rocblas_esort esort, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#

CHEVJ_STRIDED_BATCHED computes the eigenvalues and optionally the eigenvectors of a batch of complex Hermitian matrices A_j.

The eigenvalues are found using the iterative Jacobi algorithm and are returned in an order depending on the value of esort. The eigenvectors are computed depending on the value of evect. The computed eigenvectors are orthonormal.

At the \(k\)-th iteration (or “sweep”), \(A_j\) is transformed by a product of Jacobi rotations \(V_j\) as

\[ A_j^{(k)} = V_j' A_j^{(k-1)} V_j \]

such that \(off(A_j^{(k)}) < off(A_j^{(k-1)})\), where \(A_j^{(0)} = A_j\) and \(off(A_j^{(k)})\) is the Frobenius norm of the off-diagonal elements of \(A_j^{(k)}\). As \(off(A_j^{(k)}) \rightarrow 0\), the diagonal elements of \(A_j^{(k)}\) increasingly resemble the eigenvalues of \(A_j\).

Parameters:
  • handle[in] rocblas_handle.

  • esort[in] rocblas_esort

    .

    Specifies the order of the returned eigenvalues. If esort is rocblas_esort_ascending, then the eigenvalues are sorted and returned in ascending order. If esort is rocblas_esort_none, then the order of the returned eigenvalues is unspecified.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower part of the Hermitian matrices A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

  • n[in]

    rocblas_int. n >= 0.

    Number of rows and columns of matrices A_j.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the matrices A_j. On exit, the eigenvectors of A_j if they were computed and the algorithm converged; otherwise the contents of A_j are unchanged.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of matrices A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

  • abstol[in]

    real type.

    The absolute tolerance. The algorithm is considered to have converged once off(A) is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to real type. Array of batch_count scalars on the GPU.

    The Frobenius norm of the off-diagonal elements of A_j (i.e. off(A_j)) at the final iteration.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    The actual number of sweeps (iterations) used by the algorithm for each batch instance.

  • W[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideW).

    The eigenvalues of A_j in increasing order.

  • strideW[in]

    rocblas_stride.

    Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use case is strideW >= n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit for matrix A_j. If info[j] = 1, the algorithm did not converge.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygvj()#

rocblas_status rocsolver_dsygvj(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, double *B, const rocblas_int ldb, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, rocblas_int *info)#
rocblas_status rocsolver_ssygvj(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, float *B, const rocblas_int ldb, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, rocblas_int *info)#

SYGVJ computes the eigenvalues and (optionally) eigenvectors of a real generalized symmetric-definite eigenproblem.

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.

When computed, the matrix Z of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z^T B Z=I & \: \text{if 1st or 2nd form, or}\\ Z^T B^{-1} Z=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters:
  • handle[in] rocblas_handle.

  • itype[in] rocblas_eform

    .

    Specifies the form of the generalized eigenproblem.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A and B are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • n[in]

    rocblas_int. n >= 0.

    The matrix dimensions.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the symmetric matrix A. On exit, if evect is original, the normalized matrix Z of eigenvectors. If evect is none, then the upper or lower triangular part of the matrix A (including the diagonal) is destroyed, depending on the value of uplo.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • B[out]

    pointer to type. Array on the GPU of dimension ldb*n.

    On entry, the symmetric positive definite matrix B. On exit, the triangular factor of B as returned by

    POTRF.

  • ldb[in]

    rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

  • abstol[in]

    type.

    The absolute tolerance. The algorithm is considered to have converged once the residual is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to type on the GPU.

    The Frobenius norm of the off-diagonal elements at the final iteration.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to a rocblas_int on the GPU.

    The actual number of sweeps (iterations) used by the algorithm.

  • W[out]

    pointer to type. Array on the GPU of dimension n.

    On exit, the eigenvalues in increasing order.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = 1, the algorithm did not converge. If info = n + i, the leading minor of order i of B is not positive definite.

rocsolver_<type>sygvj_batched()#

rocblas_status rocsolver_dsygvj_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, double *const B[], const rocblas_int ldb, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssygvj_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, float *const B[], const rocblas_int ldb, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#

SYGVJ_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of real generalized symmetric-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_j X_j = \lambda B_j X_j & \: \text{1st form,}\\ A_j B_j X_j = \lambda X_j & \: \text{2nd form, or}\\ B_j A_j X_j = \lambda X_j & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.

When computed, the matrix \(Z_j\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_j^T B_j Z_j=I & \: \text{if 1st or 2nd form, or}\\ Z_j^T B_j^{-1} Z_j=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters:
  • handle[in] rocblas_handle.

  • itype[in] rocblas_eform

    .

    Specifies the form of the generalized eigenproblems.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_j and B_j are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_j and B_j are not used.

  • n[in]

    rocblas_int. n >= 0.

    The matrix dimensions.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the symmetric matrices A_j. On exit, if evect is original, the normalized matrix Z_j of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_j (including the diagonal) are destroyed, depending on the value of uplo.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • B[out]

    array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    On entry, the symmetric positive definite matrices B_j. On exit, the triangular factor of B_j as returned by

    POTRF_BATCHED.

  • ldb[in]

    rocblas_int. ldb >= n.

    Specifies the leading dimension of B_j.

  • abstol[in]

    type.

    The absolute tolerance. The algorithm is considered to have converged once the residual is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to type. Array of batch_count scalars on the GPU.

    The Frobenius norm of the off-diagonal elements at the final iteration for each batch instance.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    The actual number of sweeps (iterations) used by the algorithm for each batch instance.

  • W[out]

    pointer to type. Array on the GPU (the size depends on the value of strideW).

    On exit, the eigenvalues in increasing order.

  • strideW[in]

    rocblas_stride.

    Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use is strideW >= n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit of batch instance j. If info[j] = 1, the algorithm did not converge. If info[j] = n + i, the leading minor of order i of B_j is not positive definite.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>sygvj_strided_batched()#

rocblas_status rocsolver_dsygvj_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *B, const rocblas_int ldb, const rocblas_stride strideB, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_ssygvj_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *B, const rocblas_int ldb, const rocblas_stride strideB, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#

SYGVJ_STRIDED_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of real generalized symmetric-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_j X_j = \lambda B_j X_j & \: \text{1st form,}\\ A_j B_j X_j = \lambda X_j & \: \text{2nd form, or}\\ B_j A_j X_j = \lambda X_j & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.

When computed, the matrix \(Z_j\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_j^T B_j Z_j=I & \: \text{if 1st or 2nd form, or}\\ Z_j^T B_j^{-1} Z_j=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters:
  • handle[in] rocblas_handle.

  • itype[in] rocblas_eform

    .

    Specifies the form of the generalized eigenproblems.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_j and B_j are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_j and B_j are not used.

  • n[in]

    rocblas_int. n >= 0.

    The matrix dimensions.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the symmetric matrices A_j. On exit, if evect is original, the normalized matrix Z_j of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_j (including the diagonal) are destroyed, depending on the value of uplo.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use is strideA >= lda*n.

  • B[out]

    pointer to type. Array on the GPU (the size depends on the value of strideB).

    On entry, the symmetric positive definite matrices B_j. On exit, the triangular factor of B_j as returned by

    POTRF_STRIDED_BATCHED.

  • ldb[in]

    rocblas_int. ldb >= n.

    Specifies the leading dimension of B_j.

  • strideB[in]

    rocblas_stride.

    Stride from the start of one matrix B_j to the next one B_(j+1). There is no restriction for the value of strideB. Normal use is strideB >= ldb*n.

  • abstol[in]

    type.

    The absolute tolerance. The algorithm is considered to have converged once the residual is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to type. Array of batch_count scalars on the GPU.

    The Frobenius norm of the off-diagonal elements at the final iteration for each batch instance.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    The actual number of sweeps (iterations) used by the algorithm for each batch instance.

  • W[out]

    pointer to type. Array on the GPU (the size depends on the value of strideW).

    On exit, the eigenvalues in increasing order.

  • strideW[in]

    rocblas_stride.

    Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use is strideW >= n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit of batch j. If info[j] = 1, the algorithm did not converge. If info[j] = n + i, the leading minor of order i of B_j is not positive definite.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegvj()#

rocblas_status rocsolver_zhegvj(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *B, const rocblas_int ldb, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, rocblas_int *info)#
rocblas_status rocsolver_chegvj(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *B, const rocblas_int ldb, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, rocblas_int *info)#

HEGVJ computes the eigenvalues and (optionally) eigenvectors of a complex generalized hermitian-definite eigenproblem.

The problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A X = \lambda B X & \: \text{1st form,}\\ A B X = \lambda X & \: \text{2nd form, or}\\ B A X = \lambda X & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.

When computed, the matrix Z of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z^H B Z=I & \: \text{if 1st or 2nd form, or}\\ Z^H B^{-1} Z=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters:
  • handle[in] rocblas_handle.

  • itype[in] rocblas_eform

    .

    Specifies the form of the generalized eigenproblem.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A and B are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A and B are not used.

  • n[in]

    rocblas_int. n >= 0.

    The matrix dimensions.

  • A[inout]

    pointer to type. Array on the GPU of dimension lda*n.

    On entry, the hermitian matrix A. On exit, if evect is original, the normalized matrix Z of eigenvectors. If evect is none, then the upper or lower triangular part of the matrix A (including the diagonal) is destroyed, depending on the value of uplo.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A.

  • B[out]

    pointer to type. Array on the GPU of dimension ldb*n.

    On entry, the hermitian positive definite matrix B. On exit, the triangular factor of B as returned by

    POTRF.

  • ldb[in]

    rocblas_int. ldb >= n.

    Specifies the leading dimension of B.

  • abstol[in]

    real type.

    The absolute tolerance. The algorithm is considered to have converged once the residual is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to real type on the GPU.

    The Frobenius norm of the off-diagonal elements at the final iteration.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to a rocblas_int on the GPU.

    The actual number of sweeps (iterations) used by the algorithm.

  • W[out]

    pointer to real type. Array on the GPU of dimension n.

    On exit, the eigenvalues in increasing order.

  • info[out]

    pointer to a rocblas_int on the GPU.

    If info = 0, successful exit. If info = 1, the algorithm did not converge. If info = n + i, the leading minor of order i of B is not positive definite.

rocsolver_<type>hegvj_batched()#

rocblas_status rocsolver_zhegvj_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *const B[], const rocblas_int ldb, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_chegvj_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *const B[], const rocblas_int ldb, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#

HEGVJ_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of complex generalized hermitian-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_j X_j = \lambda B_j X_j & \: \text{1st form,}\\ A_j B_j X_j = \lambda X_j & \: \text{2nd form, or}\\ B_j A_j X_j = \lambda X_j & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.

When computed, the matrix \(Z_j\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_j^H B_j Z_j=I & \: \text{if 1st or 2nd form, or}\\ Z_j^H B_j^{-1} Z_j=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters:
  • handle[in] rocblas_handle.

  • itype[in] rocblas_eform

    .

    Specifies the form of the generalized eigenproblems.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_j and B_j are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_j and B_j are not used.

  • n[in]

    rocblas_int. n >= 0.

    The matrix dimensions.

  • A[inout]

    array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

    On entry, the hermitian matrices A_j. On exit, if evect is original, the normalized matrix Z_j of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_j (including the diagonal) are destroyed, depending on the value of uplo.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • B[out]

    array of pointers to type. Each pointer points to an array on the GPU of dimension ldb*n.

    On entry, the hermitian positive definite matrices B_j. On exit, the triangular factor of B_j as returned by

    POTRF_BATCHED.

  • ldb[in]

    rocblas_int. ldb >= n.

    Specifies the leading dimension of B_j.

  • abstol[in]

    real type.

    The absolute tolerance. The algorithm is considered to have converged once the residual is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to real type. Array of batch_count scalars on the GPU.

    The Frobenius norm of the off-diagonal elements at the final iteration for each batch instance.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    The actual number of sweeps (iterations) used by the algorithm for each batch instance.

  • W[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideW).

    On exit, the eigenvalues in increasing order.

  • strideW[in]

    rocblas_stride.

    Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use is strideW >= n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit of batch j. If info[j] = 1, the algorithm did not converge. If info[j] = n + i, the leading minor of order i of B_j is not positive definite.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.

rocsolver_<type>hegvj_strided_batched()#

rocblas_status rocsolver_zhegvj_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const double abstol, double *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, double *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_chegvj_strided_batched(rocblas_handle handle, const rocblas_eform itype, const rocblas_evect evect, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *B, const rocblas_int ldb, const rocblas_stride strideB, const float abstol, float *residual, const rocblas_int max_sweeps, rocblas_int *n_sweeps, float *W, const rocblas_stride strideW, rocblas_int *info, const rocblas_int batch_count)#

HEGVJ_STRIDED_BATCHED computes the eigenvalues and (optionally) eigenvectors of a batch of complex generalized hermitian-definite eigenproblems.

For each instance in the batch, the problem solved by this function is either of the form

\[\begin{split} \begin{array}{cl} A_j X_j = \lambda B_j X_j & \: \text{1st form,}\\ A_j B_j X_j = \lambda X_j & \: \text{2nd form, or}\\ B_j A_j X_j = \lambda X_j & \: \text{3rd form,} \end{array} \end{split}\]

depending on the value of itype. The eigenvalues are found using the iterative Jacobi algorithm, and are returned in ascending order. The eigenvectors are computed depending on the value of evect.

When computed, the matrix \(Z_j\) of eigenvectors is normalized as follows:

\[\begin{split} \begin{array}{cl} Z_j^H B_j Z_j=I & \: \text{if 1st or 2nd form, or}\\ Z_j^H B_j^{-1} Z_j=I & \: \text{if 3rd form.} \end{array} \end{split}\]

Parameters:
  • handle[in] rocblas_handle.

  • itype[in] rocblas_eform

    .

    Specifies the form of the generalized eigenproblems.

  • evect[in] rocblas_evect

    .

    Specifies whether the eigenvectors are to be computed. If evect is rocblas_evect_original, then the eigenvectors are computed. rocblas_evect_tridiagonal is not supported.

  • uplo[in]

    rocblas_fill.

    Specifies whether the upper or lower parts of the matrices A_j and B_j are stored. If uplo indicates lower (or upper), then the upper (or lower) parts of A_j and B_j are not used.

  • n[in]

    rocblas_int. n >= 0.

    The matrix dimensions.

  • A[inout]

    pointer to type. Array on the GPU (the size depends on the value of strideA).

    On entry, the hermitian matrices A_j. On exit, if evect is original, the normalized matrix Z_j of eigenvectors. If evect is none, then the upper or lower triangular part of the matrices A_j (including the diagonal) are destroyed, depending on the value of uplo.

  • lda[in]

    rocblas_int. lda >= n.

    Specifies the leading dimension of A_j.

  • strideA[in]

    rocblas_stride.

    Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use is strideA >= lda*n.

  • B[out]

    pointer to type. Array on the GPU (the size depends on the value of strideB).

    On entry, the hermitian positive definite matrices B_j. On exit, the triangular factor of B_j as returned by

    POTRF_STRIDED_BATCHED.

  • ldb[in]

    rocblas_int. ldb >= n.

    Specifies the leading dimension of B_j.

  • strideB[in]

    rocblas_stride.

    Stride from the start of one matrix B_j to the next one B_(j+1). There is no restriction for the value of strideB. Normal use is strideB >= ldb*n.

  • abstol[in]

    real type.

    The absolute tolerance. The algorithm is considered to have converged once the residual is <= abstol. If abstol <= 0, then the tolerance will be set to machine precision.

  • residual[out]

    pointer to real type. Array of batch_count scalars on the GPU.

    The Frobenius norm of the off-diagonal elements at the final iteration for each batch instance.

  • max_sweeps[in]

    rocblas_int. max_sweeps > 0.

    Maximum number of sweeps (iterations) to be used by the algorithm.

  • n_sweeps[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    The actual number of sweeps (iterations) used by the algorithm for each batch instance.

  • W[out]

    pointer to real type. Array on the GPU (the size depends on the value of strideW).

    On exit, the eigenvalues in increasing order.

  • strideW[in]

    rocblas_stride.

    Stride from the start of one vector W_j to the next one W_(j+1). There is no restriction for the value of strideW. Normal use is strideW >= n.

  • info[out]

    pointer to rocblas_int. Array of batch_count integers on the GPU.

    If info[j] = 0, successful exit of batch j. If info[j] = 1, the algorithm did not converge. If info[j] = n + i, the leading minor of order i of B_j is not positive definite.

  • batch_count[in]

    rocblas_int. batch_count >= 0.

    Number of matrices in the batch.