3.5. Re-factorization and direct solvers#
These are functions that implement direct solvers for sparse systems with different coefficient matrices that share the same sparsity pattern. The re-factorization functions are divided into the following categories:
Initialization and meta data. Basic functions to initialize and destroy meta data.
Triangular re-factorization. Re-factorization of new matrices given a known sparsity pattern.
Direct sparse solvers. Based on triangular re-factorization.
Note
Throughout the APIs’ descriptions, we use the following notations:
x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.
If X is a real vector or matrix, \(X^T\) indicates its transpose; if X is complex, then \(X^H\) represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.
x_i \(=x_i\); we sometimes use both notations, \(x_i\) when displaying mathematical equations, and x_i in the text describing the function parameters.
3.5.1. Initialization and meta data#
3.5.1.1. rocsolver_create_rfinfo()#
-
rocblas_status rocsolver_create_rfinfo(rocsolver_rfinfo *rfinfo, rocblas_handle handle)#
CREATE_RFINFO initializes the structure rfinfo, required by the re-factorization functions CSRRF_REFACTLU and CSRRF_SOLVE, that contains the meta data and descriptors of the involved matrices.
- Parameters:
rfinfo – [out] rocsolver_rfinfo
.
The pointer to the rfinfo struct to be initialized.
handle – [in] rocblas_handle.
3.5.1.2. rocsolver_destroy_rfinfo()#
-
rocblas_status rocsolver_destroy_rfinfo(rocsolver_rfinfo rfinfo)#
DESTROY_RFINFO destroys the structure rfinfo used by the re-factorization functions CSRRF_REFACTLU and CSRRF_SOLVE.
- Parameters:
rfinfo – [in] rocsolver_rfinfo
.
The rfinfo struct to be destroyed.
3.5.1.3. rocsolver_csrrf_analysis()#
-
rocblas_status rocsolver_dcsrrf_analysis(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, const rocblas_int nnzM, rocblas_int *ptrM, rocblas_int *indM, double *valM, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *pivP, rocblas_int *pivQ, double *B, const rocblas_int ldb, rocsolver_rfinfo rfinfo)#
-
rocblas_status rocsolver_scsrrf_analysis(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, const rocblas_int nnzM, rocblas_int *ptrM, rocblas_int *indM, float *valM, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *pivP, rocblas_int *pivQ, float *B, const rocblas_int ldb, rocsolver_rfinfo rfinfo)#
CSRRF_ANALYSIS performs the analysis phase required by the re-factorization functions CSRRF_REFACTLU and CSRRF_SOLVE.
Consider a sparse matrix \(M\) previously factorized as
\[ PMQ = L_MU_M \]where \(L_M\) is lower triangular with unit diagonal, \(U_M\) is upper triangular, and \(P\) and \(Q\) are permutation matrices associated with pivoting and re-ordering (to minimize fill-in), respectively. The meta data generated by this routine is collected in the output parameter rfinfo. This information will allow the fast LU re-factorization of another sparse matrix \(A\) as
\[ PAQ = L_AU_A \]and, eventually, the computation of the solution vector \(X\) of any linear system of the form
\[ AX = B \]as long as \(A\) has the same sparsity pattern as the previous matrix \(M\).
This function supposes that the LU factors \(L_M\) and \(U_M\) are passed in a bundle matrix \(T=(L_M-I)+U_M\) as returned by CSRRF_SUMLU, and that rfinfo has been initialized by RFINFO_CREATE.
Note
If only a re-factorization will be executed (i.e. no solver phase), then nrhs can be set to zero and B can be null.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows (and columns) of matrix M.
nrhs – [in]
rocblas_int. nrhs >= 0.
The number of right-hand-sides (columns of matrix B). Set nrhs to zero when only the re-factorization is needed.
nnzM – [in]
rocblas_int. nnzM >= 0.
The number of non-zero elements in M.
ptrM – [in]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indM and valM. The last element of ptrM is equal to nnzM.
indM – [in]
pointer to rocblas_int. Array on the GPU of dimension nnzM.
It contains the column indices of the non-zero elements of M. Indices are sorted by row and by column within each row.
valM – [in]
pointer to type. Array on the GPU of dimension nnzM.
The values of the non-zero elements of M.
nnzT – [in]
rocblas_int. nnzT >= 0.
The number of non-zero elements in T.
ptrT – [in]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.
indT – [in]
pointer to rocblas_int. Array on the GPU of dimension nnzT.
It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.
valT – [in]
pointer to type. Array on the GPU of dimension nnzT.
The values of the non-zero elements of T.
pivP – [in]
pointer to rocblas_int. Array on the GPU of dimension n.
Contains the pivot indices representing the permutation matrix P, i.e. the order in which the rows of matrix M were re-arranged.
pivQ – [in]
pointer to rocblas_int. Array on the GPU of dimension n.
Contains the pivot indices representing the permutation matrix Q, i.e. the order in which the columns of matrix M were re-arranged.
B – [in]
pointer to type. Array on the GPU of dimension ldb*nrhs.
The right hand side matrix B. It can be null if only the re-factorization is needed.
ldb – [in] rocblas_int. ldb >= n. The leading dimension of B.
rfinfo – [out]
rocsolver_rfinfo.
Structure that holds the meta data generated in the analysis phase.
3.5.2. Triangular re-factorization#
3.5.2.1. rocsolver_<type>csrrf_sumlu()#
-
rocblas_status rocsolver_dcsrrf_sumlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzL, rocblas_int *ptrL, rocblas_int *indL, double *valL, const rocblas_int nnzU, rocblas_int *ptrU, rocblas_int *indU, double *valU, rocblas_int *ptrT, rocblas_int *indT, double *valT)#
-
rocblas_status rocsolver_scsrrf_sumlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzL, rocblas_int *ptrL, rocblas_int *indL, float *valL, const rocblas_int nnzU, rocblas_int *ptrU, rocblas_int *indU, float *valU, rocblas_int *ptrT, rocblas_int *indT, float *valT)#
CSRRF_SUMLU bundles the factors \(L\) and \(U\), associated with the LU factorization of a sparse matrix \(A\), into a single sparse matrix \(T=(L-I)+U\).
Factor \(L\) is a sparse lower triangular matrix with unit diagonal elements, and \(U\) is a sparse upper triangular matrix. The resulting sparse matrix \(T\) combines both sparse factors without storing the unit diagonal; in other words, the number of non-zero elements of T, nnzT, is given by nnzT = nnzL - n + nnzU.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows (and columns) of matrix A.
nnzL – [in]
rocblas_int. nnzL >= n.
The number of non-zero elements in L.
ptrL – [in]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indL and valL. The last element of ptrL is equal to nnzL.
indL – [in]
pointer to rocblas_int. Array on the GPU of dimension nnzL.
It contains the column indices of the non-zero elements of L. Indices are sorted by row and by column within each row.
valL – [in]
pointer to type. Array on the GPU of dimension nnzL.
The values of the non-zero elements of L.
nnzU – [in]
rocblas_int. nnzU >= 0.
The number of non-zero elements in U.
ptrU – [in]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indU and valU. The last element of ptrU is equal to nnzU.
indU – [in]
pointer to rocblas_int. Array on the GPU of dimension nnzU.
It contains the column indices of the non-zero elements of U. Indices are sorted by row and by column within each row.
valU – [in]
pointer to type. Array on the GPU of dimension nnzU.
The values of the non-zero elements of U.
ptrT – [out]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.
indT – [out]
pointer to rocblas_int. Array on the GPU of dimension nnzT.
It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.
valT – [out]
pointer to type. Array on the GPU of dimension nnzT.
The values of the non-zero elements of T.
3.5.2.2. rocsolver_<type>csrrf_splitlu()#
-
rocblas_status rocsolver_dcsrrf_splitlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *ptrL, rocblas_int *indL, double *valL, rocblas_int *ptrU, rocblas_int *indU, double *valU)#
-
rocblas_status rocsolver_scsrrf_splitlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *ptrL, rocblas_int *indL, float *valL, rocblas_int *ptrU, rocblas_int *indU, float *valU)#
CSRRF_SPLITLU splits the factors \(L\) and \(U\), associated with the LU factorization of a sparse matrix \(A\), from a bundled matrix \(T=(L-I)+U\).
Factor \(L\) is a sparse lower triangular matrix with unit diagonal elements, and \(U\) is a sparse upper triangular matrix. Conceptually, on input, U is stored on the diagonal and upper part of \(T\), while the non diagonal elements of \(L\) are stored on the strictly lower part of \(T\).
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows (and columns) of matrix A.
nnzT – [in]
rocblas_int. nnzT >= 0.
The number of non-zero elements in T.
ptrT – [in]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.
indT – [in]
pointer to rocblas_int. Array on the GPU of dimension nnzT.
It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.
valT – [in]
pointer to type. Array on the GPU of dimension nnzT.
The values of the non-zero elements of T.
ptrL – [out]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indL and valL. The last element of ptrL is equal to nnzL.
indL – [out]
pointer to rocblas_int. Array on the GPU of dimension nnzL.
It contains the column indices of the non-zero elements of L. Indices are sorted by row and by column within each row. (If nnzL is not known in advance, the size of this array could be set to nnzT + n as an upper bound).
valL – [out]
pointer to type. Array on the GPU of dimension nnzL.
The values of the non-zero elements of L. (If nnzL is not known in advance, the size of this array could be set to nnzT + n as an upper bound).
ptrU – [out]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indU and valU. The last element of ptrU is equal to nnzU.
indU – [out]
pointer to rocblas_int. Array on the GPU of dimension nnzU.
It contains the column indices of the non-zero elements of U. Indices are sorted by row and by column within each row. (If nnzU is not known in advance, the size of this array could be set to nnzT as an upper bound).
valU – [out]
pointer to type. Array on the GPU of dimension nnzU.
The values of the non-zero elements of U. (If nnzU is not known in advance, the size of this array could be set to nnzT as an upper bound).
3.5.2.3. rocsolver_<type>csrrf_refactlu()#
-
rocblas_status rocsolver_dcsrrf_refactlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzA, rocblas_int *ptrA, rocblas_int *indA, double *valA, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *pivP, rocblas_int *pivQ, rocsolver_rfinfo rfinfo)#
-
rocblas_status rocsolver_scsrrf_refactlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzA, rocblas_int *ptrA, rocblas_int *indA, float *valA, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *pivP, rocblas_int *pivQ, rocsolver_rfinfo rfinfo)#
CSRRF_REFACTLU performs a fast LU factorization of a sparse matrix \(A\) based on the information from the factorization of a previous matrix \(M\) with the same sparsity pattern (re-factorization).
Consider a sparse matrix \(M\) previously factorized as
\[ PMQ = L_MU_M \]where \(L_M\) is lower triangular with unit diagonal, \(U_M\) is upper triangular, and \(P\) and \(Q\) are permutation matrices associated with pivoting and re-ordering (to minimize fill-in), respectively. If \(A\) has the same sparsity pattern as \(M\), then the re-factorization
\[ PAQ = L_AU_A \]can be computed numerically without any symbolic or analysis phases.
This function supposes that rfinfo has been updated, by function CSRRF_ANALYSIS, after the analysis phase of the previous matrix M and its initial factorization.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows (and columns) of matrix A.
nnzA – [in]
rocblas_int. nnzA >= 0.
The number of non-zero elements in A.
ptrA – [in]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indA and valA. The last element of ptrM is equal to nnzA.
indA – [in]
pointer to rocblas_int. Array on the GPU of dimension nnzA.
It contains the column indices of the non-zero elements of M. Indices are sorted by row and by column within each row.
valA – [in]
pointer to type. Array on the GPU of dimension nnzA.
The values of the non-zero elements of A.
nnzT – [in]
rocblas_int. nnzT >= 0.
The number of non-zero elements in T.
ptrT – [in]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.
indT – [in]
pointer to rocblas_int. Array on the GPU of dimension nnzT.
It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.
valT – [out]
pointer to type. Array on the GPU of dimension nnzT.
The values of the non-zero elements of the new bundle matrix (L_A - I) + U_A.
pivP – [in]
pointer to rocblas_int. Array on the GPU of dimension n.
Contains the pivot indices representing the permutation matrix P, i.e. the order in which the rows of matrix M were re-arranged.
pivQ – [in]
pointer to rocblas_int. Array on the GPU of dimension n.
Contains the pivot indices representing the permutation matrix Q, i.e. the order in which the columns of matrix M were re-arranged.
rfinfo – [in]
rocsolver_rfinfo.
Structure that holds the meta data generated in the analysis phase.
3.5.3. Direct sparse solvers#
3.5.3.1. rocsolver_<type>csrrf_solve()#
-
rocblas_status rocsolver_dcsrrf_solve(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *pivP, rocblas_int *pivQ, double *B, const rocblas_int ldb, rocsolver_rfinfo rfinfo)#
-
rocblas_status rocsolver_scsrrf_solve(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *pivP, rocblas_int *pivQ, float *B, const rocblas_int ldb, rocsolver_rfinfo rfinfo)#
CSRRF_SOLVE solves a linear system with sparse coefficient matrix \(A\) in its factorized form.
The linear system is of the form
\[ AX = B \]where the sparse matrix \(A\) is factorized as
\[ PAQ = L_AU_A \]and \(B\) is a dense matrix of right hand sides.
This function supposes that the LU factors \(L_A\) and \(U_A\) are passed in a bundle matrix \(T=(L_A-I)+U_A\) as returned by CSRRF_REFACTLU or CSRRF_SUMLU, and that rfinfo has been updated, by function CSRRF_ANALYSIS, after the analysis phase.
- Parameters:
handle – [in] rocblas_handle.
n – [in]
rocblas_int. n >= 0.
The number of rows (and columns) of matrix A.
nrhs – [in]
rocblas_int. nrhs >= 0.
The number of right hand sides, i.e. the number of columns of matrix B.
nnzT – [in]
rocblas_int. nnzT >= 0.
The number of non-zero elements in T.
ptrT – [in]
pointer to rocblas_int. Array on the GPU of dimension n+1.
It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.
indT – [in]
pointer to rocblas_int. Array on the GPU of dimension nnzT.
It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.
valT – [in]
pointer to type. Array on the GPU of dimension nnzT.
The values of the non-zero elements of T.
pivP – [in]
pointer to rocblas_int. Array on the GPU of dimension n.
Contains the pivot indices representing the permutation matrix P, i.e. the order in which the rows of matrix A were re-arranged.
pivQ – [in]
pointer to rocblas_int. Array on the GPU of dimension n.
Contains the pivot indices representing the permutation matrix Q, i.e. the order in which the columns of matrix A were re-arranged.
B – [inout]
pointer to type. Array on the GPU of dimension ldb*nrhs.
On entry the right hand side matrix B. On exit, the solution matrix X.
ldb – [in]
rocblas_int. ldb >= n.
The leading dimension of B.
rfinfo – [in]
rocsolver_rfinfo.
Structure that holds the meta data generated in the analysis phase.