# rocSOLVER Re-Factorization and Direct Solvers#

These are functions that implement direct solvers for sparse systems with different coefficient matrices that share the same sparsity pattern. The re-factorization functions are divided into the following categories:

Note

Throughout the APIs’ descriptions, we use the following notations:

• i, j, and k are used as general purpose indices. In some legacy LAPACK APIs, k could be a parameter indicating some problem/matrix dimension.

• Depending on the context, when it is necessary to index rows, columns and blocks or submatrices, i is assigned to rows, j to columns and k to blocks. $$l$$ is always used to index matrices/problems in a batch.

• x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.

• To identify a block in a matrix or a matrix in the batch, k and $$l$$ are used as sub-indices

• x_i $$=x_i$$; we sometimes use both notations, $$x_i$$ when displaying mathematical equations, and x_i in the text describing the function parameters.

• If X is a real vector or matrix, $$X^T$$ indicates its transpose; if X is complex, then $$X^H$$ represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.

• When a matrix A is formed as the product of several matrices, the following notation is used: A=M(1)M(2)…M(t).

## Initialization and meta data#

### rocsolver_create_rfinfo()#

rocblas_status rocsolver_create_rfinfo(rocsolver_rfinfo *rfinfo, rocblas_handle handle)#

CREATE_RFINFO initializes the structure rfinfo that contains the meta data and descriptors of the involved matrices required by the re-factorization functions CSRRF_REFACTLU and CSRRF_REFACTCHOL, and by the direct solver CSRRF_SOLVE.

Parameters:
• rfinfo[out] rocsolver_rfinfo. The pointer to the rfinfo struct to be initialized.

• handle[in] rocblas_handle.

### rocsolver_destroy_rfinfo()#

rocblas_status rocsolver_destroy_rfinfo(rocsolver_rfinfo rfinfo)#

DESTROY_RFINFO destroys the structure rfinfo used by the re-factorization functions CSRRF_REFACTLU and CSRRF_REFACTCHOL, and by the direct solver CSRRF_SOLVE.

Parameters:

rfinfo[in] rocsolver_rfinfo. The rfinfo struct to be destroyed.

### rocsolver_set_rfinfo_mode()#

rocblas_status rocsolver_set_rfinfo_mode(rocsolver_rfinfo rfinfo, rocsolver_rfinfo_mode mode)#

SET_RFINFO_MODE sets the mode of the structure rfinfo required by the re-factorization functions CSRRF_REFACTLU and CSRRF_REFACTCHOL, and by the direct solver CSRRF_SOLVE.

Parameters:

### rocsolver_get_rfinfo_mode()#

rocblas_status rocsolver_get_rfinfo_mode(rocsolver_rfinfo rfinfo, rocsolver_rfinfo_mode *mode)#

GET_RFINFO_MODE gets the mode of the structure rfinfo required by the re-factorization functions CSRRF_REFACTLU and CSRRF_REFACTCHOL, and by the direct solver CSRRF_SOLVE.

Parameters:

### rocsolver_csrrf_analysis()#

rocblas_status rocsolver_dcsrrf_analysis(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, const rocblas_int nnzM, rocblas_int *ptrM, rocblas_int *indM, double *valM, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *pivP, rocblas_int *pivQ, double *B, const rocblas_int ldb, rocsolver_rfinfo rfinfo)#
rocblas_status rocsolver_scsrrf_analysis(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, const rocblas_int nnzM, rocblas_int *ptrM, rocblas_int *indM, float *valM, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *pivP, rocblas_int *pivQ, float *B, const rocblas_int ldb, rocsolver_rfinfo rfinfo)#

CSRRF_ANALYSIS performs the analysis phase required by the re-factorization functions CSRRF_REFACTLU and CSRRF_REFACTCHOL, and by the direct solver CSRRF_SOLVE.

Consider a sparse matrix $$M$$ previously factorized as

$Q^TMQ = L_ML_M^T$

(Cholesky factorization for the symmetric positive definite case), or

$PMQ = L_MU_M$

(LU factorization for the general case)

where $$L_M$$ is lower triangular (with unit diagonal in the general case), $$U_M$$ is upper triangular, and $$P$$ and $$Q$$ are permutation matrices associated with pivoting and re-ordering (to minimize fill-in), respectively. The meta data generated by this routine is collected in the output parameter rfinfo. This information will allow the fast re-factorization of another sparse matrix $$A$$ as

$Q^TAQ = L_AL_A^T, \quad \text{or}$

$PAQ = L_AU_A,$

and, eventually, the computation of the solution vector $$X$$ of any linear system of the form

$AX = B$

as long as $$A$$ has the same sparsity pattern as the previous matrix $$M$$.

This function supposes that the rfinfo struct has been initialized by RFINFO_CREATE. By default, rfinfo is set up to work with the LU factorization (general matrices). If the matrix is symmetric positive definite, and the Cholesky factorization is desired, then the corresponding mode must be manually set up by SET_RFINFO_MODE. This function does not automatically detect symmetry.

For the LU factorization mode, the LU factors $$L_M$$ and $$U_M$$ must be passed in a bundle matrix $$T=(L_M-I)+U_M$$ as returned by CSRRF_SUMLU. For the Cholesky mode, the lower triangular part of $$T$$ must contain the Cholesky factor $$L_M$$; the strictly upper triangular part of $$T$$ will be ignored. Similarly, the strictly upper triangular part of $$M$$ is ignored when working in Cholesky mode.

Note

If only a re-factorization will be executed (i.e. no solver phase), then nrhs can be set to zero and B can be null.

Parameters:
• handle[in] rocblas_handle.

• n[in] rocblas_int. n >= 0. The number of rows (and columns) of matrix M.

• nrhs[in] rocblas_int. nrhs >= 0. The number of right-hand-sides (columns of matrix B). Set nrhs to zero when only the re-factorization is needed.

• nnzM[in] rocblas_int. nnzM >= 0. The number of non-zero elements in M.

• ptrM[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indM and valM. The last element of ptrM is equal to nnzM.

• indM[in] pointer to rocblas_int. Array on the GPU of dimension nnzM. It contains the column indices of the non-zero elements of M. Indices are sorted by row and by column within each row.

• valM[in] pointer to type. Array on the GPU of dimension nnzM. The values of the non-zero elements of M. The strictly upper triangular entries are not referenced when working in Cholesky mode.

• nnzT[in] rocblas_int. nnzT >= 0. The number of non-zero elements in T.

• ptrT[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.

• indT[in] pointer to rocblas_int. Array on the GPU of dimension nnzT. It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.

• valT[in] pointer to type. Array on the GPU of dimension nnzT. The values of the non-zero elements of T. The strictly upper triangular entries are not referenced when working in Cholesky mode.

• pivP[in] pointer to rocblas_int. Array on the GPU of dimension n. Contains the pivot indices representing the permutation matrix P, i.e. the order in which the rows of matrix M were re-arranged. When working in Cholesky mode, this array is not referenced and can be null.

• pivQ[in] pointer to rocblas_int. Array on the GPU of dimension n. Contains the pivot indices representing the permutation matrix Q, i.e. the order in which the columns of matrix M were re-arranged.

• B[in] pointer to type. Array on the GPU of dimension ldb*nrhs. The right hand side matrix B. It can be null if only the re-factorization is needed.

• ldb[in] rocblas_int. ldb >= n. The leading dimension of B.

• rfinfo[out] rocsolver_rfinfo. Structure that holds the meta data generated in the analysis phase.

## Triangular re-factorization#

### rocsolver_<type>csrrf_sumlu()#

rocblas_status rocsolver_dcsrrf_sumlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzL, rocblas_int *ptrL, rocblas_int *indL, double *valL, const rocblas_int nnzU, rocblas_int *ptrU, rocblas_int *indU, double *valU, rocblas_int *ptrT, rocblas_int *indT, double *valT)#
rocblas_status rocsolver_scsrrf_sumlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzL, rocblas_int *ptrL, rocblas_int *indL, float *valL, const rocblas_int nnzU, rocblas_int *ptrU, rocblas_int *indU, float *valU, rocblas_int *ptrT, rocblas_int *indT, float *valT)#

CSRRF_SUMLU bundles the factors $$L$$ and $$U$$, associated with the LU factorization of a sparse matrix $$A$$, into a single sparse matrix $$T=(L-I)+U$$.

Factor $$L$$ is a sparse lower triangular matrix with unit diagonal elements, and $$U$$ is a sparse upper triangular matrix. The resulting sparse matrix $$T$$ combines both sparse factors without storing the unit diagonal; in other words, the number of non-zero elements of T, nnzT, is given by nnzT = nnzL - n + nnzU.

Parameters:
• handle[in] rocblas_handle.

• n[in] rocblas_int. n >= 0. The number of rows (and columns) of matrix A.

• nnzL[in] rocblas_int. nnzL >= n. The number of non-zero elements in L.

• ptrL[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indL and valL. The last element of ptrL is equal to nnzL.

• indL[in] pointer to rocblas_int. Array on the GPU of dimension nnzL. It contains the column indices of the non-zero elements of L. Indices are sorted by row and by column within each row.

• valL[in] pointer to type. Array on the GPU of dimension nnzL. The values of the non-zero elements of L.

• nnzU[in] rocblas_int. nnzU >= 0. The number of non-zero elements in U.

• ptrU[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indU and valU. The last element of ptrU is equal to nnzU.

• indU[in] pointer to rocblas_int. Array on the GPU of dimension nnzU. It contains the column indices of the non-zero elements of U. Indices are sorted by row and by column within each row.

• valU[in] pointer to type. Array on the GPU of dimension nnzU. The values of the non-zero elements of U.

• ptrT[out] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.

• indT[out] pointer to rocblas_int. Array on the GPU of dimension nnzT. It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.

• valT[out] pointer to type. Array on the GPU of dimension nnzT. The values of the non-zero elements of T.

### rocsolver_<type>csrrf_splitlu()#

rocblas_status rocsolver_dcsrrf_splitlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *ptrL, rocblas_int *indL, double *valL, rocblas_int *ptrU, rocblas_int *indU, double *valU)#
rocblas_status rocsolver_scsrrf_splitlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *ptrL, rocblas_int *indL, float *valL, rocblas_int *ptrU, rocblas_int *indU, float *valU)#

CSRRF_SPLITLU splits the factors $$L$$ and $$U$$, associated with the LU factorization of a sparse matrix $$A$$, from a bundled matrix $$T=(L-I)+U$$.

Factor $$L$$ is a sparse lower triangular matrix with unit diagonal elements, and $$U$$ is a sparse upper triangular matrix. Conceptually, on input, U is stored on the diagonal and upper part of $$T$$, while the non diagonal elements of $$L$$ are stored on the strictly lower part of $$T$$.

Parameters:
• handle[in] rocblas_handle.

• n[in] rocblas_int. n >= 0. The number of rows (and columns) of matrix A.

• nnzT[in] rocblas_int. nnzT >= 0. The number of non-zero elements in T.

• ptrT[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.

• indT[in] pointer to rocblas_int. Array on the GPU of dimension nnzT. It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.

• valT[in] pointer to type. Array on the GPU of dimension nnzT. The values of the non-zero elements of T.

• ptrL[out] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indL and valL. The last element of ptrL is equal to nnzL.

• indL[out] pointer to rocblas_int. Array on the GPU of dimension nnzL. It contains the column indices of the non-zero elements of L. Indices are sorted by row and by column within each row. (If nnzL is not known in advance, the size of this array could be set to nnzT + n as an upper bound).

• valL[out] pointer to type. Array on the GPU of dimension nnzL. The values of the non-zero elements of L. (If nnzL is not known in advance, the size of this array could be set to nnzT + n as an upper bound).

• ptrU[out] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indU and valU. The last element of ptrU is equal to nnzU.

• indU[out] pointer to rocblas_int. Array on the GPU of dimension nnzU. It contains the column indices of the non-zero elements of U. Indices are sorted by row and by column within each row. (If nnzU is not known in advance, the size of this array could be set to nnzT as an upper bound).

• valU[out] pointer to type. Array on the GPU of dimension nnzU. The values of the non-zero elements of U. (If nnzU is not known in advance, the size of this array could be set to nnzT as an upper bound).

### rocsolver_<type>csrrf_refactlu()#

rocblas_status rocsolver_dcsrrf_refactlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzA, rocblas_int *ptrA, rocblas_int *indA, double *valA, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *pivP, rocblas_int *pivQ, rocsolver_rfinfo rfinfo)#
rocblas_status rocsolver_scsrrf_refactlu(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzA, rocblas_int *ptrA, rocblas_int *indA, float *valA, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *pivP, rocblas_int *pivQ, rocsolver_rfinfo rfinfo)#

CSRRF_REFACTLU performs a fast LU factorization of a sparse matrix $$A$$ based on the information from the factorization of a previous matrix $$M$$ with the same sparsity pattern (re-factorization).

Consider a sparse matrix $$M$$ previously factorized as

$PMQ = L_MU_M$

where $$L_M$$ is lower triangular with unit diagonal, $$U_M$$ is upper triangular, and $$P$$ and $$Q$$ are permutation matrices associated with pivoting and re-ordering (to minimize fill-in), respectively. If $$A$$ has the same sparsity pattern as $$M$$, then the re-factorization

$PAQ = L_AU_A$

can be computed numerically without a symbolic analysis phase.

This function supposes that rfinfo has been updated, by function CSRRF_ANALYSIS, after the analysis phase of the previous matrix M and its initial factorization. Both functions, CSRRF_ANALYSIS and CSRRF_REFACTLU must be run with the same rfinfo mode (LU factorization, the default mode), otherwise the workflow will result in an error.

Parameters:
• handle[in] rocblas_handle.

• n[in] rocblas_int. n >= 0. The number of rows (and columns) of matrix A.

• nnzA[in] rocblas_int. nnzA >= 0. The number of non-zero elements in A.

• ptrA[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indA and valA. The last element of ptrM is equal to nnzA.

• indA[in] pointer to rocblas_int. Array on the GPU of dimension nnzA. It contains the column indices of the non-zero elements of M. Indices are sorted by row and by column within each row.

• valA[in] pointer to type. Array on the GPU of dimension nnzA. The values of the non-zero elements of A.

• nnzT[in] rocblas_int. nnzT >= 0. The number of non-zero elements in T.

• ptrT[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.

• indT[in] pointer to rocblas_int. Array on the GPU of dimension nnzT. It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.

• valT[out] pointer to type. Array on the GPU of dimension nnzT. The values of the non-zero elements of the new bundle matrix (L_A - I) + U_A.

• pivP[in] pointer to rocblas_int. Array on the GPU of dimension n. Contains the pivot indices representing the permutation matrix P, i.e. the order in which the rows of matrix M were re-arranged.

• pivQ[in] pointer to rocblas_int. Array on the GPU of dimension n. Contains the pivot indices representing the permutation matrix Q, i.e. the order in which the columns of matrix M were re-arranged.

• rfinfo[in] rocsolver_rfinfo. Structure that holds the meta data generated in the analysis phase.

### rocsolver_<type>csrrf_refactchol()#

rocblas_status rocsolver_dcsrrf_refactchol(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzA, rocblas_int *ptrA, rocblas_int *indA, double *valA, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *pivQ, rocsolver_rfinfo rfinfo)#
rocblas_status rocsolver_scsrrf_refactchol(rocblas_handle handle, const rocblas_int n, const rocblas_int nnzA, rocblas_int *ptrA, rocblas_int *indA, float *valA, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *pivQ, rocsolver_rfinfo rfinfo)#

CSRRF_REFACTCHOL performs a fast Cholesky factorization of a sparse symmetric positive definite matrix $$A$$ based on the information from the factorization of a previous matrix $$M$$ with the same sparsity pattern (re-factorization).

Consider a sparse matrix $$M$$ previously factorized as

$Q^TMQ = L_ML_M^T$

where $$L_M$$ is lower triangular, and $$Q$$ is a permutation matrices associated with re-ordering to minimize fill-in. If $$A$$ has the same sparsity pattern as $$M$$, then the re-factorization

$Q^TAQ = L_AL_A^T$

can be computed numerically without a symbolic analysis phase.

This function supposes that rfinfo has been updated by function CSRRF_ANALYSIS, after the analysis phase of the previous matrix M and its initial factorization. Both functions, CSRRF_ANALYSIS and CSRRF_REFACTCHOL must be run with the same rfinfo mode (Cholesky factorization), otherwise the workflow will result in an error.

Parameters:
• handle[in] rocblas_handle.

• n[in] rocblas_int. n >= 0. The number of rows (and columns) of matrix A.

• nnzA[in] rocblas_int. nnzA >= 0. The number of non-zero elements in A.

• ptrA[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indA and valA. The last element of ptrM is equal to nnzA.

• indA[in] pointer to rocblas_int. Array on the GPU of dimension nnzA. It contains the column indices of the non-zero elements of M. Indices are sorted by row and by column within each row.

• valA[in] pointer to type. Array on the GPU of dimension nnzA. The values of the non-zero elements of A. The strictly upper triangular entries are not referenced.

• nnzT[in] rocblas_int. nnzT >= 0. The number of non-zero elements in T.

• ptrT[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.

• indT[in] pointer to rocblas_int. Array on the GPU of dimension nnzT. It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.

• valT[out] pointer to type. Array on the GPU of dimension nnzT. The values of the non-zero elements of the new Cholesky factor L_A. The strictly upper triangular entries of this array are not referenced.

• pivQ[in] pointer to rocblas_int. Array on the GPU of dimension n. Contains the pivot indices representing the permutation matrix Q, i.e. the order in which the columns of matrix M were re-arranged.

• rfinfo[in] rocsolver_rfinfo. Structure that holds the meta data generated in the analysis phase.

## Direct sparse solvers#

### rocsolver_<type>csrrf_solve()#

rocblas_status rocsolver_dcsrrf_solve(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, double *valT, rocblas_int *pivP, rocblas_int *pivQ, double *B, const rocblas_int ldb, rocsolver_rfinfo rfinfo)#
rocblas_status rocsolver_scsrrf_solve(rocblas_handle handle, const rocblas_int n, const rocblas_int nrhs, const rocblas_int nnzT, rocblas_int *ptrT, rocblas_int *indT, float *valT, rocblas_int *pivP, rocblas_int *pivQ, float *B, const rocblas_int ldb, rocsolver_rfinfo rfinfo)#

CSRRF_SOLVE solves a linear system with sparse coefficient matrix $$A$$ in its factorized form.

The linear system is of the form

$AX = B$

where the sparse matrix $$A$$ is factorized as

$Q^TAQ = L_AL_A^T$

(Cholesky factorization for the symmetric positive definite case), or

$PAQ = L_AU_A$

(LU factorization for the general case),

and $$B$$ is a dense matrix of right hand sides.

This function supposes that rfinfo has been updated by function CSRRF_ANALYSIS, after the analysis phase. Both functions, CSRRF_ANALYSIS and CSRRF_SOLVE must be run with the same rfinfo mode (LU or Cholesky factorization), otherwise the workflow will result in an error.

For the LU factorization mode, the LU factors $$L_A$$ and $$U_A$$ must be passed in a bundle matrix $$T=(L_A-I)+U_A$$ as returned by CSRRF_REFACTLU or CSRRF_SUMLU. For the Cholesky mode, the lower triangular part of $$T$$ must contain the Cholesky factor $$L_A$$; the strictly upper triangular part of $$T$$ will be ignored.

Parameters:
• handle[in] rocblas_handle.

• n[in] rocblas_int. n >= 0. The number of rows (and columns) of matrix A.

• nrhs[in] rocblas_int. nrhs >= 0. The number of right hand sides, i.e. the number of columns of matrix B.

• nnzT[in] rocblas_int. nnzT >= 0. The number of non-zero elements in T.

• ptrT[in] pointer to rocblas_int. Array on the GPU of dimension n+1. It contains the positions of the beginning of each row in indT and valT. The last element of ptrT is equal to nnzT.

• indT[in] pointer to rocblas_int. Array on the GPU of dimension nnzT. It contains the column indices of the non-zero elements of T. Indices are sorted by row and by column within each row.

• valT[in] pointer to type. Array on the GPU of dimension nnzT. The values of the non-zero elements of T. The strictly upper triangular entries are not referenced when working in Cholesky mode.

• pivP[in] pointer to rocblas_int. Array on the GPU of dimension n. Contains the pivot indices representing the permutation matrix P, i.e. the order in which the rows of matrix A were re-arranged. When working in Cholesky mode, this array is not referenced and can be null.

• pivQ[in] pointer to rocblas_int. Array on the GPU of dimension n. Contains the pivot indices representing the permutation matrix Q, i.e. the order in which the columns of matrix A were re-arranged.

• B[inout] pointer to type. Array on the GPU of dimension ldb*nrhs. On entry the right hand side matrix B. On exit, the solution matrix X.

• ldb[in] rocblas_int. ldb >= n. The leading dimension of B.

• rfinfo[in] rocsolver_rfinfo. Structure that holds the meta data generated in the analysis phase.