Sparse generic functions#
This module contains all sparse generic routines.
The sparse generic routines describe some of the most common operations that manipulate sparse matrices and vectors. The generic API is more flexible than the other rocSPARSE APIs because it is easy to set different index types, data types, and compute types. For some generic routines, for example, SpMV, the generic API also lets users select different algorithms which have different performance characteristics depending on the sparse matrix being operated on.
rocsparse_axpby()#
rocsparse_status rocsparse_axpby(rocsparse_handle handle, const void *alpha, rocsparse_const_spvec_descr x, const void *beta, rocsparse_dnvec_descr y)#
Scale a sparse vector and add it to a scaled dense vector.
rocsparse_axpby multiplies the sparse vector \(x\) with the scalar \(\alpha\) and adds the result to the dense vector \(y\) that is multiplied with the scalar \(\beta\), such that

\[ y := \alpha \cdot x + \beta \cdot y \]

for(i = 0; i < size; ++i)
{
    y[i] = beta * y[i];
}
for(i = 0; i < nnz; ++i)
{
    y[x_ind[i]] += alpha * x_val[i];
}
rocsparse_axpby supports the following uniform- and mixed-precision data types for the sparse and dense vectors x and y and compute types for the scalars \(\alpha\) and \(\beta\).

- Uniform Precisions:
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed Precisions:

X / Y | compute_type
rocsparse_datatype_f16_r | rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r | rocsparse_datatype_f32_r
- Example
#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <iostream>
#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-provided error-checking macros.
int main()
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Sparse value vector
    std::vector<float> hx_val = {1.0f, 2.0f, 3.0f};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Scalar alpha
    float alpha = 3.7f;

    // Scalar beta
    float beta = 1.2f;

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type  = rocsparse_indextype_i32;
    rocsparse_datatype   data_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base  = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Call axpby to perform y = beta * y + alpha * x
    ROCSPARSE_CHECK(rocsparse_axpby(handle, &alpha, vecX, &beta, vecY));

    ROCSPARSE_CHECK(rocsparse_dnvec_get_values(vecY, (void**)&dy));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost));

    std::cout << "y" << std::endl;
    for(size_t i = 0; i < hy.size(); ++i)
    {
        std::cout << hy[i] << " ";
    }
    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));

    return 0;
}
Note
This function is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
alpha – [in] scalar \(\alpha\).
x – [in] sparse vector descriptor.
beta – [in] scalar \(\beta\).
y – [inout] dense vector descriptor.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, x, beta, or y pointer is invalid.
rocsparse_gather()#
rocsparse_status rocsparse_gather(rocsparse_handle handle, rocsparse_const_dnvec_descr y, rocsparse_spvec_descr x)#
Gather elements from a dense vector and store them in a sparse vector.
rocsparse_gather gathers the elements from the dense vector \(y\) and stores them in the sparse vector \(x\).

for(i = 0; i < nnz; ++i)
{
    x_val[i] = y[x_ind[i]];
}
rocsparse_gather supports the following uniform-precision data types for the sparse and dense vectors x and y.

- Uniform Precisions:
X / Y
rocsparse_datatype_i8_r
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <iostream>
#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-provided error-checking macros.
int main()
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type  = rocsparse_indextype_i32;
    rocsparse_datatype   data_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base  = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Call gather to copy the indexed elements of y into x
    ROCSPARSE_CHECK(rocsparse_gather(handle, vecY, vecX));

    ROCSPARSE_CHECK(rocsparse_spvec_get_values(vecX, (void**)&dx_val));

    // Copy result back to host
    std::vector<float> hx_val(nnz, 0.0f);
    HIP_CHECK(hipMemcpy(hx_val.data(), dx_val, sizeof(float) * nnz, hipMemcpyDeviceToHost));

    std::cout << "x" << std::endl;
    for(size_t i = 0; i < hx_val.size(); ++i)
    {
        std::cout << hx_val[i] << " ";
    }
    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));

    return 0;
}
Note
This function is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
y – [in] dense vector \(y\).
x – [out] sparse vector \(x\).
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – x or y pointer is invalid.
rocsparse_scatter()#
rocsparse_status rocsparse_scatter(rocsparse_handle handle, rocsparse_const_spvec_descr x, rocsparse_dnvec_descr y)#
Scatter elements from a sparse vector into a dense vector.
rocsparse_scatter scatters the elements from the sparse vector \(x\) into the dense vector \(y\).

for(i = 0; i < nnz; ++i)
{
    y[x_ind[i]] = x_val[i];
}
rocsparse_scatter supports the following uniform-precision data types for the sparse and dense vectors x and y.

- Uniform Precisions:
X / Y
rocsparse_datatype_i8_r
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <iostream>
#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-provided error-checking macros.
int main()
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Sparse value vector
    std::vector<float> hx_val = {1.0f, 2.0f, 3.0f};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type  = rocsparse_indextype_i32;
    rocsparse_datatype   data_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base  = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Call scatter to write the elements of x into y
    ROCSPARSE_CHECK(rocsparse_scatter(handle, vecX, vecY));

    ROCSPARSE_CHECK(rocsparse_dnvec_get_values(vecY, (void**)&dy));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost));

    std::cout << "y" << std::endl;
    for(size_t i = 0; i < hy.size(); ++i)
    {
        std::cout << hy[i] << " ";
    }
    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));

    return 0;
}
Note
This function is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
x – [in] sparse vector \(x\).
y – [out] dense vector \(y\).
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – x or y pointer is invalid.
rocsparse_rot()#
rocsparse_status rocsparse_rot(rocsparse_handle handle, const void *c, const void *s, rocsparse_spvec_descr x, rocsparse_dnvec_descr y)#
Apply Givens rotation to a dense and a sparse vector.
rocsparse_rot applies the Givens rotation matrix \(G\) to the sparse vector \(x\) and the dense vector \(y\), where

\[\begin{split} G = \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \end{split}\]

for(i = 0; i < nnz; ++i)
{
    x_tmp = x_val[i];
    y_tmp = y[x_ind[i]];

    x_val[i]    = c * x_tmp + s * y_tmp;
    y[x_ind[i]] = c * y_tmp - s * x_tmp;
}
rocsparse_rot supports the following uniform-precision data types for the sparse and dense vectors x and y and compute types for the scalars \(c\) and \(s\).

- Uniform Precisions:
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <iostream>
#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-provided error-checking macros.
int main()
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Sparse value vector
    std::vector<float> hx_val = {1.0f, 2.0f, 3.0f};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Scalar c
    float c = 3.7f;

    // Scalar s
    float s = 1.2f;

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type  = rocsparse_indextype_i32;
    rocsparse_datatype   data_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base  = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Call rot
    ROCSPARSE_CHECK(rocsparse_rot(handle, (void*)&c, (void*)&s, vecX, vecY));

    ROCSPARSE_CHECK(rocsparse_spvec_get_values(vecX, (void**)&dx_val));
    ROCSPARSE_CHECK(rocsparse_dnvec_get_values(vecY, (void**)&dy));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hx_val.data(), dx_val, sizeof(float) * nnz, hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost));

    std::cout << "x" << std::endl;
    for(size_t i = 0; i < hx_val.size(); ++i)
    {
        std::cout << hx_val[i] << " ";
    }
    std::cout << std::endl;

    std::cout << "y" << std::endl;
    for(size_t i = 0; i < hy.size(); ++i)
    {
        std::cout << hy[i] << " ";
    }
    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));

    return 0;
}
Note
This function is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
c – [in] pointer to the cosine element of \(G\), which can be on host or device.
s – [in] pointer to the sine element of \(G\), which can be on host or device.
x – [inout] sparse vector \(x\).
y – [inout] dense vector \(y\).
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – c, s, x, or y pointer is invalid.
rocsparse_spvv()#
rocsparse_status rocsparse_spvv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_const_spvec_descr x, rocsparse_const_dnvec_descr y, void *result, rocsparse_datatype compute_type, size_t *buffer_size, void *temp_buffer)#
Sparse vector inner dot product.
rocsparse_spvv computes the inner product of the sparse vector \(x\) with the dense vector \(y\), such that

\[ \text{result} := op(x) \cdot y, \]

with

\[\begin{split} op(x) = \left\{ \begin{array}{ll} x, & \text{if trans == rocsparse_operation_none} \\ \bar{x}, & \text{if trans == rocsparse_operation_conjugate_transpose} \\ \end{array} \right. \end{split}\]

result = 0;
for(i = 0; i < nnz; ++i)
{
    result += x_val[i] * y[x_ind[i]];
}
Performing the above operation involves two steps. First, call rocsparse_spvv with temp_buffer set to nullptr, which returns the required temporary buffer size in the parameter buffer_size. Then allocate this buffer and complete the computation by calling rocsparse_spvv a second time with the newly allocated buffer. After the computation is complete, the buffer can be deallocated.

rocsparse_spvv supports the following uniform- and mixed-precision data types for the sparse and dense vectors \(x\) and \(y\) and compute types for the scalar result.

- Uniform Precisions:
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed Precisions:

X / Y | compute_type / result
rocsparse_datatype_i8_r | rocsparse_datatype_i32_r
rocsparse_datatype_i8_r | rocsparse_datatype_f32_r
rocsparse_datatype_f16_r | rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r | rocsparse_datatype_f32_r
- Example
#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-provided error-checking macros.
int main()
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Sparse value vector
    std::vector<float> hx_val = {1.0f, 2.0f, 3.0f};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type     = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_operation  trans        = rocsparse_operation_none;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Obtain buffer size
    float  hresult = 0.0f;
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spvv(
        handle, trans, vecX, vecY, &hresult, compute_type, &buffer_size, nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // SpVV
    ROCSPARSE_CHECK(rocsparse_spvv(
        handle, trans, vecX, vecY, &hresult, compute_type, &buffer_size, temp_buffer));

    HIP_CHECK(hipDeviceSynchronize());

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
Note
This function writes the required allocation size (in bytes) to buffer_size and returns without performing the SpVV operation when a nullptr is passed for temp_buffer.

Note
This function is blocking with respect to the host.
Note
This routine does not support execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
trans – [in] sparse vector operation type.
x – [in] sparse vector descriptor.
y – [in] dense vector descriptor.
result – [out] pointer to the result, which can be in host or device memory.
compute_type – [in] floating point precision for the SpVV computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpVV operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – x, y, result, or buffer_size pointer is invalid.
rocsparse_status_not_implemented – compute_type is currently not supported.
rocsparse_spmv()#
rocsparse_status rocsparse_spmv(rocsparse_handle handle, rocsparse_operation trans, const void *alpha, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, const void *beta, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spmv_alg alg, rocsparse_spmv_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse matrix vector multiplication.
rocsparse_spmv multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix \(op(A)\), defined in CSR, CSC, COO, COO (AoS), BSR, or ELL format, with the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that

\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]

with

\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]

Performing the above operation involves multiple steps. First, call rocsparse_spmv with the stage parameter set to rocsparse_spmv_stage_buffer_size to determine the size of the required temporary storage buffer. Then allocate this buffer and call rocsparse_spmv with the stage parameter set to rocsparse_spmv_stage_preprocess. Depending on the algorithm and sparse matrix format, this stage performs analysis on the sparsity pattern of \(op(A)\). Finally, complete the operation by calling rocsparse_spmv with the stage parameter set to rocsparse_spmv_stage_compute. The buffer size, buffer allocation, and preprocess stages only need to be performed once for a given sparse matrix \(op(A)\), while the computation stage can be repeated with different \(x\) and \(y\) vectors. After all calls to rocsparse_spmv are complete, the temporary buffer can be deallocated.

rocsparse_spmv supports multiple algorithms. These algorithms have different trade-offs depending on the sparsity pattern of the matrix, whether the results need to be deterministic, and how many times the sparse matrix-vector product will be performed.

CSR Algorithms | Deterministic | Preprocessing | Notes
rocsparse_spmv_alg_csr_rowsplit | Yes | No | Best suited for matrices in which all rows have a similar number of non-zeros. Can outperform the adaptive and LRB algorithms for certain sparsity patterns. Performs very poorly if some rows have few non-zeros and others have many.
rocsparse_spmv_alg_csr_stream | Yes | No | [Deprecated] The old name for rocsparse_spmv_alg_csr_rowsplit.
rocsparse_spmv_alg_csr_adaptive | No | Yes | Generally the fastest algorithm across all matrix sparsity patterns, including matrices that have some rows with many non-zeros and some rows with few. Requires lengthy preprocessing that needs to be amortized over many subsequent sparse matrix-vector products.
rocsparse_spmv_alg_csr_lrb | No | Yes | Like the adaptive algorithm, generally performs well across all matrix sparsity patterns. Usually not as fast as the adaptive algorithm, but uses a much faster preprocessing step. A good choice when only a small number of sparse matrix-vector products will be performed.
rocsparse_spmv_alg_csr_nnzsplit | No | Yes | Like the adaptive algorithm, generally performs well across all matrix sparsity patterns. Usually not as fast as the adaptive algorithm but faster than LRB, with a much faster preprocessing step than LRB. A good choice when fewer than roughly one hundred sparse matrix-vector products will be performed; for more, the adaptive algorithm is probably faster.

COO Algorithms | Deterministic | Preprocessing | Notes
rocsparse_spmv_alg_coo | Yes | Yes | Generally not as fast as the atomic algorithm, but deterministic.
rocsparse_spmv_alg_coo_atomic | No | No | Generally the fastest COO algorithm.

ELL Algorithms | Deterministic | Preprocessing | Notes
rocsparse_spmv_alg_ell | Yes | No |

BSR Algorithms | Deterministic | Preprocessing | Notes
rocsparse_spmv_alg_bsr | Yes | No |
rocsparse_spmv supports multiple combinations of data types and compute types. The tables below list the currently supported data types for the sparse matrix \(op(A)\), the dense vectors \(x\) and \(y\), and the compute type for \(\alpha\) and \(\beta\). Using lower-precision data types saves memory bandwidth and storage when the application allows it, while the actual computation is performed in a higher precision.

rocsparse_spmv supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.

- Uniform Precisions:
A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed Precisions:

A / X | Y | compute_type
rocsparse_datatype_i8_r | rocsparse_datatype_i32_r | rocsparse_datatype_i32_r
rocsparse_datatype_i8_r | rocsparse_datatype_f32_r | rocsparse_datatype_f32_r
rocsparse_datatype_f16_r | rocsparse_datatype_f32_r | rocsparse_datatype_f32_r
rocsparse_datatype_f16_r | rocsparse_datatype_f16_r | rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r | rocsparse_datatype_f32_r | rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r | rocsparse_datatype_bf16_r | rocsparse_datatype_f32_r
- Mixed-regular Real Precisions:

A | X / Y / compute_type
rocsparse_datatype_f32_r | rocsparse_datatype_f64_r
rocsparse_datatype_f32_c | rocsparse_datatype_f64_c
- Mixed-regular Complex Precisions:

A | X / Y / compute_type
rocsparse_datatype_f32_r | rocsparse_datatype_f32_c
rocsparse_datatype_f64_r | rocsparse_datatype_f64_c
- Example
#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-provided error-checking macros.
int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    int m = 4;
    int n = 6;

    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>   hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};
    std::vector<float> hx(n, 1.0f);
    std::vector<float> hy(m, 0.0f);

    // Scalar alpha
    float alpha = 3.7f;

    // Scalar beta
    float beta = 0.0f;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dx;
    float* dy;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dx, sizeof(float) * n));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * m));

    HIP_CHECK(hipMemcpy(
        dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(
        dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(float) * n, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;
    rocsparse_operation  trans        = rocsparse_operation_none;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               n,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               row_idx_type,
                                               col_idx_type,
                                               idx_base,
                                               data_type));

    // Create dense vector X
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, n, dx, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, data_type));

    // Call spmv to get buffer size
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spmv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   &beta,
                                   vecY,
                                   compute_type,
                                   rocsparse_spmv_alg_csr_adaptive,
                                   rocsparse_spmv_stage_buffer_size,
                                   &buffer_size,
                                   nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // Call spmv to perform analysis
    ROCSPARSE_CHECK(rocsparse_spmv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   &beta,
                                   vecY,
                                   compute_type,
                                   rocsparse_spmv_alg_csr_adaptive,
                                   rocsparse_spmv_stage_preprocess,
                                   &buffer_size,
                                   temp_buffer));

    // Call spmv to perform computation
    ROCSPARSE_CHECK(rocsparse_spmv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   &beta,
                                   vecY,
                                   compute_type,
                                   rocsparse_spmv_alg_csr_adaptive,
                                   rocsparse_spmv_stage_compute,
                                   &buffer_size,
                                   temp_buffer));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost));

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dx));
    HIP_CHECK(hipFree(dy));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
Note
None of the algorithms above are deterministic when \(A\) is transposed.
Note
The sparse matrix formats currently supported are: rocsparse_format_bsr, rocsparse_format_coo, rocsparse_format_coo_aos, rocsparse_format_csr, rocsparse_format_csc, and rocsparse_format_ell.
Note
Only the rocsparse_spmv_stage_buffer_size stage and the rocsparse_spmv_stage_compute stage are non-blocking and executed asynchronously with respect to the host. They can return before the actual computation has finished. The rocsparse_spmv_stage_preprocess stage is blocking with respect to the host.
Note
Only the rocsparse_spmv_stage_buffer_size stage and the rocsparse_spmv_stage_compute stage support execution in a hipGraph context. The rocsparse_spmv_stage_preprocess stage does not support hipGraph.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
beta – [in] scalar \(\beta\).
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpMV computation.
alg – [in] SpMV algorithm for the SpMV computation.
stage – [in] SpMV stage for the SpMV computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When the rocsparse_spmv_stage_buffer_size stage is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpMV operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context handle was not initialized.
rocsparse_status_invalid_pointer – alpha, mat, x, beta, y, or buffer_size pointer is invalid.
rocsparse_status_invalid_value – the value of trans, compute_type, alg, or stage is incorrect.
rocsparse_status_not_implemented – compute_type or alg is currently not supported.
rocsparse_v2_spmv_buffer_size()#
rocsparse_status rocsparse_v2_spmv_buffer_size(rocsparse_handle handle, rocsparse_spmv_descr descr, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, rocsparse_const_dnvec_descr y, rocsparse_v2_spmv_stage stage, size_t *buffer_size_in_bytes, rocsparse_error *error)#
rocsparse_v2_spmv_buffer_size returns the size of the buffer required to execute the given stage of the Version 2 SpMV operation. This routine is used in conjunction with rocsparse_v2_spmv(). See rocsparse_v2_spmv for a full description and example.

Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] SpMV descriptor.
mat – [in] sparse matrix descriptor.
x – [in] dense vector descriptor.
y – [in] dense vector descriptor.
stage – [in] Version 2 SpMV stage for the SpMV computation.
buffer_size_in_bytes – [out] number of bytes of the buffer.
error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – the stage value is invalid.
rocsparse_status_invalid_pointer – mat, x, y, descr, or buffer_size_in_bytes pointer is invalid.
rocsparse_v2_spmv()#
-
rocsparse_status rocsparse_v2_spmv(rocsparse_handle handle, rocsparse_spmv_descr descr, const void *alpha, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, const void *beta, rocsparse_dnvec_descr y, rocsparse_v2_spmv_stage stage, size_t buffer_size_in_bytes, void *buffer, rocsparse_error *error)#
Sparse matrix vector multiplication.
rocsparse_v2_spmv multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix \(op(A)\) and the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that
\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]
Performing the above operation involves two stages. The first stage, rocsparse_v2_spmv_stage_analysis, performs an analysis of the symbolic information of \(op(A)\). The second stage, rocsparse_v2_spmv_stage_compute, corresponds to the actual calculation. The size of the buffer required for each stage is determined by calling rocsparse_v2_spmv_buffer_size. The rocsparse_v2_spmv_stage_analysis stage only needs to be called once for a given sparse matrix \(op(A)\), while the computation stage can be run repeatedly with different \(x\) and \(y\) vectors.
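For a CSR matrix with trans == rocsparse_operation_none, the arithmetic performed by the compute stage can be expressed as a plain-Python reference sketch (illustrative only; this helper is not part of the rocSPARSE API, which runs the computation on the device):

```python
def csr_spmv(alpha, row_ptr, col_ind, val, x, beta, y):
    """Reference for y := alpha * A @ x + beta * y, with A stored in CSR format."""
    m = len(row_ptr) - 1
    for i in range(m):
        acc = 0.0
        # Dot product of row i with x over the stored non-zeros only
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += val[k] * x[col_ind[k]]
        y[i] = alpha * acc + beta * y[i]
    return y
```

With the 4 x 6 matrix from the example below, \(x = (1, \dots, 1)\), \(\alpha = 3.7\), and \(\beta = 0\), this yields \(y = (18.5, 18.5, 74, 55.5)\).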
rocsparse_v2_spmv supports multiple algorithms. These algorithms have different trade-offs depending on the sparsity pattern of the matrix, whether or not the results need to be deterministic, and how many times the sparse matrix-vector product will be performed.

Algorithm | Deterministic | Notes
rocsparse_spmv_alg_csr_rowsplit | Yes | Best suited for matrices whose rows all have a similar number of non-zeros. Can outperform the adaptive and LRB algorithms for certain sparsity patterns. Performs very poorly if some rows have few non-zeros and some rows have many.
rocsparse_spmv_alg_csr_stream | Yes | [Deprecated] The old name for rocsparse_spmv_alg_csr_rowsplit.
rocsparse_spmv_alg_csr_adaptive | No | Generally the fastest algorithm across all matrix sparsity patterns, including matrices with some rows having many non-zeros and some rows having few. Requires lengthy preprocessing that needs to be amortized over many subsequent sparse matrix-vector products.
rocsparse_spmv_alg_csr_lrb | No | Like the adaptive algorithm, generally performs well across all matrix sparsity patterns. Usually not as fast as the adaptive algorithm, but uses a much faster preprocessing step. A good choice when only a small number of sparse matrix-vector products will be performed.
rocsparse_spmv_alg_csr_nnzsplit | No | Like the adaptive algorithm, generally performs well across all matrix sparsity patterns. Usually not as fast as the adaptive algorithm but faster than LRB, and its preprocessing step is much faster than LRB's. A good choice when fewer than roughly one hundred sparse matrix-vector products will be performed; for more, the adaptive algorithm is probably faster.

COO Algorithms | Deterministic | Notes
rocsparse_spmv_alg_coo | Yes | Generally not as fast as the atomic algorithm, but deterministic.
rocsparse_spmv_alg_coo_atomic | No | Generally the fastest COO algorithm.

ELL Algorithms | Deterministic | Notes
rocsparse_spmv_alg_ell | Yes |

Sliced ELL Algorithms | Deterministic | Notes
rocsparse_spmv_alg_sell | Yes |

BSR Algorithm | Deterministic | Notes
rocsparse_spmv_alg_bsr | Yes |
rocsparse_v2_spmv supports multiple combinations of data types and compute types. The tables below indicate the currently supported data types that can be used for the sparse matrix \(op(A)\), the dense vectors \(x\) and \(y\), and the compute type for \(\alpha\) and \(\beta\). The advantage of using different data types is to save on memory bandwidth and storage when a user application allows it, while performing the actual computation in a higher precision. rocsparse_v2_spmv supports the rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.
- Uniform Precisions:
A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed Precisions:
A / X | Y | compute_type
rocsparse_datatype_i8_r | rocsparse_datatype_i32_r | rocsparse_datatype_i32_r
rocsparse_datatype_i8_r | rocsparse_datatype_f32_r | rocsparse_datatype_f32_r
rocsparse_datatype_f16_r | rocsparse_datatype_f32_r | rocsparse_datatype_f32_r
rocsparse_datatype_f16_r | rocsparse_datatype_f16_r | rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r | rocsparse_datatype_f32_r | rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r | rocsparse_datatype_bf16_r | rocsparse_datatype_f32_r
- Mixed-regular Real Precisions
A | X / Y / compute_type
rocsparse_datatype_f32_r | rocsparse_datatype_f64_r
rocsparse_datatype_f32_c | rocsparse_datatype_f64_c
- Mixed-regular Complex Precisions
A | X / Y / compute_type
rocsparse_datatype_f32_r | rocsparse_datatype_f32_c
rocsparse_datatype_f64_r | rocsparse_datatype_f64_c
- Example
int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    int m = 4;
    int n = 6;

    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>   hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};

    std::vector<double> hx(n, 1.0);
    std::vector<double> hy(m, 0.0);

    // Scalar alpha
    double alpha = 3.7;

    // Scalar beta
    double beta = 0.0;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*    dcsr_row_ptr;
    int*    dcsr_col_ind;
    float*  dcsr_val;
    double* dx;
    double* dy;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dx, sizeof(double) * n));
    HIP_CHECK(hipMalloc(&dy, sizeof(double) * m));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(double) * n, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_error       p_error[1] = {};
    rocsparse_spmat_descr matA;
    rocsparse_dnvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, n, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val,
                                               row_idx_type, col_idx_type, idx_base, data_type));

    // Create dense vector X
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, n, dx, rocsparse_datatype_f64_r));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, rocsparse_datatype_f64_r));

    rocsparse_spmv_descr spmv_descr;
    ROCSPARSE_CHECK(rocsparse_create_spmv_descr(&spmv_descr));

    const rocsparse_spmv_alg spmv_alg = rocsparse_spmv_alg_csr_adaptive;
    ROCSPARSE_CHECK(rocsparse_spmv_set_input(
        handle, spmv_descr, rocsparse_spmv_input_alg, &spmv_alg, sizeof(spmv_alg), p_error));

    const rocsparse_operation spmv_operation = rocsparse_operation_none;
    ROCSPARSE_CHECK(rocsparse_spmv_set_input(handle, spmv_descr, rocsparse_spmv_input_operation,
                                             &spmv_operation, sizeof(spmv_operation), p_error));

    const rocsparse_datatype spmv_scalar_datatype = rocsparse_datatype_f64_r;
    ROCSPARSE_CHECK(rocsparse_spmv_set_input(handle, spmv_descr, rocsparse_spmv_input_scalar_datatype,
                                             &spmv_scalar_datatype, sizeof(spmv_scalar_datatype), p_error));

    const rocsparse_datatype spmv_compute_datatype = rocsparse_datatype_f64_r;
    ROCSPARSE_CHECK(rocsparse_spmv_set_input(handle, spmv_descr, rocsparse_spmv_input_compute_datatype,
                                             &spmv_compute_datatype, sizeof(spmv_compute_datatype), p_error));

    // Call spmv to get buffer size
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_v2_spmv_buffer_size(handle, spmv_descr, matA, vecX, vecY,
                                                  rocsparse_v2_spmv_stage_analysis, &buffer_size, p_error));

    void* buffer;
    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    // Call spmv to perform analysis
    ROCSPARSE_CHECK(rocsparse_v2_spmv(handle, spmv_descr, &alpha, matA, vecX, &beta, vecY,
                                      rocsparse_v2_spmv_stage_analysis, buffer_size, buffer, p_error));

    HIP_CHECK(hipFree(buffer));

    ROCSPARSE_CHECK(rocsparse_v2_spmv_buffer_size(handle, spmv_descr, matA, vecX, vecY,
                                                  rocsparse_v2_spmv_stage_compute, &buffer_size, p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    // Call spmv to perform computation
    ROCSPARSE_CHECK(rocsparse_v2_spmv(handle, spmv_descr, &alpha, matA, vecX, &beta, vecY,
                                      rocsparse_v2_spmv_stage_compute, buffer_size, buffer, p_error));

    HIP_CHECK(hipFree(buffer));

    ROCSPARSE_CHECK(rocsparse_destroy_error(p_error[0]));
    ROCSPARSE_CHECK(rocsparse_destroy_spmv_descr(spmv_descr));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(double) * m, hipMemcpyDeviceToHost));

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dx));
    HIP_CHECK(hipFree(dy));

    return 0;
}
Note
The sparse matrix format rocsparse_format_bell is not supported.
Note
The stage rocsparse_v2_spmv_stage_analysis is mandatory. An error will be returned if that stage was not executed before the stage rocsparse_v2_spmv_stage_compute.
Note
None of the algorithms above are deterministic when \(A\) is transposed.
Note
The rocsparse_v2_spmv_stage_compute stage is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished. The stage rocsparse_v2_spmv_stage_analysis is blocking with respect to the host.
Note
Only the stage rocsparse_v2_spmv_stage_compute supports execution in a hipGraph context. The rocsparse_v2_spmv_stage_analysis stage does not support hipGraph.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] SpMV descriptor.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
beta – [in] scalar \(\beta\).
y – [inout] vector descriptor.
stage – [in] SpMV stage of the SpMV algorithm.
buffer_size_in_bytes – [in] size in bytes of the buffer, which must be greater than or equal to the buffer size obtained from rocsparse_v2_spmv_buffer_size.
buffer – [in] temporary buffer allocated by the user.
error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context handle was not initialized.
rocsparse_status_invalid_pointer – alpha, mat, x, beta, y, or buffer pointer is invalid.
rocsparse_status_invalid_value – the value of stage is invalid.
rocsparse_status_not_implemented – alg is not supported or the mixed precision configuration is not supported.
rocsparse_spmv_set_extra()#
-
rocsparse_status rocsparse_spmv_set_extra(rocsparse_handle handle, rocsparse_spmv_descr descr, int64_t num_extras, rocsparse_const_dnvec_descr gamma_vec, rocsparse_const_dnvec_descr *z_vecs, rocsparse_error *p_error)#
Set extra scalar and vector parameters for SpMV.
rocsparse_spmv_set_extra sets a gamma dense vector and z vectors that are appended to the SpMV computation. The computation becomes \(y = \alpha \cdot op(A) \cdot x + \beta \cdot y + \sum_{i=1}^{n} \gamma_i z_i\), where \(n\) is the number of extra terms set by num_extras. This feature can be used to implement residual calculations of the form \(r = b - A \cdot x\) within the SpMV call by setting \(\alpha = -1\), \(\beta = 0\), \(\gamma = 1\), and \(z = b\).
- Data type Requirements
The following data type requirements must be satisfied:
The gamma_vec data type must match the scalar data type set using rocsparse_spmv_set_input with rocsparse_spmv_input_scalar_datatype.
All z_vecs must have the same data type as the compute data type set using rocsparse_spmv_set_input with rocsparse_spmv_input_compute_datatype.
The size of gamma_vec must equal num_extras.
All z_vecs must have the same size (vector length).
Both scalar and compute data types must be set on the descriptor before calling this function.
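To make the extended operation concrete, the following plain-Python sketch (illustrative only; it uses a dense row-of-lists stand-in for the sparse matrix, and none of these helpers exist in rocSPARSE) evaluates the computation with a set of extra gamma/z terms:

```python
def spmv_with_extras(alpha, A, x, beta, y, gammas, zs):
    """Reference for y := alpha * A @ x + beta * y + sum_i gammas[i] * zs[i].
    A is a dense (list of rows) stand-in for the sparse matrix."""
    out = []
    for i in range(len(A)):
        acc = sum(A[i][j] * x[j] for j in range(len(x)))
        # Extra terms appended by rocsparse_spmv_set_extra
        extra = sum(g * z[i] for g, z in zip(gammas, zs))
        out.append(alpha * acc + beta * y[i] + extra)
    return out

# Residual r = b - A @ x: alpha = -1, beta = 0, one extra term with gamma = 1, z = b
```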
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [inout] SpMV descriptor.
num_extras – [in] number of extra terms (gamma/z pairs).
gamma_vec – [in] dense vector descriptor containing gamma scalars. Must have a data type matching the scalar data type and a size equal to num_extras.
z_vecs – [in] array of dense vector descriptors for z vectors. All vectors must have a data type matching the compute data type and have the same size.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – descr, gamma_vec, or z_vecs is invalid.
rocsparse_status_invalid_value – invalid parameters, including data type mismatches or missing scalar/compute data type configuration.
rocsparse_status_invalid_size – size mismatches between gamma_vec and num_extras, or between z_vecs elements.
rocsparse_spmv_clear_extra()#
-
rocsparse_status rocsparse_spmv_clear_extra(rocsparse_handle handle, rocsparse_spmv_descr descr, rocsparse_error *p_error)#
Clear extra parameters for SpMV.
rocsparse_spmv_clear_extra clears the extra parameters set by rocsparse_spmv_set_extra.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [inout] SpMV descriptor.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – descr is invalid.
rocsparse_spsv()#
-
rocsparse_status rocsparse_spsv(rocsparse_handle handle, rocsparse_operation trans, const void *alpha, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spsv_alg alg, rocsparse_spsv_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse triangular system solve.
rocsparse_spsv solves a triangular linear system of equations defined by a sparse \(m \times m\) square matrix \(op(A)\), given in CSR or COO storage format, such that
\[ op(A) \cdot y = \alpha \cdot x, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \end{array} \right. \end{split}\]
where \(y\) is the dense solution vector and \(x\) is the dense right-hand side vector.
Performing the above operation requires three stages. First, rocsparse_spsv must be called with the stage rocsparse_spsv_stage_buffer_size, which determines the size of the required temporary storage buffer. The user then allocates this buffer and calls rocsparse_spsv with the stage rocsparse_spsv_stage_preprocess, which performs analysis on the sparse matrix \(op(A)\). Finally, the computation is completed by calling rocsparse_spsv with the stage rocsparse_spsv_stage_compute. The buffer size, buffer allocation, and preprocess stages only need to be performed once for a given sparse matrix \(op(A)\), while the computation stage can be run repeatedly with different \(x\) and \(y\) vectors.
rocsparse_spsv supports the rocsparse_indextype_i32 and rocsparse_indextype_i64 index types for storing the row pointer and column indices arrays of the sparse matrices. rocsparse_spsv supports the following data types for \(op(A)\), \(x\), \(y\) and compute types for \(\alpha\):
- Uniform Precisions:
A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
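As a reference for the arithmetic of the compute stage, the following plain-Python sketch (illustrative only; not part of the rocSPARSE API) performs forward substitution for a lower triangular CSR matrix with trans == rocsparse_operation_none and a non-unit diagonal:

```python
def csr_lower_solve(alpha, row_ptr, col_ind, val, x):
    """Reference forward substitution solving A @ y = alpha * x,
    where A is lower triangular in CSR format."""
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        acc = alpha * x[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_ind[k]
            if j == i:
                diag = val[k]          # diagonal entry a_ii
            else:
                acc -= val[k] * y[j]   # subtract already-solved terms a_ij * y_j
        y[i] = acc / diag              # a_ii == 0 would be a zero pivot
    return y
```

With the 4 x 4 lower triangular matrix from the example below, \(x = (1, 1, 1, 1)\), and \(\alpha = 1\), this yields \(y = (1, -1.5, 5.5/7, 1)\).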
- Example
int main()
{
    //     1 0 0 0
    // A = 4 2 0 0
    //     0 3 7 0
    //     0 0 0 1
    int m = 4;

    std::vector<int>   hcsr_row_ptr = {0, 1, 3, 5, 6};
    std::vector<int>   hcsr_col_ind = {0, 0, 1, 1, 2, 3};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 7, 1};

    std::vector<float> hx(m, 1.0f);
    std::vector<float> hy(m, 0.0f);

    // Scalar alpha
    float alpha = 1.0f;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dx;
    float* dy;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dx, sizeof(float) * m));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * m));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(float) * m, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;
    rocsparse_operation  trans        = rocsparse_operation_none;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val,
                                               row_idx_type, col_idx_type, idx_base, data_type));

    // Create dense vector X
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, m, dx, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, data_type));

    // Call spsv to get buffer size
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spsv(handle, trans, &alpha, matA, vecX, vecY, compute_type,
                                   rocsparse_spsv_alg_default, rocsparse_spsv_stage_buffer_size,
                                   &buffer_size, nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // Call spsv to perform analysis
    ROCSPARSE_CHECK(rocsparse_spsv(handle, trans, &alpha, matA, vecX, vecY, compute_type,
                                   rocsparse_spsv_alg_default, rocsparse_spsv_stage_preprocess,
                                   &buffer_size, temp_buffer));

    // Call spsv to perform computation
    ROCSPARSE_CHECK(rocsparse_spsv(handle, trans, &alpha, matA, vecX, vecY, compute_type,
                                   rocsparse_spsv_alg_default, rocsparse_spsv_stage_compute,
                                   &buffer_size, temp_buffer));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost));

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dx));
    HIP_CHECK(hipFree(dy));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
Note
The sparse matrix formats currently supported are: rocsparse_format_coo and rocsparse_format_csr.
Note
Only the rocsparse_spsv_stage_buffer_size stage and the rocsparse_spsv_stage_compute stage are non-blocking and executed asynchronously with respect to the host. They can return before the actual computation has finished. The rocsparse_spsv_stage_preprocess stage is blocking with respect to the host.
Note
Currently, only trans == rocsparse_operation_none and trans == rocsparse_operation_transpose are supported.
Note
Only the rocsparse_spsv_stage_buffer_size stage and the rocsparse_spsv_stage_compute stage support execution in a hipGraph context. The rocsparse_spsv_stage_preprocess stage does not support hipGraph.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpSV computation.
alg – [in] SpSV algorithm for the SpSV computation.
stage – [in] SpSV stage for the SpSV computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When the rocsparse_spsv_stage_buffer_size stage is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpSV operation. This buffer is non-persistent, and no data is stored in it. Therefore, this memory can be freed or reused for other tasks between the analysis phase and the compute phase.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, mat, x, y, or buffer_size pointer is invalid.
rocsparse_status_not_implemented – trans, compute_type, stage, or alg is currently not supported.
rocsparse_sptrsv_buffer_size()#
-
rocsparse_status rocsparse_sptrsv_buffer_size(rocsparse_handle handle, rocsparse_sptrsv_descr sptrsv_descr, rocsparse_const_spmat_descr spmat_descr, rocsparse_const_dnvec_descr x, rocsparse_const_dnvec_descr y, rocsparse_sptrsv_stage sptrsv_stage, size_t *buffer_size_in_bytes, rocsparse_error *p_error)#
rocsparse_sptrsv_buffer_size returns the size of the required buffer to execute the given stage of the SpTrSV operation. This routine is used in conjunction with rocsparse_sptrsv(). See rocsparse_sptrsv for a full description and example.
Note
This routine does not support execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
sptrsv_descr – [in] SpTrSV descriptor.
spmat_descr – [in] sparse matrix descriptor.
x – [in] dense vector descriptor.
y – [in] dense vector descriptor.
sptrsv_stage – [in] stage for the SpTrSV computation.
buffer_size_in_bytes – [out] number of bytes of the buffer.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – the sptrsv_stage value is invalid.
rocsparse_status_invalid_pointer – sptrsv_descr, spmat_descr, x, y, or buffer_size_in_bytes pointer is invalid.
rocsparse_sptrsv()#
-
rocsparse_status rocsparse_sptrsv(rocsparse_handle handle, rocsparse_sptrsv_descr sptrsv_descr, rocsparse_const_spmat_descr A, rocsparse_const_dnvec_descr x, rocsparse_dnvec_descr y, rocsparse_sptrsv_stage sptrsv_stage, size_t buffer_size_in_bytes, void *buffer, rocsparse_error *p_error)#
Sparse triangular solve.
rocsparse_sptrsv solves a triangular linear system of equations defined by a sparse \(m \times m\) square matrix \(op(A)\), such that
\[ op(A) \cdot y = \alpha \cdot x, \]
with
\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if op == rocsparse_operation_none} \\ A^T, & \text{if op == rocsparse_operation_transpose} \\ A^H, & \text{if op == rocsparse_operation_conjugate_transpose} \\ \end{array} \right. \end{split}\]
where \(y\) is the dense solution vector and \(x\) is the dense right-hand side vector.
Performing the above operation requires two stages: rocsparse_sptrsv_stage_analysis and rocsparse_sptrsv_stage_compute. The rocsparse_sptrsv_stage_analysis stage is required before the rocsparse_sptrsv_stage_compute stage and only needs to be called once for a given sparse matrix \(op(A)\), while the rocsparse_sptrsv_stage_compute stage can be run repeatedly with different \(x\) and \(y\) vectors.
rocsparse_sptrsv supports the following data types for \(op(A)\), \(x\), \(y\), and scalar \(\alpha\):
- Uniform Precisions:
A / X / Y / scalar
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
int main()
{
    //     1 0 0 0
    // A = 4 2 0 0
    //     0 3 7 0
    //     0 0 0 1
    int m = 4;

    std::vector<int>   hcsr_row_ptr = {0, 1, 3, 5, 6};
    std::vector<int>   hcsr_col_ind = {0, 0, 1, 1, 2, 3};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 7, 1};

    std::vector<float> hx(m, 1.0f);
    std::vector<float> hy(m, 0.0f);

    // Scalar alpha
    float alpha = 1.0f;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dx;
    float* dy;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dx, sizeof(float) * m));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * m));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(float) * m, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;
    rocsparse_operation  trans        = rocsparse_operation_none;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val,
                                               row_idx_type, col_idx_type, idx_base, data_type));

    // Create dense vector X
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, m, dx, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, data_type));

    rocsparse_sptrsv_descr sptrsv_descr;
    ROCSPARSE_CHECK(rocsparse_create_sptrsv_descr(&sptrsv_descr));

    const rocsparse_sptrsv_alg sptrsv_alg = rocsparse_sptrsv_alg_default;
    ROCSPARSE_CHECK(rocsparse_sptrsv_set_input(handle, sptrsv_descr, rocsparse_sptrsv_input_alg,
                                               &sptrsv_alg, sizeof(sptrsv_alg), nullptr));

    const rocsparse_operation sptrsv_operation = rocsparse_operation_none;
    ROCSPARSE_CHECK(rocsparse_sptrsv_set_input(handle, sptrsv_descr, rocsparse_sptrsv_input_operation,
                                               &sptrsv_operation, sizeof(sptrsv_operation), nullptr));

    const rocsparse_datatype sptrsv_scalar_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_sptrsv_set_input(handle, sptrsv_descr, rocsparse_sptrsv_input_scalar_datatype,
                                               &sptrsv_scalar_datatype, sizeof(sptrsv_scalar_datatype), nullptr));

    const rocsparse_datatype sptrsv_compute_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_sptrsv_set_input(handle, sptrsv_descr, rocsparse_sptrsv_input_compute_datatype,
                                               &sptrsv_compute_datatype, sizeof(sptrsv_compute_datatype), nullptr));

    const rocsparse_analysis_policy sptrsv_analysis_policy = rocsparse_analysis_policy_reuse;
    ROCSPARSE_CHECK(rocsparse_sptrsv_set_input(handle, sptrsv_descr, rocsparse_sptrsv_input_analysis_policy,
                                               &sptrsv_analysis_policy, sizeof(sptrsv_analysis_policy), nullptr));

    size_t buffer_size;
    void*  temp_buffer;

    // Analysis phase
    ROCSPARSE_CHECK(rocsparse_sptrsv_buffer_size(handle, sptrsv_descr, matA, vecX, vecY,
                                                 rocsparse_sptrsv_stage_analysis, &buffer_size, nullptr));
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));
    ROCSPARSE_CHECK(rocsparse_sptrsv(handle, sptrsv_descr, matA, vecX, vecY,
                                     rocsparse_sptrsv_stage_analysis, buffer_size, temp_buffer, nullptr));
    HIP_CHECK(hipFree(temp_buffer));
    temp_buffer = nullptr;

    int64_t zero_pivot;
    ROCSPARSE_CHECK(rocsparse_sptrsv_get_output(handle, sptrsv_descr,
                                                rocsparse_sptrsv_output_zero_pivot_position,
                                                &zero_pivot, sizeof(zero_pivot), nullptr));
    if(zero_pivot != -1)
    {
        std::cout << "zero pivot detected during analysis at position " << zero_pivot << std::endl;
    }

    //
    // Compute phase.
    //
    ROCSPARSE_CHECK(rocsparse_sptrsv_set_input(handle, sptrsv_descr, rocsparse_sptrsv_input_scalar_alpha,
                                               &alpha, sizeof(alpha), nullptr));
    ROCSPARSE_CHECK(rocsparse_sptrsv_buffer_size(handle, sptrsv_descr, matA, vecX, vecY,
                                                 rocsparse_sptrsv_stage_compute, &buffer_size, nullptr));
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));
    ROCSPARSE_CHECK(rocsparse_sptrsv(handle, sptrsv_descr, matA, vecX, vecY,
                                     rocsparse_sptrsv_stage_compute, buffer_size, temp_buffer, nullptr));

    // Device synchronization
    hipStream_t stream;
    ROCSPARSE_CHECK(rocsparse_get_stream(handle, &stream));
    HIP_CHECK(hipStreamSynchronize(stream));

    ROCSPARSE_CHECK(rocsparse_sptrsv_get_output(handle, sptrsv_descr,
                                                rocsparse_sptrsv_output_zero_pivot_position,
                                                &zero_pivot, sizeof(zero_pivot), nullptr));
    if(zero_pivot != -1)
    {
        std::cout << "zero pivot detected during compute phase at position " << zero_pivot << std::endl;
    }

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));
    ROCSPARSE_CHECK(rocsparse_destroy_sptrsv_descr(sptrsv_descr));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dx));
    HIP_CHECK(hipFree(dy));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
Note
The descriptor rocsparse_sptrsv_descr needs to be configured with rocsparse_sptrsv_set_input.
Note
The sparse matrix formats currently supported are: rocsparse_format_coo and rocsparse_format_csr.
Note
The rocsparse_sptrsv_stage_compute stage is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished. The rocsparse_sptrsv_stage_analysis stage is blocking with respect to the host.
Note
Currently, only trans == rocsparse_operation_none and trans == rocsparse_operation_transpose are supported.
Note
Only the rocsparse_sptrsv_stage_compute stage supports execution in a hipGraph context. The rocsparse_sptrsv_stage_analysis stage does not support hipGraph.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
sptrsv_descr – [in] descriptor of the routine.
A – [in] matrix descriptor.
x – [in] vector descriptor.
y – [inout] vector descriptor.
sptrsv_stage – [in] stage for the SpTRSV computation.
buffer_size_in_bytes – [in] number of bytes of the buffer.
buffer – [in] buffer allocated by the user.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if the user is not interested in obtaining an error descriptor.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – sptrsv_descr, A, x, or y is invalid, or buffer is null and buffer_size_in_bytes is non-zero, or buffer is not null and buffer_size_in_bytes is zero.
rocsparse_status_invalid_value – sptrsv_stage is invalid.
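As a reference model only (not the library's implementation, which runs the analysis and compute stages on the GPU), the operation performed by the compute stage corresponds to forward substitution over the rows of a lower triangular CSR matrix. The helper name `csr_lower_solve` below is hypothetical, and the zero-pivot error only conceptually mirrors the rocsparse_sptrsv_output_zero_pivot_position reporting:

```python
def csr_lower_solve(row_ptr, col_ind, val, b, alpha=1.0):
    """Reference forward substitution: solve A*y = alpha*b for y, where A is
    lower triangular in CSR format with zero-based indices."""
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        s = alpha * b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_ind[k]
            if j < i:
                s -= val[k] * y[j]      # subtract already-solved entries
            elif j == i:
                diag = val[k]           # remember the diagonal pivot
        if diag is None or diag == 0.0:
            # conceptually analogous to the zero_pivot position output
            raise ZeroDivisionError(f"zero pivot at position {i}")
        y[i] = s / diag
    return y
```

For an upper triangular matrix, or for a transposed operation, the traversal order changes accordingly.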
rocsparse_spilu0_buffer_size()#
-
rocsparse_status rocsparse_spilu0_buffer_size(rocsparse_handle handle, rocsparse_spilu0_descr spilu0_descr, rocsparse_const_spmat_descr A, rocsparse_const_spmat_descr P, rocsparse_spilu0_stage spilu0_stage, size_t *p_buffer_size_in_bytes, rocsparse_error *p_error)#
Get buffer size for incomplete LU factorization with 0 fill-ins and no pivoting.
rocsparse_spilu0_buffer_size returns the size of the non-persistent buffer that is required by rocsparse_spilu0 and must be allocated by the user.
Note
This function is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
Note
Supported formats are rocsparse_format_csr and rocsparse_format_bsr.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
spilu0_descr – [in] Spilu0 descriptor.
A – [in] descriptor of the matrix to factorize.
P – [in] descriptor of the factorization.
spilu0_stage – [in] stage for the Spilu0 computation.
p_buffer_size_in_bytes – [out] number of bytes of the buffer.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if the user is not interested in obtaining an error descriptor.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_not_implemented – the sparse format is invalid or the preconditioner P is not identical to the matrix to factorize A.
rocsparse_status_invalid_value – the spilu0_stage value is invalid.
rocsparse_status_invalid_pointer – spilu0_descr, A, P, or p_buffer_size_in_bytes pointer is invalid.
rocsparse_spilu0()#
-
rocsparse_status rocsparse_spilu0(rocsparse_handle handle, rocsparse_spilu0_descr spilu0_descr, rocsparse_const_spmat_descr A, rocsparse_spmat_descr P, rocsparse_spilu0_stage spilu0_stage, size_t buffer_size_in_bytes, void *buffer, rocsparse_error *p_error)#
Incomplete LU factorization with 0 fill-ins and no pivoting.
rocsparse_spilu0computes the incomplete LU factorization with 0 fill-ins and no pivoting of a sparse \(m \times m\) matrix \(A\), such that\[ A \approx LU \]where the lower triangular matrix \(L\) and the upper triangular matrix \(U\) are computed using:\[\begin{split} \begin{array}{ll} L_{ij} = \frac{1}{U_{jj}}(A_{ij} - \sum_{k=0}^{j-1}L_{ik} \times U_{kj}), & \text{if i > j} \\ U_{ij} = (A_{ij} - \sum_{k=0}^{j-1}L_{ik} \times U_{kj}), & \text{if i <= j} \end{array} \end{split}\]for each entry found in the matrix \(A\).Performing the above operation requires two stages, the stage rocsparse_spilu0_stage_analysis and the stage rocsparse_spilu0_stage_compute. The stage rocsparse_spilu0_stage_analysis is required to perform the stage rocsparse_spilu0_stage_compute and only needs to be called once for a given sparse matrix \(A\), while the stage rocsparse_spilu0_stage_compute can be repeatedly used with different matrices \(A\) that have the same sparsity pattern.
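The update formulas above can be sketched on a dense array restricted to the sparsity pattern of \(A\). This is a minimal reference model under stated assumptions: the helper `ilu0` is hypothetical and is not the library algorithm, which operates on CSR/BSR data in two stages on the GPU:

```python
def ilu0(A):
    """Incomplete LU with zero fill-in: factor in place, keeping only entries
    where A is structurally nonzero. Returns M holding L (unit lower, strictly
    below the diagonal) and U (upper, including the diagonal)."""
    n = len(A)
    M = [row[:] for row in A]
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            M[i][k] /= M[k][k]                    # L_ik = (...) / U_kk
            for j in range(k + 1, n):
                if pattern[i][j]:
                    M[i][j] -= M[i][k] * M[k][j]  # subtract L_ik * U_kj
    return M
```

For a tridiagonal pattern such as the 4x4 matrix used in the example in this section, no fill-in is dropped, so \(L \cdot U\) reproduces \(A\) exactly (up to rounding).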
rocsparse_spilu0 supports the following data types for A: rocsparse_datatype_f32_r, rocsparse_datatype_f64_r, rocsparse_datatype_f32_c, and rocsparse_datatype_f64_c.
- Example
int main() { // // Define a matrix. // // 4 1 0 0 // A = 2 8 1 0 // 0 1 8 1 // 0 0 2 4 // static constexpr int32_t m = 4; static constexpr int32_t nnz = 10; static constexpr int64_t batch_count = 3; static constexpr rocsparse_index_base idx_base = rocsparse_index_base_zero; static constexpr rocsparse_indextype row_idx_type = rocsparse_indextype_i32; static constexpr rocsparse_indextype col_idx_type = rocsparse_indextype_i32; static constexpr rocsparse_datatype data_type = rocsparse_datatype_f32_r; const int32_t hcsr_row_ptr[m + 1] = {idx_base + 0, idx_base + 2, idx_base + 5, idx_base + 8, idx_base + 10}; const int32_t hcsr_col_ind[nnz] = {idx_base + 0, idx_base + 1, idx_base + 0, idx_base + 1, idx_base + 2, idx_base + 1, idx_base + 2, idx_base + 3, idx_base + 2, idx_base + 3}; const float hcsr_val[nnz * batch_count] = { // // // 4, 1, 2, 8, 1, 1, 8, 1, 2, 4, 4, 1, 0, 8, 1, 8, 1, 0, 2, 4, 4, 1, 4, 1.001, 0, 1, 8, 1, 2, 4 }; const double singularity_tolerance = 0.01; const int32_t boost_enable = 0; const double boost_tolerance = 0.01; const double boost_value = 1; // // Offload data to device // int32_t* dcsr_row_ptr; HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int32_t) * (m + 1))); HIP_CHECK( hipMemcpy(dcsr_row_ptr, hcsr_row_ptr, sizeof(int32_t) * (m + 1), hipMemcpyHostToDevice)); int32_t* dcsr_col_ind; HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int32_t) * nnz)); HIP_CHECK(hipMemcpy(dcsr_col_ind, hcsr_col_ind, sizeof(int32_t) * nnz, hipMemcpyHostToDevice)); float* dcsr_val; HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz * batch_count)); HIP_CHECK( hipMemcpy(dcsr_val, hcsr_val, sizeof(float) * nnz * batch_count, hipMemcpyHostToDevice)); // // Create handle. 
// rocsparse_handle handle; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); // // Create sparse matrix A // rocsparse_spmat_descr matA; ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_idx_type, col_idx_type, idx_base, data_type)); ROCSPARSE_CHECK(rocsparse_csr_set_strided_batch(matA, batch_count, 0, nnz)); // // Create the descriptor of the incomplete LU algorithm of level 0. // rocsparse_spilu0_descr spilu0_descr; ROCSPARSE_CHECK(rocsparse_spilu0_descr_create(handle, &spilu0_descr, nullptr)); // // Configure the descriptor. // const rocsparse_spilu0_alg spilu0_alg = rocsparse_spilu0_alg_default; ROCSPARSE_CHECK(rocsparse_spilu0_set_input(handle, spilu0_descr, rocsparse_spilu0_input_alg, &spilu0_alg, sizeof(spilu0_alg), nullptr)); const rocsparse_datatype spilu0_compute_datatype = rocsparse_datatype_f32_r; ROCSPARSE_CHECK(rocsparse_spilu0_set_input(handle, spilu0_descr, rocsparse_spilu0_input_compute_datatype, &spilu0_compute_datatype, sizeof(spilu0_compute_datatype), nullptr)); const rocsparse_analysis_policy spilu0_analysis_policy = rocsparse_analysis_policy_reuse; ROCSPARSE_CHECK(rocsparse_spilu0_set_input(handle, spilu0_descr, rocsparse_spilu0_input_analysis_policy, &spilu0_analysis_policy, sizeof(spilu0_analysis_policy), nullptr)); ROCSPARSE_CHECK(rocsparse_spilu0_set_input(handle, spilu0_descr, rocsparse_spilu0_input_singularity_tolerance, &singularity_tolerance, sizeof(double), nullptr)); ROCSPARSE_CHECK(rocsparse_spilu0_set_input(handle, spilu0_descr, rocsparse_spilu0_input_boost_enable, &boost_enable, sizeof(int32_t), nullptr)); ROCSPARSE_CHECK(rocsparse_spilu0_set_input(handle, spilu0_descr, rocsparse_spilu0_input_boost_tolerance, &boost_tolerance, sizeof(double), nullptr)); ROCSPARSE_CHECK(rocsparse_spilu0_set_input(handle, spilu0_descr, rocsparse_spilu0_input_boost_value, &boost_value, sizeof(double), nullptr)); hipStream_t stream; ROCSPARSE_CHECK(rocsparse_get_stream(handle, &stream)); //
// Spilu0 Analysis phase // size_t non_persistent_buffer_size_in_bytes; void* non_persistent_buffer; ROCSPARSE_CHECK(rocsparse_spilu0_buffer_size(handle, spilu0_descr, matA, matA, rocsparse_spilu0_stage_analysis, &non_persistent_buffer_size_in_bytes, nullptr)); HIP_CHECK(hipMalloc(&non_persistent_buffer, non_persistent_buffer_size_in_bytes)); ROCSPARSE_CHECK(rocsparse_spilu0(handle, spilu0_descr, matA, matA, rocsparse_spilu0_stage_analysis, non_persistent_buffer_size_in_bytes, non_persistent_buffer, nullptr)); // // Check for any singularities after analysis. // ROCSPARSE_CHECK(rocsparse_set_pointer_mode(handle, rocsparse_pointer_mode_host)); rocsparse_singularity post_analysis_singularity[batch_count]; ROCSPARSE_CHECK(rocsparse_spilu0_get_output(handle, spilu0_descr, rocsparse_spilu0_output_singularity, post_analysis_singularity, sizeof(rocsparse_singularity), nullptr)); int64_t singularity_position[batch_count]; ROCSPARSE_CHECK(rocsparse_spilu0_get_output(handle, spilu0_descr, rocsparse_spilu0_output_singularity_position, singularity_position, sizeof(int64_t), nullptr)); HIP_CHECK(hipStreamSynchronize(stream)); for(int64_t batch_index = 0; batch_index < batch_count; ++batch_index) { switch(post_analysis_singularity[batch_index]) { case rocsparse_singularity_none: { break; } case rocsparse_singularity_symbolic: { std::cout << "symbolic singularity detected at batch_index: " << batch_index << ", at position: " << singularity_position[batch_index] << std::endl; ROCSPARSE_CHECK(rocsparse_status_zero_pivot); break; } case rocsparse_singularity_numeric_exact: case rocsparse_singularity_numeric_near: { // // Not from analysis. // ROCSPARSE_CHECK(rocsparse_status_internal_error); break; } } } // // Compute phase. 
// ROCSPARSE_CHECK(rocsparse_spilu0_buffer_size(handle, spilu0_descr, matA, matA, rocsparse_spilu0_stage_compute, &non_persistent_buffer_size_in_bytes, nullptr)); HIP_CHECK(hipFree(non_persistent_buffer)); non_persistent_buffer = nullptr; HIP_CHECK(hipMalloc(&non_persistent_buffer, non_persistent_buffer_size_in_bytes)); ROCSPARSE_CHECK(rocsparse_spilu0(handle, spilu0_descr, matA, matA, rocsparse_spilu0_stage_compute, non_persistent_buffer_size_in_bytes, non_persistent_buffer, nullptr)); // // Check for any singularities after compute. // rocsparse_singularity post_compute_singularity[batch_count]; ROCSPARSE_CHECK(rocsparse_set_pointer_mode(handle, rocsparse_pointer_mode_host)); ROCSPARSE_CHECK(rocsparse_spilu0_get_output(handle, spilu0_descr, rocsparse_spilu0_output_singularity, post_compute_singularity, sizeof(rocsparse_singularity), nullptr)); ROCSPARSE_CHECK(rocsparse_spilu0_get_output(handle, spilu0_descr, rocsparse_spilu0_output_singularity_position, singularity_position, sizeof(int64_t), nullptr)); HIP_CHECK(hipStreamSynchronize(stream)); for(int64_t batch_index = 0; batch_index < batch_count; ++batch_index) { switch(post_compute_singularity[batch_index]) { case rocsparse_singularity_none: { break; } case rocsparse_singularity_symbolic: { std::cout << "numeric symbolic singularity detected at batch_index: " << batch_index << ", at position: " << singularity_position[batch_index] << std::endl; ROCSPARSE_CHECK(rocsparse_status_internal_error); break; } case rocsparse_singularity_numeric_exact: { std::cout << "numeric exact singularity detected at batch_index: " << batch_index << ", at position: " << singularity_position[batch_index] << std::endl; break; } case rocsparse_singularity_numeric_near: { std::cout << "numeric near singularity detected at batch_index: " << batch_index << ", at position: " << singularity_position[batch_index] << std::endl; break; } } } HIP_CHECK(hipFree(non_persistent_buffer)); ROCSPARSE_CHECK(rocsparse_spilu0_descr_destroy(handle, 
spilu0_descr, nullptr)); ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA)); ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); HIP_CHECK(hipFree(dcsr_row_ptr)); HIP_CHECK(hipFree(dcsr_col_ind)); HIP_CHECK(hipFree(dcsr_val)); return 0; }
Note
The descriptor spilu0_descr needs to be configured with rocsparse_spilu0_set_input.
Note
The sparse matrix formats currently supported are rocsparse_format_csr and rocsparse_format_bsr.
Note
The rocsparse_spilu0_stage_compute stage is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished. The rocsparse_spilu0_stage_analysis stage is blocking with respect to the host.
Note
Only the rocsparse_spilu0_stage_compute stage supports execution in a hipGraph context. The rocsparse_spilu0_stage_analysis stage does not support hipGraph.
Note
This routine only supports uniform batched computation, that is, same sparsity pattern but batched values of the matrices.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
spilu0_descr – [in] Spilu0 descriptor.
A – [in] descriptor of the matrix to factorize.
P – [out] descriptor of the factorization.
spilu0_stage – [in] stage for the Spilu0 computation.
buffer_size_in_bytes – [in] number of bytes of the buffer.
buffer – [in] buffer allocated by the user.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_not_implemented – the sparse format is invalid or the preconditioner P is not identical to the matrix to factorize A.
rocsparse_status_invalid_value – the spilu0_stage value is invalid.
rocsparse_status_invalid_pointer – spilu0_descr, A, P, or buffer_size_in_bytes pointer is invalid.
rocsparse_spic0_buffer_size()#
-
rocsparse_status rocsparse_spic0_buffer_size(rocsparse_handle handle, rocsparse_spic0_descr spic0_descr, rocsparse_const_spmat_descr A, rocsparse_const_spmat_descr P, rocsparse_spic0_stage spic0_stage, size_t *p_buffer_size_in_bytes, rocsparse_error *p_error)#
Get buffer size for incomplete Cholesky factorization with 0 fill-ins and no pivoting.
rocsparse_spic0_buffer_size returns the size of the non-persistent buffer that is required by rocsparse_spic0 and must be allocated by the user.
Note
This function is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
Note
This routine only supports uniform batched computation, that is, the same sparsity pattern but batched values of the matrices.
Note
Supported formats are rocsparse_format_csr and rocsparse_format_bsr.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
spic0_descr – [in] Spic0 descriptor.
A – [in] descriptor of the matrix to factorize.
P – [in] descriptor of the factorization. In-place P = A is allowed.
spic0_stage – [in] stage for the Spic0 computation.
p_buffer_size_in_bytes – [out] number of bytes of the buffer.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_not_implemented – the sparse format is invalid or the preconditioner P is not identical to the matrix to factorize A.
rocsparse_status_invalid_value – the spic0_stage value is invalid.
rocsparse_status_invalid_pointer – spic0_descr, A, P, or p_buffer_size_in_bytes pointer is invalid.
rocsparse_spic0()#
-
rocsparse_status rocsparse_spic0(rocsparse_handle handle, rocsparse_spic0_descr spic0_descr, rocsparse_const_spmat_descr A, rocsparse_spmat_descr P, rocsparse_spic0_stage spic0_stage, size_t buffer_size_in_bytes, void *buffer, rocsparse_error *p_error)#
Incomplete Cholesky factorization with 0 fill-ins and no pivoting.
rocsparse_spic0computes the incomplete Cholesky factorization with 0 fill-ins and no pivoting of a sparse \(m \times m\) matrix \(A\), such that\[ A \approx LL^T \]where the lower triangular matrix \(L\) is computed using:\[\begin{split} L_{ij} = \left\{ \begin{array}{ll} \sqrt{A_{jj} - \sum_{k=0}^{j-1}(L_{jk})^{2}}, & \text{if i == j} \\ \frac{1}{L_{jj}}(A_{ij} - \sum_{k=0}^{j-1}L_{ik} \times L_{jk}), & \text{if i > j} \end{array} \right. \end{split}\]for each entry found in the matrix \(A\).Performing the above operation requires two stages, the stage rocsparse_spic0_stage_analysis and the stage rocsparse_spic0_stage_compute. The stage rocsparse_spic0_stage_analysis is required to perform the stage rocsparse_spic0_stage_compute and only needs to be called once for a given sparse matrix \(A\), while the stage rocsparse_spic0_stage_compute can be repeatedly used with different matrices \(A\) that have the same sparsity pattern.
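The column-by-column formula above can be sketched on a dense symmetric positive definite matrix, keeping only entries in the sparsity pattern of \(A\). This is a reference model under stated assumptions: the helper `ic0` is hypothetical and is not the library algorithm, which runs on CSR/BSR data in two stages on the GPU:

```python
import math

def ic0(A):
    """Incomplete Cholesky with zero fill-in: compute L column by column,
    keeping only entries where the lower part of A is structurally nonzero."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # L_jj = sqrt(A_jj - sum_k L_jk^2); a non-positive argument here
        # would correspond to a singularity being reported
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            if A[i][j] != 0.0:  # restrict to the sparsity pattern of A
                L[i][j] = (A[i][j]
                           - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

For a tridiagonal pattern such as the matrix in the example in this section, no fill-in is dropped, so \(L L^T\) reproduces \(A\) exactly (up to rounding).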
rocsparse_spic0 supports the following data types for A: rocsparse_datatype_f32_r, rocsparse_datatype_f64_r, rocsparse_datatype_f32_c, and rocsparse_datatype_f64_c.
- Example
int main() { // // Define a matrix. // // 4 0 0 0 // A = 2 8 0 0 // 0 1 8 0 // 0 0 2 4 // static constexpr int64_t batch_count = 2; static constexpr int32_t m = 4; static constexpr int32_t nnz = 7; static constexpr rocsparse_index_base idx_base = rocsparse_index_base_zero; static constexpr rocsparse_indextype row_idx_type = rocsparse_indextype_i32; static constexpr rocsparse_indextype col_idx_type = rocsparse_indextype_i32; static constexpr rocsparse_datatype data_type = rocsparse_datatype_f32_r; const int32_t hcsr_row_ptr[m + 1] = {idx_base + 0, idx_base + 1, idx_base + 3, idx_base + 5, idx_base + 7}; const int32_t hcsr_col_ind[nnz] = {idx_base + 0, idx_base + 0, idx_base + 1, idx_base + 1, idx_base + 2, idx_base + 2, idx_base + 3}; const float hcsr_val[nnz * batch_count] = {4, 2, 8, 1, 8, 2, 4, 4, 2, 8, 1, 8, 2, 4 }; // // Offload data to device // int32_t* dcsr_row_ptr; HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int32_t) * (m + 1))); HIP_CHECK( hipMemcpy(dcsr_row_ptr, hcsr_row_ptr, sizeof(int32_t) * (m + 1), hipMemcpyHostToDevice)); int32_t* dcsr_col_ind; HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int32_t) * nnz)); HIP_CHECK(hipMemcpy(dcsr_col_ind, hcsr_col_ind, sizeof(int32_t) * nnz, hipMemcpyHostToDevice)); float* dcsr_val; HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz * batch_count)); HIP_CHECK( hipMemcpy(dcsr_val, hcsr_val, sizeof(float) * nnz * batch_count, hipMemcpyHostToDevice)); // // Create handle. // rocsparse_handle handle; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); // // Create sparse matrix A // rocsparse_spmat_descr matA; ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_idx_type, col_idx_type, idx_base, data_type)); ROCSPARSE_CHECK(rocsparse_csr_set_strided_batch(matA, batch_count, 0, nnz)); // // Create the descriptor of the Incomplete Cholesky algorithm of level 0.
// rocsparse_spic0_descr spic0_descr; ROCSPARSE_CHECK(rocsparse_spic0_descr_create(handle, &spic0_descr, nullptr)); // // Configure the descriptor. // const rocsparse_spic0_alg spic0_alg = rocsparse_spic0_alg_default; ROCSPARSE_CHECK(rocsparse_spic0_set_input( handle, spic0_descr, rocsparse_spic0_input_alg, &spic0_alg, sizeof(spic0_alg), nullptr)); const rocsparse_datatype spic0_compute_datatype = rocsparse_datatype_f32_r; ROCSPARSE_CHECK(rocsparse_spic0_set_input(handle, spic0_descr, rocsparse_spic0_input_compute_datatype, &spic0_compute_datatype, sizeof(spic0_compute_datatype), nullptr)); const rocsparse_analysis_policy spic0_analysis_policy = rocsparse_analysis_policy_reuse; ROCSPARSE_CHECK(rocsparse_spic0_set_input(handle, spic0_descr, rocsparse_spic0_input_analysis_policy, &spic0_analysis_policy, sizeof(spic0_analysis_policy), nullptr)); hipStream_t stream; ROCSPARSE_CHECK(rocsparse_get_stream(handle, &stream)); // // SpIC0 Analysis phase // size_t non_persistent_buffer_size_in_bytes; void* non_persistent_buffer; ROCSPARSE_CHECK(rocsparse_spic0_buffer_size(handle, spic0_descr, matA, matA, rocsparse_spic0_stage_analysis, &non_persistent_buffer_size_in_bytes, nullptr)); HIP_CHECK(hipMalloc(&non_persistent_buffer, non_persistent_buffer_size_in_bytes)); ROCSPARSE_CHECK(rocsparse_spic0(handle, spic0_descr, matA, matA, rocsparse_spic0_stage_analysis, non_persistent_buffer_size_in_bytes, non_persistent_buffer, nullptr)); // // Check for any singularities after analysis. 
// ROCSPARSE_CHECK(rocsparse_set_pointer_mode(handle, rocsparse_pointer_mode_host)); rocsparse_singularity post_analysis_singularity[batch_count]; rocsparse_singularity post_compute_singularity[batch_count]; int64_t singularity_position[batch_count]; ROCSPARSE_CHECK(rocsparse_spic0_get_output(handle, spic0_descr, rocsparse_spic0_output_singularity, post_analysis_singularity, sizeof(rocsparse_singularity), nullptr)); ROCSPARSE_CHECK(rocsparse_spic0_get_output(handle, spic0_descr, rocsparse_spic0_output_singularity_position, singularity_position, sizeof(int64_t), nullptr)); HIP_CHECK(hipStreamSynchronize(stream)); for(int64_t batch_index = 0; batch_index < batch_count; ++batch_index) { switch(post_analysis_singularity[batch_index]) { case rocsparse_singularity_none: { break; } case rocsparse_singularity_symbolic: { std::cout << "symbolic singularity detected at batch_index: " << batch_index << ", at position: " << singularity_position[batch_index] << std::endl; ROCSPARSE_CHECK(rocsparse_status_zero_pivot); break; } case rocsparse_singularity_numeric_exact: case rocsparse_singularity_numeric_near: { // // Not from analysis. // ROCSPARSE_CHECK(rocsparse_status_internal_error); break; } } } // // Compute phase. // ROCSPARSE_CHECK(rocsparse_spic0_buffer_size(handle, spic0_descr, matA, matA, rocsparse_spic0_stage_compute, &non_persistent_buffer_size_in_bytes, nullptr)); HIP_CHECK(hipFree(non_persistent_buffer)); non_persistent_buffer = nullptr; HIP_CHECK(hipMalloc(&non_persistent_buffer, non_persistent_buffer_size_in_bytes)); ROCSPARSE_CHECK(rocsparse_spic0(handle, spic0_descr, matA, matA, rocsparse_spic0_stage_compute, non_persistent_buffer_size_in_bytes, non_persistent_buffer, nullptr)); // // Check for any singularities after compute. 
// ROCSPARSE_CHECK(rocsparse_set_pointer_mode(handle, rocsparse_pointer_mode_host)); ROCSPARSE_CHECK(rocsparse_spic0_get_output(handle, spic0_descr, rocsparse_spic0_output_singularity, post_compute_singularity, sizeof(rocsparse_singularity), nullptr)); ROCSPARSE_CHECK(rocsparse_spic0_get_output(handle, spic0_descr, rocsparse_spic0_output_singularity_position, singularity_position, sizeof(int64_t), nullptr)); HIP_CHECK(hipStreamSynchronize(stream)); for(int64_t batch_index = 0; batch_index < batch_count; ++batch_index) { switch(post_compute_singularity[batch_index]) { case rocsparse_singularity_none: { break; } case rocsparse_singularity_symbolic: { std::cout << "numeric symbolic singularity detected at batch_index: " << batch_index << ", at position: " << singularity_position[batch_index] << std::endl; ROCSPARSE_CHECK(rocsparse_status_internal_error); break; } case rocsparse_singularity_numeric_exact: { std::cout << "numeric exact singularity detected at batch_index: " << batch_index << ", at position: " << singularity_position[batch_index] << std::endl; break; } case rocsparse_singularity_numeric_near: { std::cout << "numeric near singularity detected at batch_index: " << batch_index << ", at position: " << singularity_position[batch_index] << std::endl; break; } } } HIP_CHECK(hipFree(non_persistent_buffer)); ROCSPARSE_CHECK(rocsparse_spic0_descr_destroy(handle, spic0_descr, nullptr)); ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA)); ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); HIP_CHECK(hipFree(dcsr_row_ptr)); HIP_CHECK(hipFree(dcsr_col_ind)); HIP_CHECK(hipFree(dcsr_val)); return 0; }
Note
The descriptor spic0_descr needs to be configured with rocsparse_spic0_set_input.
Note
The sparse matrix formats currently supported are rocsparse_format_csr and rocsparse_format_bsr.
Note
The rocsparse_spic0_stage_compute stage is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished. The rocsparse_spic0_stage_analysis stage is blocking with respect to the host.
Note
Only the rocsparse_spic0_stage_compute stage supports execution in a hipGraph context. The rocsparse_spic0_stage_analysis stage does not support hipGraph.
Note
This routine only supports uniform strided batched computation, that is, the same sparsity pattern but strided batched values of the matrices.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
spic0_descr – [in] Spic0 descriptor.
A – [in] descriptor of the matrix to factorize.
P – [out] descriptor of the factorization. In-place P = A is allowed.
spic0_stage – [in] stage for the Spic0 computation.
buffer_size_in_bytes – [in] number of bytes of the buffer.
buffer – [in] buffer allocated by the user.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_not_implemented – the sparse format is invalid or the preconditioner P is not identical to the matrix to factorize A.
rocsparse_status_invalid_value – the spic0_stage value is invalid.
rocsparse_status_invalid_pointer – spic0_descr, A, P, or buffer_size_in_bytes pointer is invalid.
rocsparse_spsm()#
-
rocsparse_status rocsparse_spsm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr matA, rocsparse_const_dnmat_descr matB, const rocsparse_dnmat_descr matC, rocsparse_datatype compute_type, rocsparse_spsm_alg alg, rocsparse_spsm_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse triangular system solve with multiple right-hand sides.
rocsparse_spsmsolves a triangular linear system of equations defined by a sparse \(m \times m\) square matrix \(op(A)\), given in CSR or COO storage format, such that\[ op(A) \cdot C = \alpha \cdot op(B), \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \end{array} \right. \end{split}\]and\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \\ B^T, & \text{if trans_B == rocsparse_operation_transpose} \end{array} \right. \end{split}\]and where \(C\) is the dense solution matrix and \(B\) is the dense right-hand side matrix. Both \(B\) and \(C\) can be in row or column order.Performing the above operation requires three stages. First,
rocsparse_spsmmust be called with the stage rocsparse_spsm_stage_buffer_size, which will determine the size of the required temporary storage buffer. Then allocate this buffer and callrocsparse_spsmwith the stage rocsparse_spsm_stage_preprocess, which will perform analysis on the sparse matrix \(op(A)\). Finally, complete the computation by callingrocsparse_spsmwith the stage rocsparse_spsm_stage_compute. The buffer size, buffer allocation, and preprocess stages only need to be called once for a given sparse triangular matrix \(op(A)\), while the computation stage can be repeatedly used with different \(B\) and \(C\) matrices.As noted above, both \(B\) and \(C\) can be in row or column order (this includes mixing the order so that \(B\) is in row order and \(C\) in column order and vice versa). Internally, however, rocSPARSE kernels solve the system assuming the matrices \(B\) and \(C\) are in row order, as this provides the best memory access. This means that if the matrix \(C\) is not in row order and/or the matrix \(B\) is not row order (or \(B^{T}\) is not column order as this is equivalent to being in row order), then internally, memory copies and/or transposing of data might be performed to get them into the correct order (possibly using extra buffer size). After the computation is completed, additional memory copies and/or transposing of data might be performed to get them back into the user arrays. For the best performance and smallest required temporary storage buffers, use row order for the matrix \(C\) and row order for the matrix \(B\) (or column order if \(B\) is being transposed).
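Mathematically, the compute stage solves one triangular system per column of the right-hand side. The dense sketch below illustrates this for trans_A == rocsparse_operation_none with a lower triangular matrix; the helper name `lower_solve_multi` is hypothetical, and the ordering/transposition handling discussed above is omitted:

```python
def lower_solve_multi(A, B, alpha=1.0):
    """Solve A * C = alpha * B column by column, where A is a dense lower
    triangular matrix and B holds multiple right-hand sides as columns."""
    m, n = len(A), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for col in range(n):
        for i in range(m):
            # forward substitution for this right-hand-side column
            s = alpha * B[i][col] - sum(A[i][k] * C[k][col] for k in range(i))
            C[i][col] = s / A[i][i]
    return C
```

Applied to the 4x4 lower triangular matrix of the example in this section, each column of \(C\) is the forward-substitution solution for the corresponding column of \(B\).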
rocsparse_spsm supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.
rocsparse_spsm supports the following data types for \(op(A)\), \(op(B)\), \(C\), and compute types for \(\alpha\):
- Uniform Precisions:
A / B / C / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
int main() { // 1 0 0 0 // A = 4 2 0 0 // 0 3 7 0 // 0 0 0 1 int m = 4; int n = 2; std::vector<int> hcsr_row_ptr = {0, 1, 3, 5, 6}; std::vector<int> hcsr_col_ind = {0, 0, 1, 1, 2, 3}; std::vector<float> hcsr_val = {1, 4, 2, 3, 7, 1}; std::vector<float> hB(m * n); std::vector<float> hC(m * n); for(int i = 0; i < n; i++) { for(int j = 0; j < m; j++) { hB[m * i + j] = static_cast<float>(i + 1); } } // Scalar alpha float alpha = 1.0f; int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0]; // Offload data to device int* dcsr_row_ptr; int* dcsr_col_ind; float* dcsr_val; float* dB; float* dC; HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1))); HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz)); HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz)); HIP_CHECK(hipMalloc(&dB, sizeof(float) * m * n)); HIP_CHECK(hipMalloc(&dC, sizeof(float) * m * n)); HIP_CHECK( hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice)); HIP_CHECK( hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dB, hB.data(), sizeof(float) * m * n, hipMemcpyHostToDevice)); rocsparse_handle handle; rocsparse_spmat_descr matA; rocsparse_dnmat_descr matB; rocsparse_dnmat_descr matC; rocsparse_indextype row_idx_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_datatype compute_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_operation trans_A = rocsparse_operation_none; rocsparse_operation trans_B = rocsparse_operation_none; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); // Create sparse matrix A ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_idx_type, col_idx_type, idx_base, data_type)); // Create dense matrix 
B ROCSPARSE_CHECK( rocsparse_create_dnmat_descr(&matB, m, n, m, dB, data_type, rocsparse_order_column)); // Create dense matrix C ROCSPARSE_CHECK( rocsparse_create_dnmat_descr(&matC, m, n, m, dC, data_type, rocsparse_order_column)); // Call spsm to get buffer size size_t buffer_size; ROCSPARSE_CHECK(rocsparse_spsm(handle, trans_A, trans_B, &alpha, matA, matB, matC, compute_type, rocsparse_spsm_alg_default, rocsparse_spsm_stage_buffer_size, &buffer_size, nullptr)); void* temp_buffer; HIP_CHECK(hipMalloc(&temp_buffer, buffer_size)); // Call spsm to perform analysis ROCSPARSE_CHECK(rocsparse_spsm(handle, trans_A, trans_B, &alpha, matA, matB, matC, compute_type, rocsparse_spsm_alg_default, rocsparse_spsm_stage_preprocess, &buffer_size, temp_buffer)); // Call spsm to perform computation ROCSPARSE_CHECK(rocsparse_spsm(handle, trans_A, trans_B, &alpha, matA, matB, matC, compute_type, rocsparse_spsm_alg_default, rocsparse_spsm_stage_compute, &buffer_size, temp_buffer)); // Copy result back to host HIP_CHECK(hipMemcpy(hC.data(), dC, sizeof(float) * m * n, hipMemcpyDeviceToHost)); std::cout << "hC" << std::endl; for(size_t i = 0; i < hC.size(); ++i) { std::cout << hC[i] << " "; } std::cout << std::endl; // Clear rocSPARSE ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA)); ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matB)); ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matC)); ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); // Clear device memory HIP_CHECK(hipFree(dcsr_row_ptr)); HIP_CHECK(hipFree(dcsr_col_ind)); HIP_CHECK(hipFree(dcsr_val)); HIP_CHECK(hipFree(dB)); HIP_CHECK(hipFree(dC)); HIP_CHECK(hipFree(temp_buffer)); return 0; }
Note
The sparse matrix formats currently supported are: rocsparse_format_coo and rocsparse_format_csr.
Note
Only the rocsparse_spsm_stage_buffer_size stage and the rocsparse_spsm_stage_compute stage are non-blocking and executed asynchronously with respect to the host. They can return before the actual computation has finished. The rocsparse_spsm_stage_preprocess stage is blocking with respect to the host.
Note
Currently, only trans_A == rocsparse_operation_none and trans_A == rocsparse_operation_transpose are supported, and only trans_B == rocsparse_operation_none and trans_B == rocsparse_operation_transpose are supported.
Note
Only the rocsparse_spsm_stage_buffer_size stage and the rocsparse_spsm_stage_compute stage support execution in a hipGraph context. The rocsparse_spsm_stage_preprocess stage does not support hipGraph.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
trans_A – [in] matrix operation type for the sparse matrix \(op(A)\).
trans_B – [in] matrix operation type for the dense matrix \(op(B)\).
alpha – [in] scalar \(\alpha\).
matA – [in] sparse matrix descriptor.
matB – [in] dense matrix descriptor.
matC – [inout] dense matrix descriptor.
compute_type – [in] floating point precision for the SpSM computation.
alg – [in] SpSM algorithm for the SpSM computation.
stage – [in] SpSM stage for the SpSM computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When the rocsparse_spsm_stage_buffer_size stage is passed in, the required allocation size (in bytes) is written to
buffer_size, and the function returns without performing the SpSM operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, matA, matB, matC, descr, or buffer_size pointer is invalid.
rocsparse_status_not_implemented – trans_A, trans_B, compute_type, stage, or alg is currently not supported.
rocsparse_sptrsm_buffer_size()#
-
rocsparse_status rocsparse_sptrsm_buffer_size(rocsparse_handle handle, rocsparse_sptrsm_descr sptrsm_descr, rocsparse_const_spmat_descr A, rocsparse_const_dnmat_descr X, rocsparse_const_dnmat_descr Y, rocsparse_sptrsm_stage sptrsm_stage, size_t *buffer_size_in_bytes, rocsparse_error *p_error)#
rocsparse_sptrsm_buffer_size returns the size of the required buffer to execute the given stage of the SpTrSM operation. This routine is used in conjunction with rocsparse_sptrsm(). See rocsparse_sptrsm for a full description and example.
Note
This routine does not support execution in a hipGraph context.
Note
This routine does not support batched execution.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
sptrsm_descr – [in] SpTrSM descriptor.
A – [in] sparse matrix descriptor.
X – [in] dense matrix descriptor.
Y – [in] dense matrix descriptor.
sptrsm_stage – [in] stage for the SpTrSM computation.
buffer_size_in_bytes – [out] number of bytes of the buffer.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – the sptrsm_stage value is invalid.
rocsparse_status_invalid_pointer – A, X, Y, sptrsm_descr, or buffer_size_in_bytes pointer is invalid.
rocsparse_sptrsm()#
-
rocsparse_status rocsparse_sptrsm(rocsparse_handle handle, rocsparse_sptrsm_descr sptrsm_descr, rocsparse_const_spmat_descr A, rocsparse_const_dnmat_descr X, rocsparse_dnmat_descr Y, rocsparse_sptrsm_stage sptrsm_stage, size_t buffer_size_in_bytes, void *buffer, rocsparse_error *p_error)#
Sparse triangular system solve with multiple right-hand sides.
rocsparse_sptrsm solves a triangular linear system of equations defined by a sparse \(m \times m\) square matrix \(op(A)\), given in CSR or COO storage format, such that\[ op(A) \cdot Y = \alpha \cdot op(X), \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \\ A^T, & \text{if trans_A == rocsparse_operation_transpose} \end{array} \right. \end{split}\]and\[\begin{split} op(X) = \left\{ \begin{array}{ll} X, & \text{if trans_X == rocsparse_operation_none} \\ X^T, & \text{if trans_X == rocsparse_operation_transpose} \end{array} \right. \end{split}\]where \(Y\) is the dense solution matrix and \(X\) is the dense right-hand side matrix. Both \(X\) and \(Y\) can be in row or column order.
Performing the above operation requires two stages: rocsparse_sptrsm_stage_analysis and rocsparse_sptrsm_stage_compute. The analysis stage must be completed before the compute stage can run, and it only needs to be performed once for a given sparse matrix \(op(A)\), while the compute stage can be used repeatedly with different \(X\) and \(Y\) matrices.
As noted above, both \(X\) and \(Y\) can be in row or column order (this includes mixing the orders, so that \(X\) is in row order and \(Y\) in column order and vice versa). Internally, however, rocSPARSE kernels solve the system assuming both \(X\) and \(Y\) are in row order, as this provides the best memory access pattern. If \(Y\) is not in row order, or \(X\) is not in row order (nor \(X^{T}\) in column order, which is equivalent), the library may copy and/or transpose the data into the required order internally, possibly requiring extra buffer space, and then copy and/or transpose it back into the user arrays once the computation completes. For the best performance and the smallest temporary storage buffers, use row order for \(Y\) and row order for \(X\) (or column order if \(X\) is being transposed).
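To make the semantics of the solve concrete, it can be written as a short host-side reference. This is only a sketch, not part of the rocSPARSE API: the function name is made up, and it assumes the special case of a lower-triangular CSR matrix with a present, nonzero diagonal, no transposition, and a single right-hand-side column.

```cpp
#include <cassert>
#include <vector>

// Host sketch (illustrative, not a rocSPARSE routine): solve the
// lower-triangular system A * y = alpha * x by forward substitution
// on a CSR matrix. Entries within a row may appear in any order.
std::vector<float> lower_csr_solve(int                       m,
                                   const std::vector<int>&   row_ptr,
                                   const std::vector<int>&   col_ind,
                                   const std::vector<float>& val,
                                   float                     alpha,
                                   const std::vector<float>& x)
{
    std::vector<float> y(m);
    for(int i = 0; i < m; ++i)
    {
        float sum  = alpha * x[i];
        float diag = 1.0f; // assumes a diagonal entry is actually stored
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            if(col_ind[k] < i)
                sum -= val[k] * y[col_ind[k]]; // subtract already-solved unknowns
            else if(col_ind[k] == i)
                diag = val[k]; // remember the pivot
        }
        y[i] = sum / diag;
    }
    return y;
}
```

Running this on the \(4 \times 4\) matrix from the example below with \(x = (1,1,1,1)\) and \(\alpha = 1\) reproduces the forward-substitution order the analysis stage must discover.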
rocsparse_sptrsm supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.
rocsparse_sptrsm supports the following data types for \(op(A)\), \(op(X)\), and \(Y\), and compute types for \(\alpha\):
- Uniform Precisions:
A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
int main()
{
    //     1 0 0 0
    // A = 4 2 0 0
    //     0 3 7 0
    //     0 0 0 1
    int m = 4;
    int n = 2;

    std::vector<int>   hcsr_row_ptr = {0, 1, 3, 5, 6};
    std::vector<int>   hcsr_col_ind = {0, 0, 1, 1, 2, 3};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 7, 1};
    std::vector<float> hB(m * n);
    std::vector<float> hC(m * n);

    for(int i = 0; i < n; i++)
    {
        for(int j = 0; j < m; j++)
        {
            hB[m * i + j] = static_cast<float>(i + 1);
        }
    }

    // Scalar alpha
    float alpha = 1.0f;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dB;
    float* dC;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dB, sizeof(float) * m * n));
    HIP_CHECK(hipMalloc(&dC, sizeof(float) * m * n));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dB, hB.data(), sizeof(float) * m * n, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnmat_descr matB;
    rocsparse_dnmat_descr matC;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;
    rocsparse_operation  trans_A      = rocsparse_operation_none;
    rocsparse_operation  trans_X      = rocsparse_operation_none;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val,
                                               row_idx_type, col_idx_type, idx_base, data_type));

    // Create dense matrix B
    ROCSPARSE_CHECK(rocsparse_create_dnmat_descr(&matB, m, n, m, dB, data_type, rocsparse_order_column));

    // Create dense matrix C
    ROCSPARSE_CHECK(rocsparse_create_dnmat_descr(&matC, m, n, m, dC, data_type, rocsparse_order_column));

    rocsparse_sptrsm_descr sptrsm_descr;
    ROCSPARSE_CHECK(rocsparse_create_sptrsm_descr(&sptrsm_descr));

    const rocsparse_sptrsm_alg sptrsm_alg = rocsparse_sptrsm_alg_default;
    ROCSPARSE_CHECK(rocsparse_sptrsm_set_input(handle, sptrsm_descr, rocsparse_sptrsm_input_alg,
                                               &sptrsm_alg, sizeof(sptrsm_alg), nullptr));
    ROCSPARSE_CHECK(rocsparse_sptrsm_set_input(handle, sptrsm_descr, rocsparse_sptrsm_input_operation_A,
                                               &trans_A, sizeof(trans_A), nullptr));
    ROCSPARSE_CHECK(rocsparse_sptrsm_set_input(handle, sptrsm_descr, rocsparse_sptrsm_input_operation_X,
                                               &trans_X, sizeof(trans_X), nullptr));

    const rocsparse_datatype sptrsm_scalar_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_sptrsm_set_input(handle, sptrsm_descr,
                                               rocsparse_sptrsm_input_scalar_datatype,
                                               &sptrsm_scalar_datatype, sizeof(sptrsm_scalar_datatype),
                                               nullptr));

    const rocsparse_datatype sptrsm_compute_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_sptrsm_set_input(handle, sptrsm_descr,
                                               rocsparse_sptrsm_input_compute_datatype,
                                               &sptrsm_compute_datatype, sizeof(sptrsm_compute_datatype),
                                               nullptr));

    const rocsparse_analysis_policy sptrsm_analysis_policy = rocsparse_analysis_policy_reuse;
    ROCSPARSE_CHECK(rocsparse_sptrsm_set_input(handle, sptrsm_descr,
                                               rocsparse_sptrsm_input_analysis_policy,
                                               &sptrsm_analysis_policy, sizeof(sptrsm_analysis_policy),
                                               nullptr));

    size_t buffer_size;
    void*  temp_buffer;

    // Analysis phase
    ROCSPARSE_CHECK(rocsparse_sptrsm_buffer_size(handle, sptrsm_descr, matA, matB, matC,
                                                 rocsparse_sptrsm_stage_analysis, &buffer_size, nullptr));
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));
    ROCSPARSE_CHECK(rocsparse_sptrsm(handle, sptrsm_descr, matA, matB, matC,
                                     rocsparse_sptrsm_stage_analysis, buffer_size, temp_buffer, nullptr));
    HIP_CHECK(hipFree(temp_buffer));
    temp_buffer = nullptr;

    int64_t zero_pivot;
    ROCSPARSE_CHECK(rocsparse_sptrsm_get_output(handle, sptrsm_descr,
                                                rocsparse_sptrsm_output_zero_pivot_position,
                                                &zero_pivot, sizeof(zero_pivot), nullptr));
    if(zero_pivot != -1)
    {
        std::cout << "zero pivot detected during analysis at position " << zero_pivot << std::endl;
    }

    // Compute phase. Note: sizeof(alpha), the size of the scalar itself,
    // not sizeof(&alpha), which would be the size of a pointer.
    ROCSPARSE_CHECK(rocsparse_sptrsm_set_input(handle, sptrsm_descr,
                                               rocsparse_sptrsm_input_scalar_alpha,
                                               &alpha, sizeof(alpha), nullptr));
    ROCSPARSE_CHECK(rocsparse_sptrsm_buffer_size(handle, sptrsm_descr, matA, matB, matC,
                                                 rocsparse_sptrsm_stage_compute, &buffer_size, nullptr));
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));
    ROCSPARSE_CHECK(rocsparse_sptrsm(handle, sptrsm_descr, matA, matB, matC,
                                     rocsparse_sptrsm_stage_compute, buffer_size, temp_buffer, nullptr));

    // Device synchronization
    hipStream_t stream;
    ROCSPARSE_CHECK(rocsparse_get_stream(handle, &stream));
    HIP_CHECK(hipStreamSynchronize(stream));

    ROCSPARSE_CHECK(rocsparse_sptrsm_get_output(handle, sptrsm_descr,
                                                rocsparse_sptrsm_output_zero_pivot_position,
                                                &zero_pivot, sizeof(zero_pivot), nullptr));
    if(zero_pivot != -1)
    {
        std::cout << "zero pivot detected during compute phase at position " << zero_pivot << std::endl;
    }

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));
    ROCSPARSE_CHECK(rocsparse_destroy_sptrsm_descr(sptrsm_descr));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hC.data(), dC, sizeof(float) * m * n, hipMemcpyDeviceToHost));

    std::cout << "hC" << std::endl;
    for(size_t i = 0; i < hC.size(); ++i)
    {
        std::cout << hC[i] << " ";
    }
    std::cout << std::endl;

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dB));
    HIP_CHECK(hipFree(dC));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
Note
The sparse matrix formats currently supported are: rocsparse_format_coo and rocsparse_format_csr.
Note
Only the rocsparse_sptrsm_stage_compute stage is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished. The rocsparse_sptrsm_stage_analysis stage is blocking with respect to the host.
Note
Currently, only trans_A == rocsparse_operation_none and trans_A == rocsparse_operation_transpose are supported, and only trans_X == rocsparse_operation_none and trans_X == rocsparse_operation_transpose are supported.
Note
Only the stage rocsparse_sptrsm_stage_compute supports execution in a hipGraph context. The rocsparse_sptrsm_stage_analysis stage does not support hipGraph.
Note
This routine does not support batched execution.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
sptrsm_descr – [in] SpTrSM routine descriptor.
A – [in] sparse matrix descriptor.
X – [in] dense matrix descriptor.
Y – [inout] dense matrix descriptor.
sptrsm_stage – [in] SpTrSM stage for the SpTrSM computation.
buffer_size_in_bytes – [out] number of bytes of the temporary storage buffer.
buffer – [in] temporary storage buffer allocated by the user.
p_error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – A, X, Y, sptrsm_descr, or buffer pointer is invalid.
rocsparse_status_not_implemented – the configuration of the descriptor sptrsm_descr is currently not supported.
rocsparse_spmm()#
-
rocsparse_status rocsparse_spmm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, const rocsparse_dnmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_spmm_alg alg, rocsparse_spmm_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse matrix dense matrix multiplication.
rocsparse_spmm multiplies the scalar \(\alpha\) with a sparse \(m \times k\) matrix \(op(A)\), defined in CSR, CSC, COO, BSR, or Blocked ELL storage format, and the dense \(k \times n\) matrix \(op(B)\), and adds the result to the dense \(m \times n\) matrix \(C\) that is multiplied by the scalar \(\beta\), such that\[ C := \alpha \cdot op(A) \cdot op(B) + \beta \cdot C, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \\ A^T, & \text{if trans_A == rocsparse_operation_transpose} \\ A^H, & \text{if trans_A == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]and\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \\ B^T, & \text{if trans_B == rocsparse_operation_transpose} \\ B^H, & \text{if trans_B == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]Both \(B\) and \(C\) can be in row or column order.
rocsparse_spmm requires three stages to complete. First, pass the rocsparse_spmm_stage_buffer_size stage to determine the size of the required temporary storage buffer. Next, allocate this buffer and call rocsparse_spmm again with the rocsparse_spmm_stage_preprocess stage, which will perform analysis on the sparse matrix \(op(A)\). Finally, call rocsparse_spmm with the rocsparse_spmm_stage_compute stage to perform the actual computation. The buffer size, buffer allocation, and preprocess stages only need to be called once for a given sparse matrix \(op(A)\), while the computation stage can be repeatedly used with different \(B\) and \(C\) matrices. After all calls to rocsparse_spmm are complete, the temporary buffer can be deallocated.
As noted above, both \(B\) and \(C\) can be in row or column order (this includes mixing the orders, so that \(B\) is in row order and \(C\) in column order and vice versa). For best performance, use row order for both \(B\) and \(C\), as this provides the best memory access.
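The operation the compute stage performs can be written as a plain host-side reference. This is a sketch for clarity, not part of the rocSPARSE API; the function name is made up, and it assumes the non-transposed case with column-order \(B\) (leading dimension \(k\)) and \(C\) (leading dimension \(m\)).

```cpp
#include <cassert>
#include <vector>

// Host sketch (illustrative, not a rocSPARSE routine) of
// C := alpha * A * B + beta * C for a CSR matrix A and
// column-ordered dense matrices B (ld = k) and C (ld = m).
void csr_spmm_ref(int m, int k, int n,
                  const std::vector<int>&   row_ptr,
                  const std::vector<int>&   col_ind,
                  const std::vector<float>& val,
                  float alpha, const std::vector<float>& B,
                  float beta, std::vector<float>& C)
{
    for(int j = 0; j < n; ++j) // columns of B and C
    {
        for(int i = 0; i < m; ++i) // rows of A and C
        {
            float sum = 0.0f;
            for(int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
            {
                // A(i, col_ind[p]) * B(col_ind[p], j)
                sum += val[p] * B[k * j + col_ind[p]];
            }
            C[m * j + i] = alpha * sum + beta * C[m * j + i];
        }
    }
}
```

With the matrix data from the first example below and \(B\) filled with ones, each entry of a column of \(C\) is simply the corresponding row sum of \(A\).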
rocsparse_spmm supports multiple different algorithms. These algorithms have different trade-offs depending on the sparsity pattern of the matrix, whether or not the results need to be deterministic, and how many times the sparse-matrix product will be performed.
CSR Algorithms:
- rocsparse_spmm_alg_csr (deterministic: yes, preprocessing: no) — default algorithm.
- rocsparse_spmm_alg_csr_row_split (deterministic: yes, preprocessing: no) — assigns a fixed number of threads per row, regardless of the number of non-zeros in each row. This can perform well when each row in the matrix has roughly the same number of non-zeros.
- rocsparse_spmm_alg_csr_nnz_split (deterministic: no, preprocessing: yes) — distributes work by having each thread block work on a fixed number of non-zeros, regardless of the number of rows that might be involved. This can perform well when the matrix has some rows with few non-zeros and some rows with many non-zeros.
- rocsparse_spmm_alg_csr_merge_path (deterministic: no, preprocessing: yes) — attempts to combine the approaches of row-split and non-zero split by having each block work on a fixed amount of work, which can be either non-zeros or rows.
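The non-zero-split idea can be illustrated with a small host sketch: every work group is assigned the same number of non-zeros, and the row that owns a given non-zero is recovered by a binary search over the CSR row pointer. The helper name is illustrative, not a rocSPARSE internal.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host sketch of non-zero-split partitioning: given a CSR row pointer,
// find the row that owns non-zero index p, i.e. the last row i with
// row_ptr[i] <= p. A GPU kernel would do this once per work group to
// locate its starting row.
int row_of_nnz(const std::vector<int>& row_ptr, int p)
{
    return static_cast<int>(std::upper_bound(row_ptr.begin(), row_ptr.end(), p)
                            - row_ptr.begin()) - 1;
}
```

Because each chunk of non-zeros maps to a contiguous row range this way, load balance no longer depends on how skewed the row lengths are.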
COO Algorithms:
- rocsparse_spmm_alg_coo_segmented (deterministic: yes, preprocessing: no) — generally not as fast as the atomic algorithm, but is deterministic.
- rocsparse_spmm_alg_coo_atomic (deterministic: no, preprocessing: no) — generally the fastest COO algorithm. This is the default algorithm.
- rocsparse_spmm_alg_coo_segmented_atomic (deterministic: no, preprocessing: no)
Blocked ELL Algorithms:
- rocsparse_spmm_alg_bell (deterministic: yes, preprocessing: no)
BSR Algorithms:
- rocsparse_spmm_alg_bsr (deterministic: yes, preprocessing: no)
It is also possible to pass rocsparse_spmm_alg_default, which will automatically select from the algorithms listed above based on the sparse matrix format. In the case of CSR or CSC matrices, this will set the algorithm to be rocsparse_spmm_alg_csr. In the case of blocked ELL matrices, this will set the algorithm to be rocsparse_spmm_alg_bell. In the case of BSR matrices, this will set the algorithm to be rocsparse_spmm_alg_bsr, and for COO matrices, it will set the algorithm to be rocsparse_spmm_alg_coo_atomic.
When A is transposed, rocsparse_spmm will revert to using rocsparse_spmm_alg_csr for CSR and CSC formats and rocsparse_spmm_alg_coo_atomic for COO format, regardless of the algorithm selected.
rocsparse_spmm supports multiple combinations of data types and compute types. The tables below indicate the currently supported data types that can be used for the sparse matrix \(op(A)\) and the dense matrices \(op(B)\) and \(C\), and the compute type for \(\alpha\) and \(\beta\). The advantage of using different data types is to save on memory bandwidth and storage when a user application allows, while performing the actual computation in a higher precision.
rocsparse_spmm supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.
- Uniform Precisions:
A / B / C / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed precisions:
A / B: rocsparse_datatype_i8_r, C: rocsparse_datatype_i32_r, compute_type: rocsparse_datatype_i32_r
A / B: rocsparse_datatype_i8_r, C: rocsparse_datatype_f32_r, compute_type: rocsparse_datatype_f32_r
A / B: rocsparse_datatype_f16_r, C: rocsparse_datatype_f32_r, compute_type: rocsparse_datatype_f32_r
A / B: rocsparse_datatype_f16_r, C: rocsparse_datatype_f16_r, compute_type: rocsparse_datatype_f32_r
A / B: rocsparse_datatype_bf16_r, C: rocsparse_datatype_f32_r, compute_type: rocsparse_datatype_f32_r
A / B: rocsparse_datatype_bf16_r, C: rocsparse_datatype_bf16_r, compute_type: rocsparse_datatype_f32_r
rocsparse_spmm also supports batched computation for CSR and COO matrices. There are three supported batch modes:\[\begin{split} C_i = A \times B_i \\ C_i = A_i \times B \\ C_i = A_i \times B_i \end{split}\]The batch mode is determined by the batch count and stride passed for each matrix. For example, to use the first batch mode ( \(C_i = A \times B_i\)) with 100 batches for non-transposed \(A\), \(B\), and \(C\), pass:
\[\begin{split} batch\_count\_A=1 \\ batch\_count\_B=100 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=0 \\ columns\_values\_batch\_stride\_A=0 \\ batch\_stride\_B=k*n \\ batch\_stride\_C=m*n \end{split}\]To use the second batch mode ( \(C_i = A_i \times B\)), pass:\[\begin{split} batch\_count\_A=100 \\ batch\_count\_B=1 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=m+1 \\ columns\_values\_batch\_stride\_A=nnz \\ batch\_stride\_B=0 \\ batch\_stride\_C=m*n \end{split}\]And to use the third batch mode ( \(C_i = A_i \times B_i\)), pass:\[\begin{split} batch\_count\_A=100 \\ batch\_count\_B=100 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=m+1 \\ columns\_values\_batch\_stride\_A=nnz \\ batch\_stride\_B=k*n \\ batch\_stride\_C=m*n \end{split}\]See the examples below.
- Example
This example performs sparse matrix-dense matrix multiplication, \(C := \alpha \cdot A \cdot B + \beta \cdot C\)
int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    //
    //     1 4 2
    //     1 2 3
    // B = 5 4 0
    //     3 1 9
    //     1 2 2
    //     0 3 0
    //
    //     1 1 5
    // C = 1 2 1
    //     1 3 1
    //     6 2 4
    int m = 4;
    int k = 6;
    int n = 3;

    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>   hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};
    std::vector<float> hB(k * n, 1.0f);
    std::vector<float> hC(m * n, 1.0f);

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    float alpha = 1.0f;
    float beta  = 0.0f;

    // Create CSR arrays on device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dB;
    float* dC;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dB, sizeof(float) * k * n));
    HIP_CHECK(hipMalloc(&dC, sizeof(float) * m * n));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dB, hB.data(), sizeof(float) * k * n, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dC, hC.data(), sizeof(float) * m * n, hipMemcpyHostToDevice));

    // Create rocsparse handle
    rocsparse_handle handle;
    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Types
    rocsparse_indextype itype = rocsparse_indextype_i32;
    rocsparse_indextype jtype = rocsparse_indextype_i32;
    rocsparse_datatype  ttype = rocsparse_datatype_f32_r;

    // Create descriptors
    rocsparse_spmat_descr mat_A;
    rocsparse_dnmat_descr mat_B;
    rocsparse_dnmat_descr mat_C;
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&mat_A, m, k, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val,
                                               itype, jtype, rocsparse_index_base_zero, ttype));
    ROCSPARSE_CHECK(rocsparse_create_dnmat_descr(&mat_B, k, n, k, dB, ttype, rocsparse_order_column));
    ROCSPARSE_CHECK(rocsparse_create_dnmat_descr(&mat_C, m, n, m, dC, ttype, rocsparse_order_column));

    // Query SpMM buffer
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none,
                                   &alpha, mat_A, mat_B, &beta, mat_C, ttype,
                                   rocsparse_spmm_alg_default, rocsparse_spmm_stage_buffer_size,
                                   &buffer_size, nullptr));

    // Allocate buffer
    void* buffer;
    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none,
                                   &alpha, mat_A, mat_B, &beta, mat_C, ttype,
                                   rocsparse_spmm_alg_default, rocsparse_spmm_stage_preprocess,
                                   &buffer_size, buffer));

    // Pointer mode host
    ROCSPARSE_CHECK(rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none,
                                   &alpha, mat_A, mat_B, &beta, mat_C, ttype,
                                   rocsparse_spmm_alg_default, rocsparse_spmm_stage_compute,
                                   &buffer_size, buffer));

    // Clear up on device
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dB));
    HIP_CHECK(hipFree(dC));
    HIP_CHECK(hipFree(buffer));

    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(mat_A));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(mat_B));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(mat_C));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    return 0;
}
- Example
An example of the first batch mode ( \(C_i = A \times B_i\)) is provided below.
int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    int m = 4;
    int k = 6;
    int n = 3;

    int batch_count_A = 1;
    int batch_count_B = 100;
    int batch_count_C = 100;

    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>   hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};
    std::vector<float> hB(batch_count_B * k * n, 1.0f);
    std::vector<float> hC(batch_count_C * m * n, 1.0f);

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    int offsets_batch_stride_A        = 0;
    int columns_values_batch_stride_A = 0;
    int batch_stride_B                = k * n;
    int batch_stride_C                = m * n;

    float alpha = 1.0f;
    float beta  = 0.0f;

    // Create CSR arrays on device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dB;
    float* dC;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dB, sizeof(float) * batch_count_B * k * n));
    HIP_CHECK(hipMalloc(&dC, sizeof(float) * batch_count_C * m * n));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dB, hB.data(), sizeof(float) * batch_count_B * k * n, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dC, hC.data(), sizeof(float) * batch_count_C * m * n, hipMemcpyHostToDevice));

    // Create rocsparse handle
    rocsparse_handle handle;
    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Types
    rocsparse_indextype itype = rocsparse_indextype_i32;
    rocsparse_indextype jtype = rocsparse_indextype_i32;
    rocsparse_datatype  ttype = rocsparse_datatype_f32_r;

    // Create descriptors
    rocsparse_spmat_descr mat_A;
    rocsparse_dnmat_descr mat_B;
    rocsparse_dnmat_descr mat_C;
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&mat_A, m, k, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val,
                                               itype, jtype, rocsparse_index_base_zero, ttype));
    ROCSPARSE_CHECK(rocsparse_create_dnmat_descr(&mat_B, k, n, k, dB, ttype, rocsparse_order_column));
    ROCSPARSE_CHECK(rocsparse_create_dnmat_descr(&mat_C, m, n, m, dC, ttype, rocsparse_order_column));

    ROCSPARSE_CHECK(rocsparse_csr_set_strided_batch(mat_A, batch_count_A, offsets_batch_stride_A,
                                                    columns_values_batch_stride_A));
    ROCSPARSE_CHECK(rocsparse_dnmat_set_strided_batch(mat_B, batch_count_B, batch_stride_B));
    ROCSPARSE_CHECK(rocsparse_dnmat_set_strided_batch(mat_C, batch_count_C, batch_stride_C));

    // Query SpMM buffer
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none,
                                   &alpha, mat_A, mat_B, &beta, mat_C, ttype,
                                   rocsparse_spmm_alg_default, rocsparse_spmm_stage_buffer_size,
                                   &buffer_size, nullptr));

    // Allocate buffer
    void* buffer;
    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none,
                                   &alpha, mat_A, mat_B, &beta, mat_C, ttype,
                                   rocsparse_spmm_alg_default, rocsparse_spmm_stage_preprocess,
                                   &buffer_size, buffer));

    // Pointer mode host
    ROCSPARSE_CHECK(rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none,
                                   &alpha, mat_A, mat_B, &beta, mat_C, ttype,
                                   rocsparse_spmm_alg_default, rocsparse_spmm_stage_compute,
                                   &buffer_size, buffer));

    // Clear up on device
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dB));
    HIP_CHECK(hipFree(dC));
    HIP_CHECK(hipFree(buffer));

    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(mat_A));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(mat_B));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(mat_C));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    return 0;
}
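The stride bookkeeping in the batched example above is plain pointer arithmetic, which can be checked on the host. The helper below is illustrative only (not a rocSPARSE function): it computes the flat index of element (row, col) of batch \(i\) in a column-order dense matrix.

```cpp
#include <cassert>
#include <cstddef>

// Host sketch of strided-batch addressing for a column-order dense matrix:
// element (row, col) of batch i lives at offset
//     i * batch_stride + ld * col + row
// from the base pointer. A batch_stride of 0 makes every batch alias the
// same data, which is how the C_i = A x B_i mode shares a single matrix.
size_t batched_index(size_t batch_stride, size_t ld, int i, int row, int col)
{
    return static_cast<size_t>(i) * batch_stride
           + ld * static_cast<size_t>(col)
           + static_cast<size_t>(row);
}
```

For the example's \(C\) (m = 4, n = 3, batch_stride_C = m * n = 12, ld = m = 4), batch 2, row 1, column 2 lands at flat index 33 of the dC allocation.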
Note
None of the algorithms above are deterministic when \(A\) is transposed or conjugate transposed.
Note
All algorithms perform best when using row ordering for the dense \(B\) and \(C\) matrices.
Note
The sparse matrix formats currently supported are: rocsparse_format_coo, rocsparse_format_csr, rocsparse_format_csc, rocsparse_format_bsr, and rocsparse_format_bell.
Note
Mixed precisions are only supported for BSR, CSR, CSC, and COO matrix formats.
Note
Only the rocsparse_spmm_stage_buffer_size stage and the rocsparse_spmm_stage_compute stage are non-blocking and executed asynchronously with respect to the host. They can return before the actual computation has finished. The rocsparse_spmm_stage_preprocess stage is blocking with respect to the host.
Note
Currently, only trans_A == rocsparse_operation_none is supported for the COO and blocked ELL formats.
Note
Only the rocsparse_spmm_stage_buffer_size stage and the rocsparse_spmm_stage_compute stage support execution in a hipGraph context. The rocsparse_spmm_stage_preprocess stage does not support hipGraph.
Note
Currently, only CSR, CSC, COO, BSR, and blocked ELL sparse formats are supported.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
trans_A – [in] matrix operation type.
trans_B – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] matrix descriptor.
mat_B – [in] matrix descriptor.
beta – [in] scalar \(\beta\).
mat_C – [inout] matrix descriptor.
compute_type – [in] floating point precision for the SpMM computation.
alg – [in] SpMM algorithm for the SpMM computation.
stage – [in] SpMM stage for the SpMM computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When the rocsparse_spmm_stage_buffer_size stage is passed in, the required allocation size (in bytes) is written to
buffer_size, and the function returns without performing the SpMM operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, mat_A, mat_B, mat_C, beta, or buffer_size pointer is invalid.
rocsparse_status_not_implemented – trans_A, trans_B, compute_type, or alg is currently not supported.
rocsparse_spgemm()#
-
rocsparse_status rocsparse_spgemm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr A, rocsparse_const_spmat_descr B, const void *beta, rocsparse_const_spmat_descr D, rocsparse_spmat_descr C, rocsparse_datatype compute_type, rocsparse_spgemm_alg alg, rocsparse_spgemm_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse matrix sparse matrix multiplication.
rocsparse_spgemm multiplies the scalar \(\alpha\) with the sparse \(m \times k\) matrix \(op(A)\) and the sparse \(k \times n\) matrix \(op(B)\) and adds the result to the sparse \(m \times n\) matrix \(D\) that is multiplied by \(\beta\). The final result is stored in the sparse \(m \times n\) matrix \(C\), such that\[ C := \alpha \cdot op(A) \cdot op(B) + \beta \cdot D, \]with\[ op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \end{array} \right. \]and\[ op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \end{array} \right. \]
rocsparse_spgemm requires three stages to complete. First, pass the rocsparse_spgemm_stage_buffer_size stage to determine the size of the required temporary storage buffer. Next, allocate this buffer and call rocsparse_spgemm again with the rocsparse_spgemm_stage_nnz stage, which will determine the number of non-zeros in \(C\) and fill in its row pointer array. Now that the number of non-zeros in \(C\) is known, allocate space for the column indices and values arrays of \(C\). Finally, call rocsparse_spgemm with the rocsparse_spgemm_stage_compute stage to perform the actual computation, which fills in the column indices and values arrays of \(C\). After all calls to rocsparse_spgemm are complete, the temporary buffer can be deallocated.
Alternatively, it is possible to perform the sparse matrix product multiple times with matrices that share the same sparsity pattern but have different values. In this scenario, the process begins as before: call rocsparse_spgemm with the rocsparse_spgemm_stage_buffer_size stage to determine the required buffer size, allocate the buffer, call rocsparse_spgemm with the rocsparse_spgemm_stage_nnz stage to determine the number of non-zeros in \(C\), and allocate the \(C\) column indices and values arrays. Now, however, call rocsparse_spgemm with the rocsparse_spgemm_stage_symbolic stage, which fills in the column indices array of \(C\) but not the values array. It is then possible to repeatedly change the values of \(A\), \(B\), and \(D\) and call rocsparse_spgemm with the rocsparse_spgemm_stage_numeric stage, which fills the values array of \(C\). The extra rocsparse_spgemm_stage_symbolic and rocsparse_spgemm_stage_numeric stages allow users to compute the sparsity pattern of \(C\) once but compute the values multiple times.
rocsparse_spgemm supports multiple combinations of data types and compute types. The table below indicates the currently supported data types that can be used for the sparse matrices \(op(A)\), \(op(B)\), \(C\), and \(D\), and the compute type for \(\alpha\) and \(\beta\). The advantage of using different data types is to save on memory bandwidth and storage when a user application allows, while performing the actual computation in a higher precision.
rocsparse_spgemm supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.
- Uniform Precisions:
A / B / C / D / compute_type
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
In general, when multiplying two sparse matrices together, it is possible that the resulting matrix will require a larger index representation to store correctly. For example, when multiplying \(A \times B\) using rocsparse_indextype_i32 index types for the row pointer and column indices arrays, it might be the case that the row pointer of the resulting \(C\) matrix would require index precision rocsparse_indextype_i64. This is currently not supported. In this scenario, the \(A\) and \(B\) matrices need to be stored using the higher index precision.
- Example
int main()
{
    // A - m x k
    // B - k x n
    // C - m x n
    int m = 4;
    int n = 4;
    int k = 3;

    // A
    // 1 2 3
    // 0 1 0
    // 2 0 0
    // 0 0 3

    // B
    // 0 1 2 0
    // 0 0 0 1
    // 1 2 3 4
    std::vector<int>   hcsr_row_ptr_A = {0, 3, 4, 5, 6};
    std::vector<int>   hcsr_col_ind_A = {0, 1, 2, 1, 0, 2};
    std::vector<float> hcsr_val_A     = {1.0f, 2.0f, 3.0f, 1.0f, 2.0f, 3.0f};

    std::vector<int>   hcsr_row_ptr_B = {0, 2, 3, 7};
    std::vector<int>   hcsr_col_ind_B = {1, 2, 3, 0, 1, 2, 3};
    std::vector<float> hcsr_val_B     = {1.0f, 2.0f, 1.0f, 1.0f, 2.0f, 3.0f, 4.0f};

    int nnz_A = hcsr_val_A.size();
    int nnz_B = hcsr_val_B.size();

    float alpha = 1.0f;
    float beta  = 0.0f;

    int*   dcsr_row_ptr_A;
    int*   dcsr_col_ind_A;
    float* dcsr_val_A;
    int*   dcsr_row_ptr_B;
    int*   dcsr_col_ind_B;
    float* dcsr_val_B;
    int*   dcsr_row_ptr_C;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_A, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_A, nnz_A * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_A, nnz_A * sizeof(float)));
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_B, (k + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_B, nnz_B * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_B, nnz_B * sizeof(float)));
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_C, (m + 1) * sizeof(int)));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr_A, hcsr_row_ptr_A.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind_A, hcsr_col_ind_A.data(), nnz_A * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val_A, hcsr_val_A.data(), nnz_A * sizeof(float), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_row_ptr_B, hcsr_row_ptr_B.data(), (k + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind_B, hcsr_col_ind_B.data(), nnz_B * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA, matB, matC, matD;

    void*  temp_buffer;
    size_t buffer_size = 0;

    rocsparse_operation  trans_A    = rocsparse_operation_none;
    rocsparse_operation  trans_B    = rocsparse_operation_none;
    rocsparse_index_base index_base = rocsparse_index_base_zero;
    rocsparse_indextype  itype      = rocsparse_indextype_i32;
    rocsparse_indextype  jtype      = rocsparse_indextype_i32;
    rocsparse_datatype   ttype      = rocsparse_datatype_f32_r;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, k, nnz_A,
                                               dcsr_row_ptr_A, dcsr_col_ind_A, dcsr_val_A,
                                               itype, jtype, index_base, ttype));

    // Create sparse matrix B in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matB, k, n, nnz_B,
                                               dcsr_row_ptr_B, dcsr_col_ind_B, dcsr_val_B,
                                               itype, jtype, index_base, ttype));

    // Create sparse matrix C in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matC, m, n, 0,
                                               dcsr_row_ptr_C, nullptr, nullptr,
                                               itype, jtype, index_base, ttype));

    // Create sparse matrix D in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matD, 0, 0, 0,
                                               nullptr, nullptr, nullptr,
                                               itype, jtype, index_base, ttype));

    // Determine buffer size
    ROCSPARSE_CHECK(rocsparse_spgemm(handle, trans_A, trans_B, &alpha, matA, matB, &beta, matD, matC,
                                     ttype, rocsparse_spgemm_alg_default,
                                     rocsparse_spgemm_stage_buffer_size, &buffer_size, nullptr));

    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // Determine number of non-zeros in C matrix
    ROCSPARSE_CHECK(rocsparse_spgemm(handle, trans_A, trans_B, &alpha, matA, matB, &beta, matD, matC,
                                     ttype, rocsparse_spgemm_alg_default,
                                     rocsparse_spgemm_stage_nnz, &buffer_size, temp_buffer));

    int64_t rows_C;
    int64_t cols_C;
    int64_t nnz_C;

    // Extract number of non-zeros in C matrix so we can allocate the column indices and values arrays
    ROCSPARSE_CHECK(rocsparse_spmat_get_size(matC, &rows_C, &cols_C, &nnz_C));

    std::cout << "rows_C: " << rows_C << " cols_C: " << cols_C << " nnz_C: " << nnz_C << std::endl;

    int*   dcsr_col_ind_C;
    float* dcsr_val_C;
    HIP_CHECK(hipMalloc(&dcsr_col_ind_C, sizeof(int) * nnz_C));
    HIP_CHECK(hipMalloc(&dcsr_val_C, sizeof(float) * nnz_C));

    // Set C matrix pointers
    ROCSPARSE_CHECK(rocsparse_csr_set_pointers(matC, dcsr_row_ptr_C, dcsr_col_ind_C, dcsr_val_C));

    // SpGEMM computation
    ROCSPARSE_CHECK(rocsparse_spgemm(handle, trans_A, trans_B, &alpha, matA, matB, &beta, matD, matC,
                                     ttype, rocsparse_spgemm_alg_default,
                                     rocsparse_spgemm_stage_compute, &buffer_size, temp_buffer));

    // Copy C matrix result back to host
    std::vector<int>   hcsr_row_ptr_C(m + 1);
    std::vector<int>   hcsr_col_ind_C(nnz_C);
    std::vector<float> hcsr_val_C(nnz_C);

    HIP_CHECK(hipMemcpy(hcsr_row_ptr_C.data(), dcsr_row_ptr_C, sizeof(int) * (m + 1), hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hcsr_col_ind_C.data(), dcsr_col_ind_C, sizeof(int) * nnz_C, hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hcsr_val_C.data(), dcsr_val_C, sizeof(float) * nnz_C, hipMemcpyDeviceToHost));

    // Destroy matrix descriptors
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matD));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Free device arrays
    HIP_CHECK(hipFree(temp_buffer));
    HIP_CHECK(hipFree(dcsr_row_ptr_A));
    HIP_CHECK(hipFree(dcsr_col_ind_A));
    HIP_CHECK(hipFree(dcsr_val_A));
    HIP_CHECK(hipFree(dcsr_row_ptr_B));
    HIP_CHECK(hipFree(dcsr_col_ind_B));
    HIP_CHECK(hipFree(dcsr_val_B));
    HIP_CHECK(hipFree(dcsr_row_ptr_C));
    HIP_CHECK(hipFree(dcsr_col_ind_C));
    HIP_CHECK(hipFree(dcsr_val_C));

    return 0;
}
Note
This function does not produce deterministic results.
Note
SpGEMM requires three stages to complete. The first stage, rocsparse_spgemm_stage_buffer_size, will return the size of the temporary storage buffer that is required for subsequent calls to rocsparse_spgemm. The second stage, rocsparse_spgemm_stage_nnz, will determine the number of non-zero elements of the resulting \(C\) matrix. If the sparsity pattern of \(C\) is already known, this stage can be skipped. In the final stage, rocsparse_spgemm_stage_compute, the actual computation is performed.
Note
If \(\alpha == 0\), then \(C = \beta \cdot D\) will be computed.
Note
If \(\beta == 0\), then \(C = \alpha \cdot op(A) \cdot op(B)\) will be computed.
Note
Currently only CSR and BSR formats are supported.
Note
If rocsparse_spgemm_stage_symbolic is selected, then only the symbolic computation is performed.
Note
If rocsparse_spgemm_stage_numeric is selected, then only the numeric computation is performed.
Note
For the rocsparse_spgemm_stage_symbolic and rocsparse_spgemm_stage_numeric stages, only the CSR matrix format is currently supported.
Note
\(\alpha == \beta == 0\) is invalid.
Note
It is permissible to pass the same sparse matrix for \(C\) and \(D\) if both matrices have the same sparsity pattern.
Note
Currently, only trans_A == rocsparse_operation_none is supported.

Note

Currently, only trans_B == rocsparse_operation_none is supported.

Note
This function is non-blocking and executed asynchronously with respect to the host. It can return before the actual computation has finished.
Note
For the rare matrix products with more than 4096 non-zero entries per row, the algorithm allocates an additional temporary storage buffer.
Note
This routine does not support execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
trans_A – [in] sparse matrix \(A\) operation type.
trans_B – [in] sparse matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
A – [in] sparse matrix \(A\) descriptor.
B – [in] sparse matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
D – [in] sparse matrix \(D\) descriptor.
C – [out] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SpGEMM computation.
alg – [in] SpGEMM algorithm for the SpGEMM computation.
stage – [in] SpGEMM stage for the SpGEMM computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.

temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpGEMM operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha and beta are invalid, or the A, B, D, C, or buffer_size pointer is invalid.

rocsparse_status_memory_error – additional buffer for long rows could not be allocated.

rocsparse_status_not_implemented – trans_A != rocsparse_operation_none or trans_B != rocsparse_operation_none.
rocsparse_spgeam_buffer_size()#
-
rocsparse_status rocsparse_spgeam_buffer_size(rocsparse_handle handle, rocsparse_spgeam_descr descr, rocsparse_const_spmat_descr mat_A, rocsparse_const_spmat_descr mat_B, rocsparse_const_spmat_descr mat_C, rocsparse_spgeam_stage stage, size_t *buffer_size, rocsparse_error *error)#
Return the size of the temporary storage buffer required for a given stage of the SpGEAM operation.

rocsparse_spgeam_buffer_size returns the size of the buffer required to execute the given stage of the SpGEAM operation. This routine is used in conjunction with rocsparse_spgeam(). See rocsparse_spgeam for a full description and example.

Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] SpGEAM descriptor.
mat_A – [in] sparse matrix \(A\) descriptor.
mat_B – [in] sparse matrix \(B\) descriptor.
mat_C – [in] sparse matrix \(C\) descriptor.
stage – [in] SpGEAM stage for the SpGEAM computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – mat_A, mat_B, descr, or buffer_size pointer is invalid.
rocsparse_spgeam()#
-
rocsparse_status rocsparse_spgeam(rocsparse_handle handle, rocsparse_spgeam_descr descr, rocsparse_const_spmat_descr mat_A, rocsparse_const_spmat_descr mat_B, rocsparse_spmat_descr mat_C, rocsparse_spgeam_stage stage, size_t buffer_size, void *temp_buffer, rocsparse_error *error)#
Sparse matrix-sparse matrix addition.
rocsparse_spgeam multiplies the scalar \(\alpha\) with the sparse \(m \times n\) CSR matrix \(op(A)\) and adds it to \(\beta\) multiplied by the sparse \(m \times n\) matrix \(op(B)\). The final result is stored in the sparse \(m \times n\) matrix \(C\), such that

\[ C := \alpha op(A) + \beta op(B), \]

with

\[ op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \end{array} \right. \]

and

\[ op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \end{array} \right. \]

rocsparse_spgeam requires multiple steps to complete. First, create a rocsparse_spgeam_descr by calling rocsparse_create_spgeam_descr. Set the SpGEAM algorithm (currently only rocsparse_spgeam_alg_default is supported) as well as the compute type and the transpose operation type for the sparse matrices \(op(A)\) and \(op(B)\) using rocsparse_spgeam_set_input. Next, calculate the total number of non-zeros that will exist in the sparse matrix \(C\). To do so, call rocsparse_spgeam_buffer_size with the stage set to rocsparse_spgeam_stage_analysis. This fills the buffer_size parameter, allowing allocation of this buffer. After the buffer has been allocated, call rocsparse_spgeam with the same stage rocsparse_spgeam_stage_analysis. The total number of non-zeros and the row offset array for \(C\) have now been calculated and are stored internally in the rocsparse_spgeam_descr. Now, retrieve the non-zero count using rocsparse_spgeam_get_output and then allocate the \(C\) matrix arrays. To complete the computation, repeat the process (this time passing the stage rocsparse_spgeam_stage_compute): call rocsparse_spgeam_buffer_size to determine the required buffer size, allocate the buffer, and finally call rocsparse_spgeam. The user-allocated buffers can be freed after each call to rocsparse_spgeam. After the computation is complete and the SpGEAM descriptor is no longer needed, call rocsparse_destroy_spgeam_descr. See the full code example below.

The stage rocsparse_spgeam_stage_compute computes both the symbolic and numeric parts of the resulting matrix \(C\). To perform multiple operations involving matrices with the same sparsity pattern but different numerical values, the symbolic stages (rocsparse_spgeam_stage_symbolic_analysis and rocsparse_spgeam_stage_symbolic_compute) and the numeric stages (rocsparse_spgeam_stage_numeric_analysis and rocsparse_spgeam_stage_numeric_compute) can be used to separate the symbolic calculation from the numeric calculation.

rocsparse_spgeam supports multiple combinations of index types, data types, and compute types. The tables below indicate the currently supported index and data types that can be used for the sparse matrices \(op(A)\), \(op(B)\), and \(C\), and the compute type for \(\alpha\) and \(\beta\). The advantage of using different index and data types is to save on memory bandwidth and storage when a user application allows, while performing the actual computation in a higher precision.

In general, when adding two sparse matrices together, it is possible that the resulting matrix will require a larger index representation to store correctly. For example, when adding \(A + B\) using rocsparse_indextype_i32 index types for the row pointer and column indices arrays, it might be the case that the row pointer of the resulting \(C\) matrix would require the index type rocsparse_indextype_i64. This is currently not supported. In this scenario, store the \(A\), \(B\), and \(C\) matrices using the higher index precision.
- Uniform Precisions:
A / B / C / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Uniform Index Types:
CSR Row offset
CSR Column indices
rocsparse_indextype_i32
rocsparse_indextype_i32
rocsparse_indextype_i64
rocsparse_indextype_i64
- Mixed Index Types:
CSR Row offset
CSR Column indices
rocsparse_indextype_i64
rocsparse_indextype_i32
Additionally, all three matrices \(A\), \(B\), and \(C\) must use the same index types. For example, if \(A\) uses the index type rocsparse_indextype_i32 for the row offset array and the index type rocsparse_indextype_i32 for the column indices array, then both \(B\) and \(C\) must also use these same index types for their respective row offset and column index arrays. In the scenario where \(C\) requires a larger index type for the row offset array, store all three matrices using the larger index type rocsparse_indextype_i64 for the row offsets array.
- First Example
int main()
{
    // A - m x n
    // B - m x n
    // C - m x n
    int m = 4;
    int n = 6;

    // A
    // 1 2 0 0 3 7
    // 0 0 1 4 6 8
    // 0 2 0 4 0 0
    // 9 8 0 0 2 0
    std::vector<int>   hcsr_row_ptr_A = {0, 4, 8, 10, 13};
    std::vector<int>   hcsr_col_ind_A = {0, 1, 4, 5, 2, 3, 4, 5, 1, 3, 0, 1, 4};
    std::vector<float> hcsr_val_A     = {1, 2, 3, 7, 1, 4, 6, 8, 2, 4, 9, 8, 2};

    // B
    // 0 2 1 0 0 5
    // 0 1 1 3 0 2
    // 0 0 0 0 0 0
    // 1 2 3 4 5 6
    std::vector<int>   hcsr_row_ptr_B = {0, 3, 7, 7, 13};
    std::vector<int>   hcsr_col_ind_B = {1, 2, 5, 1, 2, 3, 5, 0, 1, 2, 3, 4, 5};
    std::vector<float> hcsr_val_B     = {2, 1, 5, 1, 1, 3, 2, 1, 2, 3, 4, 5, 6};

    int nnz_A = hcsr_val_A.size();
    int nnz_B = hcsr_val_B.size();

    float alpha = 1.0f;
    float beta  = 1.0f;

    int*   dcsr_row_ptr_A;
    int*   dcsr_col_ind_A;
    float* dcsr_val_A;
    int*   dcsr_row_ptr_B;
    int*   dcsr_col_ind_B;
    float* dcsr_val_B;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_A, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_A, nnz_A * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_A, nnz_A * sizeof(float)));
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_B, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_B, nnz_B * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_B, nnz_B * sizeof(float)));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr_A, hcsr_row_ptr_A.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind_A, hcsr_col_ind_A.data(), nnz_A * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val_A, hcsr_val_A.data(), nnz_A * sizeof(float), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_row_ptr_B, hcsr_row_ptr_B.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind_B, hcsr_col_ind_B.data(), nnz_B * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice));

    rocsparse_handle handle;
    rocsparse_error  p_error[1] = {};

    rocsparse_spmat_descr matA, matB, matC;

    rocsparse_index_base index_base = rocsparse_index_base_zero;
    rocsparse_indextype  itype      = rocsparse_indextype_i32;
    rocsparse_indextype  jtype      = rocsparse_indextype_i32;
    rocsparse_datatype   ttype      = rocsparse_datatype_f32_r;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    hipStream_t stream;
    ROCSPARSE_CHECK(rocsparse_get_stream(handle, &stream));

    // Create sparse matrix A in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, n, nnz_A,
                                               dcsr_row_ptr_A, dcsr_col_ind_A, dcsr_val_A,
                                               itype, jtype, index_base, ttype));

    // Create sparse matrix B in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matB, m, n, nnz_B,
                                               dcsr_row_ptr_B, dcsr_col_ind_B, dcsr_val_B,
                                               itype, jtype, index_base, ttype));

    // Create SpGEAM descriptor
    rocsparse_spgeam_descr descr;
    ROCSPARSE_CHECK(rocsparse_create_spgeam_descr(&descr));

    // Set the algorithm on the descriptor
    const rocsparse_spgeam_alg alg = rocsparse_spgeam_alg_default;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_alg, &alg, sizeof(alg), p_error));

    // Set the transpose operation for sparse matrices A and B on the descriptor
    const rocsparse_operation trans_A = rocsparse_operation_none;
    const rocsparse_operation trans_B = rocsparse_operation_none;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_operation_A, &trans_A, sizeof(trans_A), p_error));
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_operation_B, &trans_B, sizeof(trans_B), p_error));

    // Set the scalar type on the descriptor
    const rocsparse_datatype scalar_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(handle, descr,
                                               rocsparse_spgeam_input_scalar_datatype,
                                               &scalar_datatype, sizeof(scalar_datatype), p_error));

    // Set the compute type on the descriptor
    const rocsparse_datatype compute_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(handle, descr,
                                               rocsparse_spgeam_input_compute_datatype,
                                               &compute_datatype, sizeof(compute_datatype), p_error));

    // Calculate NNZ phase
    size_t buffer_size_in_bytes;
    void*  buffer;
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle, descr, matA, matB, nullptr,
                                                 rocsparse_spgeam_stage_analysis,
                                                 &buffer_size_in_bytes, p_error));
    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle, descr, matA, matB, nullptr,
                                     rocsparse_spgeam_stage_analysis,
                                     buffer_size_in_bytes, buffer, p_error));
    HIP_CHECK(hipFree(buffer));

    // Ensure analysis stage is complete before grabbing C non-zero count
    HIP_CHECK(hipStreamSynchronize(stream));

    int64_t nnz_C;
    ROCSPARSE_CHECK(rocsparse_spgeam_get_output(
        handle, descr, rocsparse_spgeam_output_nnz, &nnz_C, sizeof(int64_t), p_error));

    // Allocate row pointer, column indices, and values of C
    int*   dcsr_row_ptr_C;
    int*   dcsr_col_ind_C;
    float* dcsr_val_C;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_C, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_C, sizeof(int32_t) * nnz_C));
    HIP_CHECK(hipMalloc(&dcsr_val_C, sizeof(float) * nnz_C));

    // Create sparse matrix C in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matC, m, n, nnz_C,
                                               dcsr_row_ptr_C, dcsr_col_ind_C, dcsr_val_C,
                                               itype, jtype, index_base, ttype));

    // Compute phase
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle, descr, matA, matB, matC,
                                                 rocsparse_spgeam_stage_compute,
                                                 &buffer_size_in_bytes, p_error));

    // Set alpha and beta
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_scalar_alpha, &alpha, sizeof(alpha), p_error));
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_scalar_beta, &beta, sizeof(beta), p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle, descr, matA, matB, matC,
                                     rocsparse_spgeam_stage_compute,
                                     buffer_size_in_bytes, buffer, p_error));
    HIP_CHECK(hipFree(buffer));

    // Copy C matrix result back to host
    std::vector<int>   hcsr_row_ptr_C(m + 1);
    std::vector<int>   hcsr_col_ind_C(nnz_C);
    std::vector<float> hcsr_val_C(nnz_C);

    HIP_CHECK(hipMemcpy(hcsr_row_ptr_C.data(), dcsr_row_ptr_C, sizeof(int) * (m + 1), hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hcsr_col_ind_C.data(), dcsr_col_ind_C, sizeof(int) * nnz_C, hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hcsr_val_C.data(), dcsr_val_C, sizeof(float) * nnz_C, hipMemcpyDeviceToHost));

    std::cout << "C" << std::endl;
    for(int i = 0; i < m; i++)
    {
        int start = hcsr_row_ptr_C[i];
        int end   = hcsr_row_ptr_C[i + 1];

        std::vector<float> htemp(n, 0.0f);
        for(int j = start; j < end; j++)
        {
            htemp[hcsr_col_ind_C[j]] = hcsr_val_C[j];
        }

        for(int j = 0; j < n; j++)
        {
            std::cout << htemp[j] << " ";
        }
        std::cout << "" << std::endl;
    }
    std::cout << "" << std::endl;

    // Destroy the SpGEAM descriptor
    ROCSPARSE_CHECK(rocsparse_destroy_spgeam_descr(descr));

    // Destroy matrix descriptors
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));
    ROCSPARSE_CHECK(rocsparse_destroy_error(p_error[0]));

    // Free device arrays
    HIP_CHECK(hipFree(dcsr_row_ptr_A));
    HIP_CHECK(hipFree(dcsr_col_ind_A));
    HIP_CHECK(hipFree(dcsr_val_A));
    HIP_CHECK(hipFree(dcsr_row_ptr_B));
    HIP_CHECK(hipFree(dcsr_col_ind_B));
    HIP_CHECK(hipFree(dcsr_val_B));
    HIP_CHECK(hipFree(dcsr_row_ptr_C));
    HIP_CHECK(hipFree(dcsr_col_ind_C));
    HIP_CHECK(hipFree(dcsr_val_C));

    return 0;
}
- Second Example
int main()
{
    // A - m x n
    // B - m x n
    // C - m x n
    int m = 4;
    int n = 6;

    // A
    // 1 2 0 0 3 7
    // 0 0 1 4 6 8
    // 0 2 0 4 0 0
    // 9 8 0 0 2 0
    std::vector<int>   hcsr_row_ptr_A = {0, 4, 8, 10, 13};
    std::vector<int>   hcsr_col_ind_A = {0, 1, 4, 5, 2, 3, 4, 5, 1, 3, 0, 1, 4};
    std::vector<float> hcsr_val_A     = {1, 2, 3, 7, 1, 4, 6, 8, 2, 4, 9, 8, 2};

    // B
    // 0 2 1 0 0 5
    // 0 1 1 3 0 2
    // 0 0 0 0 0 0
    // 1 2 3 4 5 6
    std::vector<int>   hcsr_row_ptr_B = {0, 3, 7, 7, 13};
    std::vector<int>   hcsr_col_ind_B = {1, 2, 5, 1, 2, 3, 5, 0, 1, 2, 3, 4, 5};
    std::vector<float> hcsr_val_B     = {2, 1, 5, 1, 1, 3, 2, 1, 2, 3, 4, 5, 6};

    int nnz_A = hcsr_val_A.size();
    int nnz_B = hcsr_val_B.size();

    float alpha = 1.0f;
    float beta  = 1.0f;

    int*   dcsr_row_ptr_A;
    int*   dcsr_col_ind_A;
    float* dcsr_val_A;
    int*   dcsr_row_ptr_B;
    int*   dcsr_col_ind_B;
    float* dcsr_val_B;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_A, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_A, nnz_A * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_A, nnz_A * sizeof(float)));
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_B, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_B, nnz_B * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_B, nnz_B * sizeof(float)));

    HIP_CHECK(hipMemcpy(dcsr_row_ptr_A, hcsr_row_ptr_A.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind_A, hcsr_col_ind_A.data(), nnz_A * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val_A, hcsr_val_A.data(), nnz_A * sizeof(float), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_row_ptr_B, hcsr_row_ptr_B.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_col_ind_B, hcsr_col_ind_B.data(), nnz_B * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice));

    rocsparse_handle handle;
    rocsparse_error  p_error[1] = {};

    rocsparse_spmat_descr matA, matB, matC;

    rocsparse_index_base index_base = rocsparse_index_base_zero;
    rocsparse_indextype  itype      = rocsparse_indextype_i32;
    rocsparse_indextype  jtype      = rocsparse_indextype_i32;
    rocsparse_datatype   ttype      = rocsparse_datatype_f32_r;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    hipStream_t stream;
    ROCSPARSE_CHECK(rocsparse_get_stream(handle, &stream));

    // Create sparse matrix A in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, n, nnz_A,
                                               dcsr_row_ptr_A, dcsr_col_ind_A, dcsr_val_A,
                                               itype, jtype, index_base, ttype));

    // Create sparse matrix B in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matB, m, n, nnz_B,
                                               dcsr_row_ptr_B, dcsr_col_ind_B, dcsr_val_B,
                                               itype, jtype, index_base, ttype));

    // Create SpGEAM descriptor
    rocsparse_spgeam_descr descr;
    ROCSPARSE_CHECK(rocsparse_create_spgeam_descr(&descr));

    // Set the algorithm on the descriptor
    const rocsparse_spgeam_alg alg = rocsparse_spgeam_alg_default;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_alg, &alg, sizeof(alg), p_error));

    // Set the transpose operation for sparse matrices A and B on the descriptor
    const rocsparse_operation trans_A = rocsparse_operation_none;
    const rocsparse_operation trans_B = rocsparse_operation_none;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_operation_A, &trans_A, sizeof(trans_A), p_error));
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_operation_B, &trans_B, sizeof(trans_B), p_error));

    // Set the scalar type on the descriptor
    const rocsparse_datatype scalar_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(handle, descr,
                                               rocsparse_spgeam_input_scalar_datatype,
                                               &scalar_datatype, sizeof(scalar_datatype), p_error));

    // Set alpha and beta
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_scalar_alpha, &alpha, sizeof(alpha), p_error));
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_scalar_beta, &beta, sizeof(beta), p_error));

    // Set the compute type on the descriptor
    const rocsparse_datatype compute_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(handle, descr,
                                               rocsparse_spgeam_input_compute_datatype,
                                               &compute_datatype, sizeof(compute_datatype), p_error));

    // Calculate NNZ phase (symbolic analysis)
    size_t buffer_size_in_bytes;
    void*  buffer;
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle, descr, matA, matB, nullptr,
                                                 rocsparse_spgeam_stage_symbolic_analysis,
                                                 &buffer_size_in_bytes, p_error));
    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle, descr, matA, matB, nullptr,
                                     rocsparse_spgeam_stage_symbolic_analysis,
                                     buffer_size_in_bytes, buffer, p_error));
    HIP_CHECK(hipFree(buffer));

    // Ensure analysis stage is complete before grabbing C non-zero count
    HIP_CHECK(hipStreamSynchronize(stream));

    int64_t nnz_C;
    ROCSPARSE_CHECK(rocsparse_spgeam_get_output(
        handle, descr, rocsparse_spgeam_output_nnz, &nnz_C, sizeof(int64_t), p_error));

    // Allocate row pointer, column indices, and values of C
    int*   dcsr_row_ptr_C;
    int*   dcsr_col_ind_C;
    float* dcsr_val_C;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_C, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_C, sizeof(int32_t) * nnz_C));
    HIP_CHECK(hipMalloc(&dcsr_val_C, sizeof(float) * nnz_C));

    // Create sparse matrix C in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matC, m, n, nnz_C,
                                               dcsr_row_ptr_C, dcsr_col_ind_C, dcsr_val_C,
                                               itype, jtype, index_base, ttype));

    // Symbolic compute phase
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle, descr, matA, matB, matC,
                                                 rocsparse_spgeam_stage_symbolic_compute,
                                                 &buffer_size_in_bytes, p_error));
    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle, descr, matA, matB, matC,
                                     rocsparse_spgeam_stage_symbolic_compute,
                                     buffer_size_in_bytes, buffer, p_error));
    HIP_CHECK(hipFree(buffer));

    // Numeric analysis phase
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle, descr, matA, matB, matC,
                                                 rocsparse_spgeam_stage_numeric_analysis,
                                                 &buffer_size_in_bytes, p_error));
    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle, descr, matA, matB, matC,
                                     rocsparse_spgeam_stage_numeric_analysis,
                                     buffer_size_in_bytes, buffer, p_error));
    HIP_CHECK(hipFree(buffer));

    // First numeric compute phase
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle, descr, matA, matB, matC,
                                                 rocsparse_spgeam_stage_numeric_compute,
                                                 &buffer_size_in_bytes, p_error));
    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle, descr, matA, matB, matC,
                                     rocsparse_spgeam_stage_numeric_compute,
                                     buffer_size_in_bytes, buffer, p_error));
    HIP_CHECK(hipFree(buffer));

    // Second numeric compute phase with updated B values
    hcsr_val_B[0] += 0.125;
    hcsr_val_B[1] += 0.5;
    HIP_CHECK(hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice));

    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle, descr, matA, matB, matC,
                                     rocsparse_spgeam_stage_numeric_compute,
                                     buffer_size_in_bytes, buffer, p_error));
    HIP_CHECK(hipFree(buffer));

    // Copy C matrix result back to host
    std::vector<int>   hcsr_row_ptr_C(m + 1);
    std::vector<int>   hcsr_col_ind_C(nnz_C);
    std::vector<float> hcsr_val_C(nnz_C);

    HIP_CHECK(hipMemcpy(hcsr_row_ptr_C.data(), dcsr_row_ptr_C, sizeof(int) * (m + 1), hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hcsr_col_ind_C.data(), dcsr_col_ind_C, sizeof(int) * nnz_C, hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hcsr_val_C.data(), dcsr_val_C, sizeof(float) * nnz_C, hipMemcpyDeviceToHost));

    // Destroy the SpGEAM descriptor
    ROCSPARSE_CHECK(rocsparse_destroy_spgeam_descr(descr));

    // Destroy matrix descriptors
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));
    ROCSPARSE_CHECK(rocsparse_destroy_error(p_error[0]));

    // Free device arrays
    HIP_CHECK(hipFree(dcsr_row_ptr_A));
    HIP_CHECK(hipFree(dcsr_col_ind_A));
    HIP_CHECK(hipFree(dcsr_val_A));
    HIP_CHECK(hipFree(dcsr_row_ptr_B));
    HIP_CHECK(hipFree(dcsr_col_ind_B));
    HIP_CHECK(hipFree(dcsr_val_B));
    HIP_CHECK(hipFree(dcsr_row_ptr_C));
    HIP_CHECK(hipFree(dcsr_col_ind_C));
    HIP_CHECK(hipFree(dcsr_val_C));

    return 0;
}
Note
The stages rocsparse_spgeam_stage_analysis and rocsparse_spgeam_stage_compute cannot be mixed with the stages rocsparse_spgeam_stage_symbolic_analysis, rocsparse_spgeam_stage_symbolic_compute, rocsparse_spgeam_stage_numeric_analysis, and rocsparse_spgeam_stage_numeric_compute.
Note
The stage rocsparse_spgeam_stage_analysis must precede the stage rocsparse_spgeam_stage_compute.
Note
The stage rocsparse_spgeam_stage_symbolic_analysis must precede the stage rocsparse_spgeam_stage_symbolic_compute.
Note
The stage rocsparse_spgeam_stage_numeric_analysis must precede the stage rocsparse_spgeam_stage_numeric_compute.
Note
Performing the numeric stages does not require the symbolic stages.
Note
The stage rocsparse_spgeam_stage_numeric_analysis must be reapplied if the numeric values of the input matrices
mat_A and mat_B have changed between subsequent calls of the stage rocsparse_spgeam_stage_numeric_compute.
Note
This routine does not support batched computation.
Note
Currently only CSR format is supported.
Note
Currently, only
trans_A == rocsparse_operation_none is supported.
Note
Currently, only
trans_B == rocsparse_operation_none is supported.
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] SpGEAM descriptor.
mat_A – [in] sparse matrix \(A\) descriptor.
mat_B – [in] sparse matrix \(B\) descriptor.
mat_C – [out] sparse matrix \(C\) descriptor.
stage – [in] SpGEAM stage for the SpGEAM computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
buffer_size is determined by calling rocsparse_spgeam_buffer_size.
temp_buffer – [in] temporary storage buffer allocated by the user.
error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if an error descriptor is not required.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
mat_A, mat_B, mat_C, descr, or buffer_size pointer is invalid.
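The separation between the symbolic stages (which compute the sparsity pattern of \(C\)) and the numeric stages (which compute its values) can be illustrated with a small host-side reference computation of \(C := \alpha \cdot A + \beta \cdot B\) in CSR format. This is a sketch of the semantics only, not the rocSPARSE implementation, and all names in it are illustrative:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Host-side reference for C = alpha * A + beta * B in CSR format.
// Symbolic work: merge the column patterns of A and B row by row.
// Numeric work: accumulate the scaled values on that pattern.
struct CsrMatrix
{
    int                m, n;
    std::vector<int>   row_ptr, col_ind;
    std::vector<float> val;
};

CsrMatrix spgeam_reference(float alpha, const CsrMatrix& A, float beta, const CsrMatrix& B)
{
    CsrMatrix C;
    C.m = A.m;
    C.n = A.n;
    C.row_ptr.assign(A.m + 1, 0);
    for(int i = 0; i < A.m; ++i)
    {
        // An ordered map keyed by column index yields the union pattern
        // (the symbolic part) and the summed values (the numeric part).
        std::map<int, float> row;
        for(int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            row[A.col_ind[k]] += alpha * A.val[k];
        for(int k = B.row_ptr[i]; k < B.row_ptr[i + 1]; ++k)
            row[B.col_ind[k]] += beta * B.val[k];
        for(const auto& e : row)
        {
            C.col_ind.push_back(e.first);
            C.val.push_back(e.second);
        }
        C.row_ptr[i + 1] = static_cast<int>(C.col_ind.size());
    }
    return C;
}
```

Because the pattern of \(C\) depends only on the patterns of \(A\) and \(B\), changing only the values of the inputs leaves the symbolic result valid; in the device API this corresponds to repeating only the numeric stages.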
rocsparse_sddmm_buffer_size()#
-
rocsparse_status rocsparse_sddmm_buffer_size(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, rocsparse_spmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, size_t *buffer_size)#
rocsparse_sddmm_buffer_size returns the size of the buffer required to execute the SDDMM operation for a given configuration. This routine is used in conjunction with rocsparse_sddmm_preprocess() and rocsparse_sddmm().
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] dense matrix \(A\) descriptor.
mat_B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
mat_C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
buffer_size – [out] number of bytes of the temporary storage buffer.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of
opA or opB is incorrect.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha and beta are invalid, or the mat_A, mat_B, mat_C, or buffer_size pointer is invalid.
rocsparse_status_not_implemented –
opA == rocsparse_operation_conjugate_transpose or opB == rocsparse_operation_conjugate_transpose.
rocsparse_sddmm_preprocess()#
-
rocsparse_status rocsparse_sddmm_preprocess(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, rocsparse_spmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, void *temp_buffer)#
rocsparse_sddmm_preprocess executes the part of the algorithm that can be calculated once and reused across multiple calls to rocsparse_sddmm with the same sparsity pattern.
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] dense matrix \(A\) descriptor.
mat_B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
mat_C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
temp_buffer – [in] temporary storage buffer allocated by the user. The size must be greater or equal to the size obtained with rocsparse_sddmm_buffer_size.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of
opA or opB is incorrect.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha and beta are invalid, or the mat_A, mat_B, mat_C, or temp_buffer pointer is invalid.
rocsparse_status_not_implemented –
opA == rocsparse_operation_conjugate_transpose or opB == rocsparse_operation_conjugate_transpose.
rocsparse_sddmm()#
-
rocsparse_status rocsparse_sddmm(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, rocsparse_spmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, void *temp_buffer)#
Sampled Dense-Dense Matrix Multiplication.
rocsparse_sddmm multiplies the scalar \(\alpha\) with the product of the dense \(m \times k\) matrix \(op(A)\) and the dense \(k \times n\) matrix \(op(B)\), filters the result by the sparsity pattern of the \(m \times n\) sparse matrix \(C\), and adds \(C\) scaled by \(\beta\). The final result is stored in the sparse \(m \times n\) matrix \(C\), such that\[ C := \alpha ( op(A) \cdot op(B) ) \circ spy(C) + \beta C, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if op(A) == rocsparse_operation_none} \\ A^T, & \text{if op(A) == rocsparse_operation_transpose} \\ \end{array} \right. \end{split}\],\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if op(B) == rocsparse_operation_none} \\ B^T, & \text{if op(B) == rocsparse_operation_transpose} \\ \end{array} \right. \end{split}\]and\[\begin{split} spy(C)_{ij} = \left\{ \begin{array}{ll} 1, & \text{if } C_{ij} \neq 0 \\ 0, & \text{otherwise} \\ \end{array} \right. \end{split}\]Computing the above sampled dense-dense multiplication requires three steps to complete. First, call rocsparse_sddmm_buffer_size to determine the size of the required temporary storage buffer. Next, allocate this buffer and call rocsparse_sddmm_preprocess, which performs any analysis of the input matrices that might be required. Finally, call
rocsparse_sddmm to complete the computation. After all calls to rocsparse_sddmm are complete, the temporary buffer can be deallocated. rocsparse_sddmm supports different algorithms, which can provide better performance for different matrices.
Algorithm | Deterministic | Preprocessing | Notes
rocsparse_sddmm_alg_default | Yes | No | Uses the sparsity pattern of matrix C to perform a limited set of dot products.
rocsparse_sddmm_alg_dense | Yes | No | Explicitly converts the matrix C into a dense matrix to perform a dense matrix multiply and add.
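The effect of the \(spy(C)\) mask can be seen in a small host-side reference computation. This is an illustrative sketch, not the library implementation: only entries already present in the sparsity pattern of \(C\) are computed and updated.

```cpp
#include <cassert>
#include <vector>

// Host-side reference of C := alpha * (A * B) o spy(C) + beta * C, where C is
// stored in CSR format and A (m x k) and B (k x n) are dense, row-major.
// Only positions present in C's pattern are touched.
void sddmm_reference(int m, int n, int k, float alpha,
                     const std::vector<float>& A, const std::vector<float>& B, float beta,
                     const std::vector<int>& csr_row_ptr, const std::vector<int>& csr_col_ind,
                     std::vector<float>& csr_val)
{
    for(int i = 0; i < m; ++i)
    {
        for(int p = csr_row_ptr[i]; p < csr_row_ptr[i + 1]; ++p)
        {
            const int j   = csr_col_ind[p];
            float     dot = 0.0f;
            // Dot product of row i of A with column j of B
            for(int l = 0; l < k; ++l)
            {
                dot += A[i * k + l] * B[l * n + j];
            }
            csr_val[p] = alpha * dot + beta * csr_val[p];
        }
    }
}
```

Note that entries of \(A \cdot B\) whose positions are absent from the pattern of \(C\) are never formed, which is what makes the default algorithm cheaper than a full dense product for sparse \(C\).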
Currently, rocsparse_sddmm supports the uniform and mixed precisions indicated in the tables below. For the sparse matrix \(C\), rocsparse_sddmm supports the index types rocsparse_indextype_i32 and rocsparse_indextype_i64.
- Uniform Precisions:
A / B / C / compute_type
rocsparse_datatype_f16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed Precisions:
A / B | C | compute_type
rocsparse_datatype_f16_r | rocsparse_datatype_f32_r | rocsparse_datatype_f32_r
rocsparse_datatype_f16_r | rocsparse_datatype_f16_r | rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r | rocsparse_datatype_f32_r | rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r | rocsparse_datatype_bf16_r | rocsparse_datatype_f32_r
- Example
This example performs a sampled dense-dense matrix product, \(C := \alpha ( A \cdot B ) \circ spy(C) + \beta C\) where \(\circ\) is the Hadamard product.
int main() { // rocSPARSE handle rocsparse_handle handle; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); float halpha = 1.0f; float hbeta = -1.0f; // A, B, and C are mxk, kxn, and mxn int m = 4; int k = 3; int n = 2; int nnzC = 5; // 2 3 -1 // A = 0 2 1 // 0 0 5 // 0 -2 0.5 // 0 4 // B = 1 0 // -2 0.5 // 1 0 1 0 // C = 2 3 spy(C) = 1 1 // 0 0 0 0 // 4 5 1 1 std::vector<float> hA = {2.0f, 3.0f, -1.0f, 0.0, 2.0f, 1.0f, 0.0f, 0.0f, 5.0f, 0.0f, -2.0f, 0.5f}; std::vector<float> hB = {0.0f, 4.0f, 1.0f, 0.0, -2.0f, 0.5f}; std::vector<int> hcsr_row_ptrC = {0, 1, 3, 3, 5}; std::vector<int> hcsr_col_indC = {0, 0, 1, 0, 1}; std::vector<float> hcsr_valC = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f}; float* dA; float* dB; HIP_CHECK(hipMalloc(&dA, sizeof(float) * m * k)); HIP_CHECK(hipMalloc(&dB, sizeof(float) * k * n)); int* dcsr_row_ptrC; int* dcsr_col_indC; float* dcsr_valC; HIP_CHECK(hipMalloc(&dcsr_row_ptrC, sizeof(int) * (m + 1))); HIP_CHECK(hipMalloc(&dcsr_col_indC, sizeof(int) * nnzC)); HIP_CHECK(hipMalloc(&dcsr_valC, sizeof(float) * nnzC)); HIP_CHECK(hipMemcpy(dA, hA.data(), sizeof(float) * m * k, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dB, hB.data(), sizeof(float) * k * n, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy( dcsr_row_ptrC, hcsr_row_ptrC.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice)); HIP_CHECK( hipMemcpy(dcsr_col_indC, hcsr_col_indC.data(), sizeof(int) * nnzC, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dcsr_valC, hcsr_valC.data(), sizeof(float) * nnzC, hipMemcpyHostToDevice)); rocsparse_dnmat_descr matA; ROCSPARSE_CHECK(rocsparse_create_dnmat_descr( &matA, m, k, k, dA, rocsparse_datatype_f32_r, rocsparse_order_row)); rocsparse_dnmat_descr matB; ROCSPARSE_CHECK(rocsparse_create_dnmat_descr( &matB, k, n, n, dB, rocsparse_datatype_f32_r, rocsparse_order_row)); rocsparse_spmat_descr matC; ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matC, m, n, nnzC, dcsr_row_ptrC, dcsr_col_indC, dcsr_valC, rocsparse_indextype_i32, rocsparse_indextype_i32, 
rocsparse_index_base_zero, rocsparse_datatype_f32_r)); size_t buffer_size = 0; ROCSPARSE_CHECK(rocsparse_sddmm_buffer_size(handle, rocsparse_operation_none, rocsparse_operation_none, &halpha, matA, matB, &hbeta, matC, rocsparse_datatype_f32_r, rocsparse_sddmm_alg_default, &buffer_size)); void* dbuffer; HIP_CHECK(hipMalloc(&dbuffer, buffer_size)); ROCSPARSE_CHECK(rocsparse_sddmm_preprocess(handle, rocsparse_operation_none, rocsparse_operation_none, &halpha, matA, matB, &hbeta, matC, rocsparse_datatype_f32_r, rocsparse_sddmm_alg_default, dbuffer)); ROCSPARSE_CHECK(rocsparse_sddmm(handle, rocsparse_operation_none, rocsparse_operation_none, &halpha, matA, matB, &hbeta, matC, rocsparse_datatype_f32_r, rocsparse_sddmm_alg_default, dbuffer)); HIP_CHECK(hipMemcpy( hcsr_row_ptrC.data(), dcsr_row_ptrC, sizeof(int) * (m + 1), hipMemcpyDeviceToHost)); HIP_CHECK( hipMemcpy(hcsr_col_indC.data(), dcsr_col_indC, sizeof(int) * nnzC, hipMemcpyDeviceToHost)); HIP_CHECK(hipMemcpy(hcsr_valC.data(), dcsr_valC, sizeof(float) * nnzC, hipMemcpyDeviceToHost)); std::cout << "hcsr_row_ptrC" << std::endl; for(size_t i = 0; i < hcsr_row_ptrC.size(); i++) { std::cout << hcsr_row_ptrC[i] << " "; } std::cout << "" << std::endl; std::cout << "hcsr_col_indC" << std::endl; for(size_t i = 0; i < hcsr_col_indC.size(); i++) { std::cout << hcsr_col_indC[i] << " "; } std::cout << "" << std::endl; std::cout << "hcsr_valC" << std::endl; for(size_t i = 0; i < hcsr_valC.size(); i++) { std::cout << hcsr_valC[i] << " "; } std::cout << "" << std::endl; ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matA)); ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matB)); ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matC)); ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); HIP_CHECK(hipFree(dA)); HIP_CHECK(hipFree(dB)); HIP_CHECK(hipFree(dcsr_row_ptrC)); HIP_CHECK(hipFree(dcsr_col_indC)); HIP_CHECK(hipFree(dcsr_valC)); HIP_CHECK(hipFree(dbuffer)); return 0; }
Note
The sparse matrix formats currently supported are: rocsparse_format_csr, rocsparse_format_csc, rocsparse_format_coo, rocsparse_format_coo_aos, and rocsparse_format_ell.
Note
opA == rocsparse_operation_conjugate_transpose is not supported.
Note
opB == rocsparse_operation_conjugate_transpose is not supported.
Note
This routine supports execution in a hipGraph context only when
alg == rocsparse_sddmm_alg_default.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] dense matrix \(A\) descriptor.
mat_B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
mat_C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
temp_buffer – [in] temporary storage buffer allocated by the user. The size must be greater or equal to the size obtained with rocsparse_sddmm_buffer_size.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of
opA, opB, compute_type, or alg is incorrect.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha and beta are invalid, or the mat_A, mat_B, mat_C, or temp_buffer pointer is invalid.
rocsparse_status_not_implemented –
opA == rocsparse_operation_conjugate_transpose or opB == rocsparse_operation_conjugate_transpose.
rocsparse_dense_to_sparse()#
-
rocsparse_status rocsparse_dense_to_sparse(rocsparse_handle handle, rocsparse_const_dnmat_descr mat_A, rocsparse_spmat_descr mat_B, rocsparse_dense_to_sparse_alg alg, size_t *buffer_size, void *temp_buffer)#
Dense matrix to sparse matrix conversion.
rocsparse_dense_to_sparse performs the conversion of a dense matrix to a sparse matrix in CSR, CSC, or COO format. rocsparse_dense_to_sparse requires multiple steps to complete. First, call rocsparse_dense_to_sparse with nullptr passed into temp_buffer:
// Call dense_to_sparse to get required buffer size
size_t buffer_size = 0;
rocsparse_dense_to_sparse(handle, matA, matB, rocsparse_dense_to_sparse_alg_default, &buffer_size, nullptr);
After this is called, buffer_size will be filled with the size of the required buffer that must be allocated. Next, call rocsparse_dense_to_sparse with the newly allocated temp_buffer and nullptr passed into buffer_size:
// Call dense_to_sparse to perform analysis
rocsparse_dense_to_sparse(handle, matA, matB, rocsparse_dense_to_sparse_alg_default, nullptr, temp_buffer);
This will determine the number of non-zeros that will exist in the sparse matrix, which can be queried using the rocsparse_spmat_get_size routine. With this, allocate the sparse matrix device arrays and set them on the sparse matrix descriptor using rocsparse_csr_set_pointers (for CSR format), rocsparse_csc_set_pointers (for CSC format), or rocsparse_coo_set_pointers (for COO format). Finally, the conversion is completed by calling rocsparse_dense_to_sparse with both the buffer_size and temp_buffer:
// Call dense_to_sparse to complete conversion
rocsparse_dense_to_sparse(handle, matA, matB, rocsparse_dense_to_sparse_alg_default, &buffer_size, temp_buffer);
Currently, rocsparse_dense_to_sparse only supports the algorithm rocsparse_dense_to_sparse_alg_default. See the full example below. rocsparse_dense_to_sparse supports rocsparse_datatype_f16_r, rocsparse_datatype_bf16_r, rocsparse_datatype_f32_r, rocsparse_datatype_f64_r, rocsparse_datatype_f32_c, and rocsparse_datatype_f64_c for the values arrays of the sparse matrix (stored in CSR, CSC, or COO format) and the dense matrix. For the row/column offset and row/column index arrays of the sparse matrix, rocsparse_dense_to_sparse supports the index types rocsparse_indextype_i32 and rocsparse_indextype_i64.
- Uniform Precisions:
A / B
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
int main() { // 1 4 0 0 0 0 // A = 0 2 3 0 0 0 // 5 0 0 7 8 0 // 0 0 9 0 6 0 int m = 4; int n = 6; std::vector<float> hdense = {1, 0, 5, 0, 4, 2, 0, 0, 0, 3, 0, 9, 0, 0, 7, 0, 0, 0, 8, 6, 0, 0, 0, 0}; // Offload data to device int* dcsr_row_ptr; float* ddense; HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1))); HIP_CHECK(hipMalloc(&ddense, sizeof(float) * m * n)); HIP_CHECK(hipMemcpy(ddense, hdense.data(), sizeof(float) * m * n, hipMemcpyHostToDevice)); rocsparse_handle handle; rocsparse_dnmat_descr matA; rocsparse_spmat_descr matB; rocsparse_indextype row_idx_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); // Create sparse matrix A ROCSPARSE_CHECK( rocsparse_create_dnmat_descr(&matA, m, n, m, ddense, data_type, rocsparse_order_column)); // Create dense matrix B ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matB, m, n, 0, dcsr_row_ptr, nullptr, nullptr, row_idx_type, col_idx_type, idx_base, data_type)); // Call dense_to_sparse to get required buffer size size_t buffer_size = 0; ROCSPARSE_CHECK(rocsparse_dense_to_sparse( handle, matA, matB, rocsparse_dense_to_sparse_alg_default, &buffer_size, nullptr)); void* temp_buffer; HIP_CHECK(hipMalloc(&temp_buffer, buffer_size)); // Call dense_to_sparse to perform analysis ROCSPARSE_CHECK(rocsparse_dense_to_sparse( handle, matA, matB, rocsparse_dense_to_sparse_alg_default, nullptr, temp_buffer)); int64_t num_rows_tmp, num_cols_tmp, nnz; ROCSPARSE_CHECK(rocsparse_spmat_get_size(matB, &num_rows_tmp, &num_cols_tmp, &nnz)); int* dcsr_col_ind; float* dcsr_val; HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz)); HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz)); ROCSPARSE_CHECK(rocsparse_csr_set_pointers(matB, dcsr_row_ptr, dcsr_col_ind, dcsr_val)); // Call dense_to_sparse to complete conversion 
ROCSPARSE_CHECK(rocsparse_dense_to_sparse( handle, matA, matB, rocsparse_dense_to_sparse_alg_default, &buffer_size, temp_buffer)); std::vector<int> hcsr_row_ptr(m + 1, 0); std::vector<int> hcsr_col_ind(nnz, 0); std::vector<float> hcsr_val(nnz, 0); // Copy result back to host HIP_CHECK( hipMemcpy(hcsr_row_ptr.data(), dcsr_row_ptr, sizeof(int) * (m + 1), hipMemcpyDeviceToHost)); HIP_CHECK( hipMemcpy(hcsr_col_ind.data(), dcsr_col_ind, sizeof(int) * nnz, hipMemcpyDeviceToHost)); HIP_CHECK(hipMemcpy(hcsr_val.data(), dcsr_val, sizeof(float) * nnz, hipMemcpyDeviceToHost)); std::cout << "hcsr_row_ptr" << std::endl; for(size_t i = 0; i < hcsr_row_ptr.size(); i++) { std::cout << hcsr_row_ptr[i] << " "; } std::cout << "" << std::endl; std::cout << "hcsr_col_ind" << std::endl; for(size_t i = 0; i < hcsr_col_ind.size(); i++) { std::cout << hcsr_col_ind[i] << " "; } std::cout << "" << std::endl; std::cout << "hcsr_val" << std::endl; for(size_t i = 0; i < hcsr_val.size(); i++) { std::cout << hcsr_val[i] << " "; } std::cout << "" << std::endl; // Clear rocSPARSE ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matA)); ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matB)); ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); // Clear device memory HIP_CHECK(hipFree(dcsr_row_ptr)); HIP_CHECK(hipFree(dcsr_col_ind)); HIP_CHECK(hipFree(dcsr_val)); HIP_CHECK(hipFree(ddense)); HIP_CHECK(hipFree(temp_buffer)); return 0; }
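For reference, the same conversion can be expressed as a short host-side routine. This is an illustrative sketch, not the library implementation; it mirrors the two device phases: counting the non-zeros per row (the analysis step), then filling the column indices and values.

```cpp
#include <cassert>
#include <vector>

// Convert a dense, row-major m x n matrix to CSR format on the host.
// Entries equal to zero are dropped; everything else is kept.
void dense_to_csr_reference(int m, int n, const std::vector<float>& dense,
                            std::vector<int>& csr_row_ptr, std::vector<int>& csr_col_ind,
                            std::vector<float>& csr_val)
{
    csr_row_ptr.assign(m + 1, 0);
    csr_col_ind.clear();
    csr_val.clear();
    for(int i = 0; i < m; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            const float v = dense[i * n + j];
            if(v != 0.0f)
            {
                csr_col_ind.push_back(j);
                csr_val.push_back(v);
            }
        }
        // Running total of non-zeros seen so far becomes the row offset
        csr_row_ptr[i + 1] = static_cast<int>(csr_col_ind.size());
    }
}
```

The device routine uses the same split: the analysis stage produces the non-zero count (queried via rocsparse_spmat_get_size), after which the caller allocates the CSR arrays and the final stage fills them.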
Note
This function writes the required allocation size (in bytes) to
buffer_size and returns without performing the dense to sparse operation when a nullptr is passed for temp_buffer.
Note
This function is blocking with respect to the host.
Note
This routine does not support execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
mat_A – [in] dense matrix descriptor.
mat_B – [in] sparse matrix descriptor.
alg – [in] algorithm for the dense to sparse computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size and the function returns without performing the dense to sparse operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
mat_A, mat_B, or buffer_size pointer is invalid.
rocsparse_sparse_to_dense()#
-
rocsparse_status rocsparse_sparse_to_dense(rocsparse_handle handle, rocsparse_const_spmat_descr mat_A, rocsparse_dnmat_descr mat_B, rocsparse_sparse_to_dense_alg alg, size_t *buffer_size, void *temp_buffer)#
Sparse matrix to dense matrix conversion.
rocsparse_sparse_to_dense performs the conversion of a sparse matrix in CSR, CSC, or COO format to a dense matrix. rocsparse_sparse_to_dense requires multiple steps to complete. First, call rocsparse_sparse_to_dense with nullptr passed into temp_buffer:
// Call sparse_to_dense to get required buffer size
size_t buffer_size = 0;
rocsparse_sparse_to_dense(handle, matA, matB, rocsparse_sparse_to_dense_alg_default, &buffer_size, nullptr);
After this is called, buffer_size will be filled with the size of the required buffer that must be allocated. Finally, the conversion is completed by calling rocsparse_sparse_to_dense with both the buffer_size and temp_buffer:
// Call sparse_to_dense to complete conversion
rocsparse_sparse_to_dense(handle, matA, matB, rocsparse_sparse_to_dense_alg_default, &buffer_size, temp_buffer);
Currently, rocsparse_sparse_to_dense only supports the algorithm rocsparse_sparse_to_dense_alg_default. See the full example below. rocsparse_sparse_to_dense supports rocsparse_datatype_f16_r, rocsparse_datatype_bf16_r, rocsparse_datatype_f32_r, rocsparse_datatype_f64_r, rocsparse_datatype_f32_c, and rocsparse_datatype_f64_c for the values arrays of the sparse matrix (stored in CSR, CSC, or COO format) and the dense matrix. For the row/column offset and row/column index arrays of the sparse matrix, rocsparse_sparse_to_dense supports the index types rocsparse_indextype_i32 and rocsparse_indextype_i64.
- Uniform Precisions:
A / B
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
int main() { // 1 4 0 0 0 0 // A = 0 2 3 0 0 0 // 5 0 0 7 8 0 // 0 0 9 0 6 0 int m = 4; int n = 6; std::vector<int> hcsr_row_ptr = {0, 2, 4, 7, 9}; std::vector<int> hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4}; std::vector<float> hcsr_val = {1, 4, 2, 3, 5, 7, 8, 9, 6}; std::vector<float> hdense(m * n, 0.0f); int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0]; // Offload data to device int* dcsr_row_ptr; int* dcsr_col_ind; float* dcsr_val; float* ddense; HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1))); HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz)); HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz)); HIP_CHECK(hipMalloc(&ddense, sizeof(float) * m * n)); HIP_CHECK( hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice)); HIP_CHECK( hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(ddense, hdense.data(), sizeof(float) * m * n, hipMemcpyHostToDevice)); rocsparse_handle handle; rocsparse_spmat_descr matA; rocsparse_dnmat_descr matB; rocsparse_indextype row_idx_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); // Create sparse matrix A ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, n, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_idx_type, col_idx_type, idx_base, data_type)); // Create dense matrix B ROCSPARSE_CHECK( rocsparse_create_dnmat_descr(&matB, m, n, m, ddense, data_type, rocsparse_order_column)); // Call sparse_to_dense size_t buffer_size = 0; ROCSPARSE_CHECK(rocsparse_sparse_to_dense( handle, matA, matB, rocsparse_sparse_to_dense_alg_default, &buffer_size, nullptr)); void* temp_buffer; HIP_CHECK(hipMalloc(&temp_buffer, buffer_size)); 
ROCSPARSE_CHECK(rocsparse_sparse_to_dense( handle, matA, matB, rocsparse_sparse_to_dense_alg_default, &buffer_size, temp_buffer)); // Copy result back to host HIP_CHECK(hipMemcpy(hdense.data(), ddense, sizeof(float) * m * n, hipMemcpyDeviceToHost)); // Clear rocSPARSE ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA)); ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matB)); ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); // Clear device memory HIP_CHECK(hipFree(dcsr_row_ptr)); HIP_CHECK(hipFree(dcsr_col_ind)); HIP_CHECK(hipFree(dcsr_val)); HIP_CHECK(hipFree(ddense)); HIP_CHECK(hipFree(temp_buffer)); return 0; }
Note
This function writes the required allocation size (in bytes) to
buffer_size and returns without performing the sparse to dense operation when a nullptr is passed for temp_buffer.
Note
This function is blocking with respect to the host.
Note
This routine does not support execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
mat_A – [in] sparse matrix descriptor.
mat_B – [in] dense matrix descriptor.
alg – [in] algorithm for the sparse to dense computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size and the function returns without performing the sparse to dense operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
mat_A, mat_B, or buffer_size pointer is invalid.
rocsparse_sparse_to_sparse_buffer_size()#
-
rocsparse_status rocsparse_sparse_to_sparse_buffer_size(rocsparse_handle handle, rocsparse_sparse_to_sparse_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_sparse_to_sparse_stage stage, size_t *buffer_size_in_bytes)#
rocsparse_sparse_to_sparse_buffer_size calculates the required buffer size in bytes for a given stage.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] descriptor of the sparse_to_sparse algorithm.
source – [in] source sparse matrix descriptor.
target – [in] target sparse matrix descriptor.
stage – [in] stage of the sparse_to_sparse computation.
buffer_size_in_bytes – [out] size in bytes of the
buffer.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – if any required enumeration is invalid.
rocsparse_status_invalid_pointer –
source, target, or buffer_size_in_bytes pointer is invalid.
rocsparse_sparse_to_sparse()#
-
rocsparse_status rocsparse_sparse_to_sparse(rocsparse_handle handle, rocsparse_sparse_to_sparse_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_sparse_to_sparse_stage stage, size_t buffer_size_in_bytes, void *buffer)#
Sparse matrix to sparse matrix conversion.
rocsparse_sparse_to_sparse performs the conversion of a sparse matrix in one storage format to a sparse matrix in another storage format.
- Example
This example converts a CSR matrix into an ELL matrix.
int main() { // 4 2 0 1 0 // 2 4 2 0 1 // 0 2 4 2 0 // 1 0 2 4 2 // 0 1 0 2 4 int m = 5; int n = 5; int nnz = 17; std::vector<int> hcsr_row_ptr = {0, 3, 7, 10, 14, 17}; std::vector<int> hcsr_col_ind = {0, 1, 3, 0, 1, 2, 4, 1, 2, 3, 0, 2, 3, 4, 1, 3, 4}; std::vector<double> hcsr_val = {4.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 4.0}; // rocSPARSE handle rocsparse_handle handle; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); int* dcsr_row_ptr = nullptr; int* dcsr_col_ind = nullptr; double* dcsr_val = nullptr; HIP_CHECK(hipMalloc((void**)&dcsr_row_ptr, sizeof(int) * (m + 1))); HIP_CHECK(hipMalloc((void**)&dcsr_col_ind, sizeof(int) * nnz)); HIP_CHECK(hipMalloc((void**)&dcsr_val, sizeof(double) * nnz)); HIP_CHECK( hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice)); HIP_CHECK( hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(double) * nnz, hipMemcpyHostToDevice)); // It assumes the CSR arrays (ptr, ind, val) have already been allocated and filled. 
// Build Source rocsparse_spmat_descr source; ROCSPARSE_CHECK(rocsparse_create_csr_descr(&source, m, n, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, rocsparse_indextype_i32, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f64_r)); // Build target void* dell_ind = nullptr; void* dell_val = nullptr; int64_t ell_width = 0; rocsparse_spmat_descr target; ROCSPARSE_CHECK(rocsparse_create_ell_descr(&target, m, n, dell_ind, dell_val, ell_width, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f64_r)); // Create descriptor rocsparse_sparse_to_sparse_descr descr; ROCSPARSE_CHECK(rocsparse_create_sparse_to_sparse_descr( &descr, source, target, rocsparse_sparse_to_sparse_alg_default)); // Analysis phase size_t buffer_size; ROCSPARSE_CHECK(rocsparse_sparse_to_sparse_buffer_size( handle, descr, source, target, rocsparse_sparse_to_sparse_stage_analysis, &buffer_size)); void* buffer = nullptr; HIP_CHECK(hipMalloc(&buffer, buffer_size)); ROCSPARSE_CHECK(rocsparse_sparse_to_sparse(handle, descr, source, target, rocsparse_sparse_to_sparse_stage_analysis, buffer_size, buffer)); HIP_CHECK(hipFree(buffer)); // The user is responsible for allocating the target arrays after the analysis phase.
int64_t rows, cols; void * ind, *val; rocsparse_indextype idx_type; rocsparse_index_base idx_base; rocsparse_datatype data_type; // Get ell_width ROCSPARSE_CHECK(rocsparse_ell_get( target, &rows, &cols, &ind, &val, &ell_width, &idx_type, &idx_base, &data_type)); std::cout << "rows: " << rows << " cols: " << cols << " ell_width: " << ell_width << std::endl; // Allocate device arrays for ELL format HIP_CHECK(hipMalloc(&dell_ind, sizeof(int) * ell_width * m)); HIP_CHECK(hipMalloc(&dell_val, sizeof(double) * ell_width * m)); ROCSPARSE_CHECK(rocsparse_ell_set_pointers(target, dell_ind, dell_val)); // Calculation phase ROCSPARSE_CHECK(rocsparse_sparse_to_sparse_buffer_size( handle, descr, source, target, rocsparse_sparse_to_sparse_stage_compute, &buffer_size)); HIP_CHECK(hipMalloc(&buffer, buffer_size)); ROCSPARSE_CHECK(rocsparse_sparse_to_sparse(handle, descr, source, target, rocsparse_sparse_to_sparse_stage_compute, buffer_size, buffer)); HIP_CHECK(hipFree(buffer)); std::vector<int> hell_ind(ell_width * m); std::vector<double> hell_val(ell_width * m); HIP_CHECK( hipMemcpy(hell_ind.data(), dell_ind, sizeof(int) * ell_width * m, hipMemcpyDeviceToHost)); HIP_CHECK(hipMemcpy( hell_val.data(), dell_val, sizeof(double) * ell_width * m, hipMemcpyDeviceToHost)); std::cout << "hell_ind" << std::endl; for(size_t i = 0; i < hell_ind.size(); i++) { std::cout << hell_ind[i] << " "; } std::cout << "" << std::endl; std::cout << "hell_val" << std::endl; for(size_t i = 0; i < hell_val.size(); i++) { std::cout << hell_val[i] << " "; } std::cout << "" << std::endl; HIP_CHECK(hipFree(dcsr_row_ptr)); HIP_CHECK(hipFree(dcsr_col_ind)); HIP_CHECK(hipFree(dcsr_val)); HIP_CHECK(hipFree(dell_ind)); HIP_CHECK(hipFree(dell_val)); return 0; }
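The CSR-to-ELL conversion shown above can be mirrored on the host. This sketch assumes a column-major ELL layout (slot p of row i at index p * m + i) with padded entries given column index -1 and value 0; it is illustrative only, not the library implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Host-side CSR to ELL conversion sketch. The ELL width is the maximum
// number of non-zeros in any row; rows shorter than the width are padded
// (here with column index -1 and value 0).
void csr_to_ell_reference(int m, const std::vector<int>& csr_row_ptr,
                          const std::vector<int>& csr_col_ind, const std::vector<double>& csr_val,
                          int& ell_width, std::vector<int>& ell_col_ind,
                          std::vector<double>& ell_val)
{
    // Width computation corresponds to the analysis phase: the target
    // arrays cannot be sized before this is known.
    ell_width = 0;
    for(int i = 0; i < m; ++i)
    {
        ell_width = std::max(ell_width, csr_row_ptr[i + 1] - csr_row_ptr[i]);
    }
    ell_col_ind.assign(static_cast<size_t>(ell_width) * m, -1);
    ell_val.assign(static_cast<size_t>(ell_width) * m, 0.0);
    for(int i = 0; i < m; ++i)
    {
        int p = 0;
        for(int k = csr_row_ptr[i]; k < csr_row_ptr[i + 1]; ++k, ++p)
        {
            ell_col_ind[p * m + i] = csr_col_ind[k];
            ell_val[p * m + i]     = csr_val[k];
        }
    }
}
```

This mirrors why the device API splits the conversion into an analysis phase (which determines ell_width, queried via rocsparse_ell_get) and a compute phase that runs only after the caller has allocated the target arrays.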
Note
The required allocation size (in bytes) passed in buffer_size_in_bytes must be obtained from rocsparse_sparse_to_sparse_buffer_size for each stage, as the required buffer size can differ between stages.
Note
The rocsparse_format_bell and rocsparse_format_sell formats are not supported.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] descriptor of the sparse_to_sparse algorithm.
source – [in] sparse matrix descriptor.
target – [in] sparse matrix descriptor.
stage – [in] stage of the sparse_to_sparse computation.
buffer_size_in_bytes – [in] size in bytes of the buffer.
buffer – [in] temporary storage buffer allocated by the user.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_extract_buffer_size()#
-
rocsparse_status rocsparse_extract_buffer_size(rocsparse_handle handle, rocsparse_extract_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_extract_stage stage, size_t *buffer_size_in_bytes)#
rocsparse_extract_buffer_size calculates the required buffer size in bytes for a given stage. This routine is used in conjunction with rocsparse_extract_nnz and rocsparse_extract to extract a lower or upper triangular sparse matrix from an input sparse matrix. See rocsparse_extract for more details.
Note
This routine is asynchronous with respect to the host. This routine supports execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] descriptor of the extract algorithm.
source – [in] source sparse matrix descriptor.
target – [in] target sparse matrix descriptor.
stage – [in] stage of the extract computation.
buffer_size_in_bytes – [out] size in bytes of the buffer.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – stage is invalid.
rocsparse_status_invalid_pointer – descr, source, target, or buffer_size_in_bytes pointer is invalid.
rocsparse_extract_nnz()#
-
rocsparse_status rocsparse_extract_nnz(rocsparse_handle handle, rocsparse_extract_descr descr, int64_t *nnz)#
rocsparse_extract_nnz returns the number of non-zeros in the extracted matrix. The value is available after the analysis phase rocsparse_extract_stage_analysis has been executed. This routine is used in conjunction with rocsparse_extract_buffer_size and rocsparse_extract to extract a lower or upper triangular sparse matrix from an input sparse matrix. See rocsparse_extract for more details.
Note
This routine is asynchronous with respect to the host. This routine supports execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] descriptor of the extract algorithm.
nnz – [out] the number of non-zeros.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – descr or nnz pointer is invalid.
rocsparse_extract()#
-
rocsparse_status rocsparse_extract(rocsparse_handle handle, rocsparse_extract_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_extract_stage stage, size_t buffer_size_in_bytes, void *buffer)#
Sparse matrix extraction.
rocsparse_extract performs the extraction of the lower or upper part of a sparse matrix into a new matrix. rocsparse_extract requires multiple steps to complete. First, create the source and target sparse matrix descriptors. For example, in the case of the CSR matrix format, this might look like:

// Build Source
rocsparse_spmat_descr source;
rocsparse_create_csr_descr(&source, M, N, nnz, dsource_row_ptr, dsource_col_ind, dsource_val,
                           rocsparse_indextype_i32, rocsparse_indextype_i32,
                           rocsparse_index_base_zero, rocsparse_datatype_f32_r);

// Build target
void* dtarget_row_ptr;
hipMalloc(&dtarget_row_ptr, sizeof(int32_t) * (M + 1));
rocsparse_spmat_descr target;
rocsparse_create_csr_descr(&target, M, N, 0, dtarget_row_ptr, nullptr, nullptr,
                           rocsparse_indextype_i32, rocsparse_indextype_i32,
                           rocsparse_index_base_zero, rocsparse_datatype_f32_r);

Next, create the extraction descriptor and call rocsparse_extract_buffer_size with the stage rocsparse_extract_stage_analysis to determine the amount of temporary storage required. Allocate this temporary storage buffer and pass it to rocsparse_extract with the stage rocsparse_extract_stage_analysis:

// Create descriptor
rocsparse_extract_descr descr;
rocsparse_create_extract_descr(&descr, source, target, rocsparse_extract_alg_default);

// Analysis phase
size_t buffer_size;
rocsparse_extract_buffer_size(handle, descr, source, target,
                              rocsparse_extract_stage_analysis, &buffer_size);
void* dbuffer = nullptr;
hipMalloc(&dbuffer, buffer_size);
rocsparse_extract(handle, descr, source, target,
                  rocsparse_extract_stage_analysis, buffer_size, dbuffer);
hipFree(dbuffer);

Then call rocsparse_extract_nnz to determine the number of non-zeros that will exist in the target matrix. Once this is known, allocate the column indices and values arrays of the target sparse matrix:

int64_t target_nnz;
rocsparse_extract_nnz(handle, descr, &target_nnz);
void* dtarget_col_ind;
void* dtarget_val;
hipMalloc(&dtarget_col_ind, sizeof(int32_t) * target_nnz);
hipMalloc(&dtarget_val, sizeof(float) * target_nnz);
rocsparse_csr_set_pointers(target, dtarget_row_ptr, dtarget_col_ind, dtarget_val);

Finally, call rocsparse_extract_buffer_size with the stage rocsparse_extract_stage_compute to determine the size of the temporary user-allocated storage needed for the computation of the column indices and values of the target. Allocate this buffer and complete the extraction by calling rocsparse_extract with the rocsparse_extract_stage_compute stage:

// Calculation phase
rocsparse_extract_buffer_size(handle, descr, source, target,
                              rocsparse_extract_stage_compute, &buffer_size);
hipMalloc(&dbuffer, buffer_size);
rocsparse_extract(handle, descr, source, target,
                  rocsparse_extract_stage_compute, buffer_size, dbuffer);
hipFree(dbuffer);

The target row pointer, column indices, and values arrays are now filled with the upper or lower part of the source matrix.
The source and the target matrices must have the same format (see rocsparse_format) and the same storage mode (see rocsparse_storage_mode). The fill mode (rocsparse_fill_mode) and diagonal type (rocsparse_diag_type) attributes of the target matrix parameterize the algorithm; set them on the target matrix using rocsparse_spmat_set_attribute. See the full example below.
- Example
This example extracts the lower part of CSR matrix into a CSR matrix.
int main() { // 1 2 3 0 0 0 4 5 // 0 1 3 5 7 0 0 0 // 0 0 0 1 0 3 0 9 // 1 2 3 0 0 0 0 4 // 0 0 0 0 0 0 0 0 // 1 2 1 0 0 5 8 0 // 0 1 2 3 0 0 0 4 // 0 0 0 1 2 0 1 2 int32_t M = 8; int32_t N = 8; int32_t nnz = 29; std::vector<int32_t> hsource_row_ptr = {0, 5, 9, 12, 16, 16, 21, 25, 29}; std::vector<int32_t> hsource_col_ind = {0, 1, 2, 6, 7, 1, 2, 3, 4, 3, 5, 7, 0, 1, 2, 7, 0, 1, 2, 5, 6, 1, 2, 3, 7, 3, 4, 6, 7}; std::vector<float> hsource_val = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 1.0f, 3.0f, 5.0f, 7.0f, 1.0f, 3.0f, 9.0f, 1.0f, 2.0f, 3.0f, 4.0f, 1.0f, 2.0f, 1.0f, 5.0f, 8.0f, 1.0f, 2.0f, 3.0f, 4.0f, 1.0f, 2.0f, 1.0f, 2.0f}; int32_t* dsource_row_ptr; int32_t* dsource_col_ind; float* dsource_val; HIP_CHECK(hipMalloc(&dsource_row_ptr, sizeof(int32_t) * (M + 1))); HIP_CHECK(hipMalloc(&dsource_col_ind, sizeof(int32_t) * nnz)); HIP_CHECK(hipMalloc(&dsource_val, sizeof(float) * nnz)); HIP_CHECK(hipMemcpy( dsource_row_ptr, hsource_row_ptr.data(), sizeof(int32_t) * (M + 1), hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy( dsource_col_ind, hsource_col_ind.data(), sizeof(int32_t) * nnz, hipMemcpyHostToDevice)); HIP_CHECK( hipMemcpy(dsource_val, hsource_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice)); rocsparse_handle handle; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); // Build Source rocsparse_spmat_descr source; ROCSPARSE_CHECK(rocsparse_create_csr_descr(&source, M, N, nnz, dsource_row_ptr, dsource_col_ind, dsource_val, rocsparse_indextype_i32, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f32_r)); // Build target void* dtarget_row_ptr; HIP_CHECK(hipMalloc(&dtarget_row_ptr, sizeof(int32_t) * (M + 1))); rocsparse_spmat_descr target; ROCSPARSE_CHECK(rocsparse_create_csr_descr(&target, M, N, 0, dtarget_row_ptr, nullptr, nullptr, rocsparse_indextype_i32, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f32_r)); const rocsparse_fill_mode fill_mode = rocsparse_fill_mode_lower; const rocsparse_diag_type diag_type = 
rocsparse_diag_type_non_unit; ROCSPARSE_CHECK(rocsparse_spmat_set_attribute( target, rocsparse_spmat_fill_mode, &fill_mode, sizeof(fill_mode))); ROCSPARSE_CHECK(rocsparse_spmat_set_attribute( target, rocsparse_spmat_diag_type, &diag_type, sizeof(diag_type))); // Create descriptor rocsparse_extract_descr descr; ROCSPARSE_CHECK( rocsparse_create_extract_descr(&descr, source, target, rocsparse_extract_alg_default)); // Analysis phase size_t buffer_size; ROCSPARSE_CHECK(rocsparse_extract_buffer_size( handle, descr, source, target, rocsparse_extract_stage_analysis, &buffer_size)); void* dbuffer; HIP_CHECK(hipMalloc(&dbuffer, buffer_size)); ROCSPARSE_CHECK(rocsparse_extract( handle, descr, source, target, rocsparse_extract_stage_analysis, buffer_size, dbuffer)); HIP_CHECK(hipFree(dbuffer)); // The user is responsible to allocate target arrays after the analysis phase. int64_t target_nnz; ROCSPARSE_CHECK(rocsparse_extract_nnz(handle, descr, &target_nnz)); std::cout << "target_nnz: " << target_nnz << std::endl; void* dtarget_col_ind; void* dtarget_val; HIP_CHECK(hipMalloc(&dtarget_col_ind, sizeof(int32_t) * target_nnz)); HIP_CHECK(hipMalloc(&dtarget_val, sizeof(float) * target_nnz)); ROCSPARSE_CHECK( rocsparse_csr_set_pointers(target, dtarget_row_ptr, dtarget_col_ind, dtarget_val)); // Calculation phase ROCSPARSE_CHECK(rocsparse_extract_buffer_size( handle, descr, source, target, rocsparse_extract_stage_compute, &buffer_size)); HIP_CHECK(hipMalloc(&dbuffer, buffer_size)); ROCSPARSE_CHECK(rocsparse_extract( handle, descr, source, target, rocsparse_extract_stage_compute, buffer_size, dbuffer)); HIP_CHECK(hipFree(dbuffer)); ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); HIP_CHECK(hipFree(dsource_row_ptr)); HIP_CHECK(hipFree(dsource_col_ind)); HIP_CHECK(hipFree(dsource_val)); HIP_CHECK(hipFree(dtarget_row_ptr)); HIP_CHECK(hipFree(dtarget_col_ind)); HIP_CHECK(hipFree(dtarget_val)); return 0; }
Note
This routine is asynchronous with respect to the host. This routine supports execution in a hipGraph context.
Note
Supported formats are rocsparse_format_csr and rocsparse_format_csc.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
descr – [in] descriptor of the extract algorithm.
source – [in] sparse matrix descriptor.
target – [in] sparse matrix descriptor.
stage – [in] stage of the extract computation.
buffer_size_in_bytes – [in] size in bytes of the buffer.
buffer – [in] temporary storage buffer allocated by the user.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – stage is invalid.
rocsparse_status_invalid_pointer – descr, source, target, or buffer pointer is invalid.
rocsparse_check_spmat()#
-
rocsparse_status rocsparse_check_spmat(rocsparse_handle handle, rocsparse_const_spmat_descr mat, rocsparse_data_status *data_status, rocsparse_check_spmat_stage stage, size_t *buffer_size, void *temp_buffer)#
Check matrix to see if it is valid.
rocsparse_check_spmat checks whether the input matrix is valid. rocsparse_check_spmat requires two steps to complete. First, call rocsparse_check_spmat with the stage parameter set to rocsparse_check_spmat_stage_buffer_size to determine the size of the temporary buffer needed in the second step. Allocate this buffer and call rocsparse_check_spmat with the stage parameter set to rocsparse_check_spmat_stage_compute, which checks the input matrix for errors. Any errors detected in the input matrix are reported in data_status (passed to the function as a host pointer).
- Uniform Precisions:
A
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
This example checks whether a matrix is upper triangular. The matrix passed to rocsparse_check_spmat is invalid because it contains an entry in the lower triangular part of the matrix.
int main() { // 1 2 0 0 // 3 0 4 0 // <-------contains a "3" in the lower part of matrix // 0 0 1 1 // 0 0 0 2 std::vector<int> hcsr_row_ptr = {0, 2, 4, 6, 7}; std::vector<int> hcsr_col_ind = {0, 1, 0, 2, 2, 3, 3}; std::vector<float> hcsr_val = {1, 2, 3, 4, 1, 1, 2}; int M = 4; int N = 4; int nnz = 7; int* dcsr_row_ptr; int* dcsr_col_ind; float* dcsr_val; HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (M + 1))); HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz)); HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz)); HIP_CHECK( hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (M + 1), hipMemcpyHostToDevice)); HIP_CHECK( hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice)); rocsparse_handle handle; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); rocsparse_spmat_descr matA; ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, M, N, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, rocsparse_indextype_i32, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f32_r)); const rocsparse_fill_mode fill_mode = rocsparse_fill_mode_upper; const rocsparse_matrix_type matrix_type = rocsparse_matrix_type_triangular; ROCSPARSE_CHECK(rocsparse_spmat_set_attribute( matA, rocsparse_spmat_fill_mode, &fill_mode, sizeof(fill_mode))); ROCSPARSE_CHECK(rocsparse_spmat_set_attribute( matA, rocsparse_spmat_matrix_type, &matrix_type, sizeof(matrix_type))); rocsparse_data_status data_status; size_t buffer_size; ROCSPARSE_CHECK(rocsparse_check_spmat(handle, matA, &data_status, rocsparse_check_spmat_stage_buffer_size, &buffer_size, nullptr)); void* dbuffer; HIP_CHECK(hipMalloc(&dbuffer, buffer_size)); ROCSPARSE_CHECK(rocsparse_check_spmat( handle, matA, &data_status, rocsparse_check_spmat_stage_compute, &buffer_size, dbuffer)); std::cout << "data_status: " << data_status << std::endl; ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); 
ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA)); HIP_CHECK(hipFree(dbuffer)); HIP_CHECK(hipFree(dcsr_row_ptr)); HIP_CHECK(hipFree(dcsr_col_ind)); HIP_CHECK(hipFree(dcsr_val)); return 0; }
Note
This function writes the required allocation size (in bytes) to buffer_size and returns without performing the checking operation when stage is equal to rocsparse_check_spmat_stage_buffer_size.
Note
The sparse matrix formats currently supported are rocsparse_format_coo, rocsparse_format_csr, rocsparse_format_csc, rocsparse_format_ell, and rocsparse_format_bsr.
Note
rocsparse_check_spmat requires two stages to complete. The first stage, rocsparse_check_spmat_stage_buffer_size, returns the size of the temporary storage buffer that is required for subsequent calls to rocsparse_check_spmat. In the final stage, rocsparse_check_spmat_stage_compute, the actual computation is performed.
Note
This routine does not support execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
mat – [in] matrix descriptor.
data_status – [out] modified to indicate the status of the data.
stage – [in] check_matrix stage for the matrix computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the checking operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – mat, buffer_size, temp_buffer, or data_status pointer is invalid.
rocsparse_status_invalid_value – the value of stage is incorrect.
rocsparse_spitsv()#
-
rocsparse_status rocsparse_spitsv(rocsparse_handle handle, rocsparse_int *host_nmaxiter, const void *host_tol, void *host_history, rocsparse_operation trans, const void *alpha, const rocsparse_spmat_descr mat, const rocsparse_dnvec_descr x, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spitsv_alg alg, rocsparse_spitsv_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse iterative triangular solve.
rocsparse_spitsv solves, using the Jacobi iterative method, a sparse triangular linear system with a sparse \(m \times m\) matrix \(A\) given in CSR format, a dense solution vector \(y\), and a right-hand side \(x\) multiplied by \(\alpha\), such that\[ op(A) y = \alpha x, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]The Jacobi method applied to the sparse triangular linear system above gives
\[ y_{k+1} = y_{k} + D^{-1} ( \alpha x - (D + T) y_{k} ) \]with \(A = D + T\), where \(D\) is the diagonal of \(A\) and \(T\) is the strict triangular part of \(A\). The above equation can also be written as
\[ y_{k+1} = y_{k} + D^{-1} r_k \]where\[ r_k = \alpha x - (D + T) y_k. \]Starting with \(y_0 = y\), the method iterates while \( k \lt \) host_nmaxiter and until\[ \Vert r_k \Vert_{\infty} \le \epsilon, \]with \(\epsilon\) = host_tol.
rocsparse_spitsv requires three stages to complete. First, pass the rocsparse_spitsv_stage_buffer_size stage to determine the size of the required temporary storage buffer. Next, allocate this buffer and call rocsparse_spitsv again with the rocsparse_spitsv_stage_preprocess stage, which preprocesses data and stores it in the temporary buffer. Finally, call rocsparse_spitsv with the rocsparse_spitsv_stage_compute stage to perform the actual computation. After all calls to rocsparse_spitsv are complete, the temporary buffer can be deallocated.
rocsparse_spitsv supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrix.
rocsparse_spitsv supports the following data types for \(op(A)\), \(x\), \(y\), and compute types for \(\alpha\):
A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
int main() { // 1 0 0 0 // A = 0 2 0 0 // 5 0 3 0 // 0 0 9 4 int m = 4; int n = 4; int nnz = 6; float halpha = 1.0f; std::vector<int> hcsr_row_ptr = {0, 1, 2, 4, 6}; std::vector<int> hcsr_col_ind = {0, 1, 0, 2, 2, 3}; std::vector<float> hcsr_val = {1.0f, 2.0f, 5.0f, 3.0f, 9.0f, 4.0f}; std::vector<float> hx(m, 1.0f); std::vector<float> hy(m, 1.0f); // Offload data to device int* dcsr_row_ptr; int* dcsr_col_ind; float* dcsr_val; float* dx; float* dy; HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1))); HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz)); HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz)); HIP_CHECK(hipMalloc(&dx, sizeof(float) * m)); HIP_CHECK(hipMalloc(&dy, sizeof(float) * m)); HIP_CHECK( hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice)); HIP_CHECK( hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(float) * m, hipMemcpyHostToDevice)); HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * m, hipMemcpyHostToDevice)); rocsparse_handle handle; rocsparse_spmat_descr matA; rocsparse_dnvec_descr vecX; rocsparse_dnvec_descr vecY; rocsparse_indextype row_ptr_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_datatype compute_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; ROCSPARSE_CHECK(rocsparse_create_handle(&handle)); // Create sparse matrix A ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_ptr_type, col_idx_type, idx_base, data_type)); ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, m, dx, data_type)); ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, data_type)); rocsparse_int host_nmaxiter[1] = {200}; float 
host_tol[1] = {1.0e-6}; float host_history[200]; size_t buffer_size = 0; ROCSPARSE_CHECK(rocsparse_spitsv(handle, &host_nmaxiter[0], &host_tol[0], &host_history[0], rocsparse_operation_none, &halpha, matA, vecX, vecY, compute_type, rocsparse_spitsv_alg_default, rocsparse_spitsv_stage_buffer_size, &buffer_size, nullptr)); void* temp_buffer; HIP_CHECK(hipMalloc(&temp_buffer, buffer_size)); ROCSPARSE_CHECK(rocsparse_spitsv(handle, &host_nmaxiter[0], &host_tol[0], &host_history[0], rocsparse_operation_none, &halpha, matA, vecX, vecY, compute_type, rocsparse_spitsv_alg_default, rocsparse_spitsv_stage_preprocess, nullptr, temp_buffer)); ROCSPARSE_CHECK(rocsparse_spitsv(handle, &host_nmaxiter[0], &host_tol[0], &host_history[0], rocsparse_operation_none, &halpha, matA, vecX, vecY, compute_type, rocsparse_spitsv_alg_default, rocsparse_spitsv_stage_compute, &buffer_size, temp_buffer)); HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost)); // Clear rocSPARSE ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA)); ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX)); ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY)); ROCSPARSE_CHECK(rocsparse_destroy_handle(handle)); // Clear device memory HIP_CHECK(hipFree(dcsr_row_ptr)); HIP_CHECK(hipFree(dcsr_col_ind)); HIP_CHECK(hipFree(dcsr_val)); HIP_CHECK(hipFree(dx)); HIP_CHECK(hipFree(dy)); HIP_CHECK(hipFree(temp_buffer)); return 0; }
Note
This routine does not support execution in a hipGraph context.
Note
This routine does not support batched computation.
- Parameters:
handle – [in] handle to the rocSPARSE library context queue.
host_nmaxiter – [inout] maximum number of iteration on input and number of iteration on output. If the output number of iterations is strictly less than the input maximum number of iterations, then the algorithm converged.
host_tol – [in] if the pointer is null, then the loop executes nmaxiter[0] iterations. The precision is float for f32-based calculations (including the complex case) and double for f64-based calculations (including the complex case).
host_history – [out] optional array to record the norm of the residual before each iteration. The precision is float for f32-based calculations (including the complex case) and double for f64-based calculations (including the complex case).
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpITSV computation.
alg – [in] SpITSV algorithm for the SpITSV computation.
stage – [in] SpITSV stage for the SpITSV computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpITSV operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, mat, x, y, descr, or buffer_size pointer is invalid.
rocsparse_status_not_implemented – trans, compute_type, stage, or alg is currently not supported.