Sparse Generic Functions#
This module holds all sparse generic routines.
The sparse generic routines describe some of the most common operations that manipulate sparse matrices and vectors. The generic API is more flexible than the other rocSPARSE APIs in that it is easy to set different index types, data types and compute types. For some generic routines, for example SpMV, the generic API also allows users to select different algorithms which have different performance characteristics depending on the sparse matrix being operated on.
rocsparse_axpby()#
-
rocsparse_status rocsparse_axpby(rocsparse_handle handle, const void *alpha, rocsparse_const_spvec_descr x, const void *beta, rocsparse_dnvec_descr y)#
Scale a sparse vector and add it to a scaled dense vector.
rocsparse_axpby
multiplies the sparse vector \(x\) with scalar \(\alpha\) and adds the result to the dense vector \(y\) that is multiplied with scalar \(\beta\), such that\[ y := \alpha \cdot x + \beta \cdot y \]for(i = 0; i < size; ++i) { y[i] = beta * y[i] } for(i = 0; i < nnz; ++i) { y[x_ind[i]] += alpha * x_val[i] }
- Uniform Precisions:
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
// Number of non-zeros of the sparse vector int nnz = 3; // Size of sparse and dense vector int size = 9; // Sparse index vector std::vector<int> hx_ind = {0, 3, 5}; // Sparse value vector std::vector<float> hx_val = {1.0f, 2.0f, 3.0f}; // Dense vector std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}; // Scalar alpha float alpha = 3.7f; // Scalar beta float beta = 1.2f; // Offload data to device int* dx_ind; float* dx_val; float* dy; hipMalloc((void**)&dx_ind, sizeof(int) * nnz); hipMalloc((void**)&dx_val, sizeof(float) * nnz); hipMalloc((void**)&dy, sizeof(float) * size); hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice); hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spvec_descr vecX; rocsparse_dnvec_descr vecY; rocsparse_indextype idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_create_handle(&handle); // Create sparse vector X rocsparse_create_spvec_descr(&vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type); // Create dense vector Y rocsparse_create_dnvec_descr(&vecY, size, dy, data_type); // Call axpby to perform y = beta * y + alpha * x rocsparse_axpby(handle, &alpha, vecX, &beta, vecY); rocsparse_dnvec_get_values(vecY, (void**)&dy); // Copy result back to host hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost); std::cout << "y" << std::endl; for(size_t i = 0; i < hy.size(); ++i) { std::cout << hy[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_spvec_descr(vecX); rocsparse_destroy_dnvec_descr(vecY); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dx_ind); hipFree(dx_val); hipFree(dy);
Note
This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
alpha – [in] scalar \(\alpha\).
x – [in] sparse matrix descriptor.
beta – [in] scalar \(\beta\).
y – [inout] dense matrix descriptor.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
,x
,beta
ory
pointer is invalid.
rocsparse_gather()#
-
rocsparse_status rocsparse_gather(rocsparse_handle handle, rocsparse_const_dnvec_descr y, rocsparse_spvec_descr x)#
Gather elements from a dense vector and store them into a sparse vector.
rocsparse_gather
gathers the elements from the dense vector \(y\) and stores them in the sparse vector \(x\).for(i = 0; i < nnz; ++i) { x_val[i] = y[x_ind[i]]; }
- Uniform Precisions:
X / Y
rocsparse_datatype_i8_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
// Number of non-zeros of the sparse vector int nnz = 3; // Size of sparse and dense vector int size = 9; // Sparse index vector std::vector<int> hx_ind = {0, 3, 5}; // Dense vector std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}; // Offload data to device int* dx_ind; float* dx_val; float* dy; hipMalloc((void**)&dx_ind, sizeof(int) * nnz); hipMalloc((void**)&dx_val, sizeof(float) * nnz); hipMalloc((void**)&dy, sizeof(float) * size); hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spvec_descr vecX; rocsparse_dnvec_descr vecY; rocsparse_indextype idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_create_handle(&handle); // Create sparse vector X rocsparse_create_spvec_descr(&vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type); // Create dense vector Y rocsparse_create_dnvec_descr(&vecY, size, dy, data_type); // Call axpby to perform gather rocsparse_gather(handle, vecY, vecX); rocsparse_spvec_get_values(vecX, (void**)&dx_val); // Copy result back to host std::vector<float> hx_val(nnz, 0.0f); hipMemcpy(hx_val.data(), dx_val, sizeof(float) * nnz, hipMemcpyDeviceToHost); std::cout << "x" << std::endl; for(size_t i = 0; i < hx_val.size(); ++i) { std::cout << hx_val[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_spvec_descr(vecX); rocsparse_destroy_dnvec_descr(vecY); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dx_ind); hipFree(dx_val); hipFree(dy);
Note
This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
y – [in] dense vector \(y\).
x – [out] sparse vector \(x\).
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
x
ory
pointer is invalid.
rocsparse_scatter()#
-
rocsparse_status rocsparse_scatter(rocsparse_handle handle, rocsparse_const_spvec_descr x, rocsparse_dnvec_descr y)#
Scatter elements from a sparse vector into a dense vector.
rocsparse_scatter
scatters the elements from the sparse vector \(x\) in the dense vector \(y\).for(i = 0; i < nnz; ++i) { y[x_ind[i]] = x_val[i]; }
- Uniform Precisions:
X / Y
rocsparse_datatype_i8_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
// Number of non-zeros of the sparse vector int nnz = 3; // Size of sparse and dense vector int size = 9; // Sparse index vector std::vector<int> hx_ind = {0, 3, 5}; // Sparse value vector std::vector<float> hx_val = {1.0f, 2.0f, 3.0f}; // Dense vector std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}; // Offload data to device int* dx_ind; float* dx_val; float* dy; hipMalloc((void**)&dx_ind, sizeof(int) * nnz); hipMalloc((void**)&dx_val, sizeof(float) * nnz); hipMalloc((void**)&dy, sizeof(float) * size); hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice); hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spvec_descr vecX; rocsparse_dnvec_descr vecY; rocsparse_indextype idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_create_handle(&handle); // Create sparse vector X rocsparse_create_spvec_descr(&vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type); // Create dense vector Y rocsparse_create_dnvec_descr(&vecY, size, dy, data_type); // Call axpby to perform scatter rocsparse_scatter(handle, vecX, vecY); rocsparse_dnvec_get_values(vecY, (void**)&dy); // Copy result back to host hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost); std::cout << "y" << std::endl; for(size_t i = 0; i < hy.size(); ++i) { std::cout << hy[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_spvec_descr(vecX); rocsparse_destroy_dnvec_descr(vecY); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dx_ind); hipFree(dx_val); hipFree(dy);
Note
This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
x – [in] sparse vector \(x\).
y – [out] dense vector \(y\).
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
x
ory
pointer is invalid.
rocsparse_rot()#
-
rocsparse_status rocsparse_rot(rocsparse_handle handle, const void *c, const void *s, rocsparse_spvec_descr x, rocsparse_dnvec_descr y)#
Apply Givens rotation to a dense and a sparse vector.
rocsparse_rot
applies the Givens rotation matrix \(G\) to the sparse vector \(x\) and the dense vector \(y\), where\[\begin{split} G = \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \end{split}\]for(i = 0; i < nnz; ++i) { x_tmp = x_val[i]; y_tmp = y[x_ind[i]]; x_val[i] = c * x_tmp + s * y_tmp; y[x_ind[i]] = c * y_tmp - s * x_tmp; }
- Uniform Precisions:
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
// Number of non-zeros of the sparse vector int nnz = 3; // Size of sparse and dense vector int size = 9; // Sparse index vector std::vector<int> hx_ind = {0, 3, 5}; // Sparse value vector std::vector<float> hx_val = {1.0f, 2.0f, 3.0f}; // Dense vector std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}; // Scalar c float c = 3.7f; // Scalar s float s = 1.2f; // Offload data to device int* dx_ind; float* dx_val; float* dy; hipMalloc((void**)&dx_ind, sizeof(int) * nnz); hipMalloc((void**)&dx_val, sizeof(float) * nnz); hipMalloc((void**)&dy, sizeof(float) * size); hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice); hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spvec_descr vecX; rocsparse_dnvec_descr vecY; rocsparse_indextype idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_create_handle(&handle); // Create sparse vector X rocsparse_create_spvec_descr(&vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type); // Create dense vector Y rocsparse_create_dnvec_descr(&vecY, size, dy, data_type); // Call rot rocsparse_rot(handle, (void*)&c, (void*)&s, vecX, vecY); rocsparse_spvec_get_values(vecX, (void**)&dx_val); rocsparse_dnvec_get_values(vecY, (void**)&dy); // Copy result back to host hipMemcpy(hx_val.data(), dx_val, sizeof(float) * nnz, hipMemcpyDeviceToHost); hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost); std::cout << "x" << std::endl; for(size_t i = 0; i < hx_val.size(); ++i) { std::cout << hx_val[i] << " "; } std::cout << std::endl; std::cout << "y" << std::endl; for(size_t i = 0; i < hy.size(); ++i) { std::cout << hy[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_spvec_descr(vecX); rocsparse_destroy_dnvec_descr(vecY); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dx_ind); hipFree(dx_val); hipFree(dy);
Note
This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.
Note
This routine supports execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
c – [in] pointer to the cosine element of \(G\), can be on host or device.
s – [in] pointer to the sine element of \(G\), can be on host or device.
x – [inout] sparse vector \(x\).
y – [inout] dense vector \(y\).
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
c
,s
,x
ory
pointer is invalid.
rocsparse_spvv()#
-
rocsparse_status rocsparse_spvv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_const_spvec_descr x, rocsparse_const_dnvec_descr y, void *result, rocsparse_datatype compute_type, size_t *buffer_size, void *temp_buffer)#
Sparse vector inner dot product.
rocsparse_spvv
computes the inner dot product of the sparse vecotr \(x\) with the dense vector \(y\), such that\[ \text{result} := x^{'} \cdot y, \]with\[\begin{split} op(x) = \left\{ \begin{array}{ll} x, & \text{if trans == rocsparse_operation_none} \\ \bar{x}, & \text{if trans == rocsparse_operation_conjugate_transpose} \\ \end{array} \right. \end{split}\]result = 0; for(i = 0; i < nnz; ++i) { result += x_val[i] * y[x_ind[i]]; }
- Uniform Precisions:
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed precisions:
X / Y
compute_type / result
rocsparse_datatype_i8_r
rocsparse_datatype_i32_r
rocsparse_datatype_i8_r
rocsparse_datatype_f32_r
- Example
// Number of non-zeros of the sparse vector int nnz = 3; // Size of sparse and dense vector int size = 9; // Sparse index vector std::vector<int> hx_ind = {0, 3, 5}; // Sparse value vector std::vector<float> hx_val = {1.0f, 2.0f, 3.0f}; // Dense vector std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}; // Offload data to device int* dx_ind; float* dx_val; float* dy; hipMalloc((void**)&dx_ind, sizeof(int) * nnz); hipMalloc((void**)&dx_val, sizeof(float) * nnz); hipMalloc((void**)&dy, sizeof(float) * size); hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice); hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spvec_descr vecX; rocsparse_dnvec_descr vecY; rocsparse_indextype idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_datatype compute_type = rocsparse_datatype_f32_r; rocsparse_operation trans = rocsparse_operation_none; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_create_handle(&handle); // Create sparse vector X rocsparse_create_spvec_descr(&vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type); // Create dense vector Y rocsparse_create_dnvec_descr(&vecY, size, dy, data_type); // Obtain buffer size float hresult = 0.0f; size_t buffer_size; rocsparse_spvv(handle, trans, vecX, vecY, &hresult, compute_type, &buffer_size, nullptr); void* temp_buffer; hipMalloc(&temp_buffer, buffer_size); // SpVV rocsparse_spvv(handle, trans, vecX, vecY, &hresult, compute_type, &buffer_size, temp_buffer); hipDeviceSynchronize(); std::cout << "hresult: " << hresult << std::endl; // Clear rocSPARSE rocsparse_destroy_spvec_descr(vecX); rocsparse_destroy_dnvec_descr(vecY); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dx_ind); hipFree(dx_val); hipFree(dy); hipFree(temp_buffer);
Note
This function writes the required allocation size (in bytes) to
buffer_size
and returns without performing the SpVV operation, when a nullptr is passed fortemp_buffer
.Note
This function is blocking with respect to the host.
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
trans – [in] sparse vector operation type.
x – [in] sparse vector descriptor.
y – [in] dense vector descriptor.
result – [out] pointer to the result, can be host or device memory
compute_type – [in] floating point precision for the SpVV computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer
is nullptr.temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the SpVV operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
x
,y
,result
orbuffer_size
pointer is invalid.rocsparse_status_not_implemented –
compute_type
is currently not supported.
rocsparse_spmv()#
-
rocsparse_status rocsparse_spmv(rocsparse_handle handle, rocsparse_operation trans, const void *alpha, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, const void *beta, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spmv_alg alg, rocsparse_spmv_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse matrix vector multiplication.
rocsparse_spmv
multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix and the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]rocsparse_spmv supports multiple different algorithms. These algorithms have different trade offs depending on the sparsity pattern of the matrix, whether or not the results need to be deterministic, and how many times the sparse-vector product will be performed.
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmv_alg_csr_stream
Yes
No
Is best suited for matrices with all rows having a similar number of non-zeros. Can out perform adaptive and LRB algirthms in certain sparsity patterns. Will perform very poorly if some rows have few non-zeros and some rows have many non-zeros.
rocsparse_spmv_alg_csr_adaptive
No
Yes
Generally the fastest algorithm across all matrix sparsity patterns. This includes matrices that have some rows with many non-zeros and some rows with few non-zeros. Requires a lengthy preprocessing that needs to be amortized over many subsequent sparse vector products.
rocsparse_spmv_alg_csr_lrb
No
Yes
Like adaptive algorithm, generally performs well accross all matrix sparsity patterns. Generally not as fast as adaptive algorithm, however uses a much faster pre-processing step. Good for when only a few number of sparse vector products will be performed.
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmv_alg_coo
Yes
Yes
Generally not as fast as atomic algorithm but is deterministic
rocsparse_spmv_alg_coo_atomic
No
No
Generally the fastest COO algorithm
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmv_alg_ell
Yes
No
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmv_alg_bsr
Yes
No
rocsparse_spmv supports multiple combinations of data types and compute types. The tables below indicate the currently supported different data types that can be used for for the sparse matrix A and the dense vectors X and Y and the compute type for \(\alpha\) and \(\beta\). The advantage of using different data types is to save on memory bandwidth and storage when a user application allows while performing the actual computation in a higher precision.
- Uniform Precisions:
A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed precisions:
A / X
Y
compute_type
rocsparse_datatype_i8_r
rocsparse_datatype_i32_r
rocsparse_datatype_i32_r
rocsparse_datatype_i8_r
rocsparse_datatype_f32_r
rocsparse_datatype_f32_r
- Mixed-regular real precisions
A
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed-regular Complex precisions
A
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_r
rocsparse_datatype_f64_c
- Example
// 1 4 0 0 0 0 // A = 0 2 3 0 0 0 // 5 0 0 7 8 0 // 0 0 9 0 6 0 rocsparse_int m = 4; rocsparse_int n = 6; std::vector<int> hcsr_row_ptr = {0, 2, 4, 7, 9}; std::vector<int> hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4}; std::vector<float> hcsr_val = {1, 4, 2, 3, 5, 7, 8, 9, 6}; std::vector<float> hx(n, 1.0f); std::vector<float> hy(m, 0.0f); // Scalar alpha float alpha = 3.7f; // Scalar beta float beta = 0.0f; rocsparse_int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0]; // Offload data to device int* dcsr_row_ptr; int* dcsr_col_ind; float* dcsr_val; float* dx; float* dy; hipMalloc((void**)&dcsr_row_ptr, sizeof(int) * (m + 1)); hipMalloc((void**)&dcsr_col_ind, sizeof(int) * nnz); hipMalloc((void**)&dcsr_val, sizeof(float) * nnz); hipMalloc((void**)&dx, sizeof(float) * n); hipMalloc((void**)&dy, sizeof(float) * m); hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice); hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice); hipMemcpy(dx, hx.data(), sizeof(float) * n, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spmat_descr matA; rocsparse_dnvec_descr vecX; rocsparse_dnvec_descr vecY; rocsparse_indextype row_idx_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_datatype compute_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_operation trans = rocsparse_operation_none; rocsparse_create_handle(&handle); // Create sparse matrix A rocsparse_create_csr_descr(&matA, m, n, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_idx_type, col_idx_type, idx_base, data_type); // Create dense vector X rocsparse_create_dnvec_descr(&vecX, n, dx, data_type); // Create dense vector Y rocsparse_create_dnvec_descr(&vecY, m, dy, data_type); // Call spmv to get buffer size size_t buffer_size; rocsparse_spmv(handle, trans, &alpha, matA, vecX, &beta, vecY, compute_type, rocsparse_spmv_alg_csr_adaptive, rocsparse_spmv_stage_buffer_size, &buffer_size, nullptr); void* temp_buffer; hipMalloc((void**)&temp_buffer, buffer_size); // Call spmv to perform analysis rocsparse_spmv(handle, trans, &alpha, matA, vecX, &beta, vecY, compute_type, rocsparse_spmv_alg_csr_adaptive, rocsparse_spmv_stage_preprocess, &buffer_size, temp_buffer); // Call spmv to perform computation rocsparse_spmv(handle, trans, &alpha, matA, vecX, &beta, vecY, compute_type, rocsparse_spmv_alg_csr_adaptive, rocsparse_spmv_stage_compute, &buffer_size, temp_buffer); // Copy result back to host hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost); std::cout << "hy" << std::endl; for(size_t i = 0; i < hy.size(); ++i) { std::cout << hy[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_spmat_descr(matA); rocsparse_destroy_dnvec_descr(vecX); rocsparse_destroy_dnvec_descr(vecY); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dcsr_row_ptr); hipFree(dcsr_col_ind); hipFree(dcsr_val); hipFree(dx); hipFree(dy); hipFree(temp_buffer);
Note
None of the algorithms above are deterministic when \(A\) is transposed.
Note
This function writes the required allocation size (in bytes) to
buffer_size
and returns without performing the SpMV operation, when a nullptr is passed fortemp_buffer
.Note
Only the rocsparse_spmv_stage_buffer_size stage and the rocsparse_spmv_stage_compute stage are non blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spmv_stage_preprocess stage is blocking with respect to the host.
Note
Only the rocsparse_spmv_stage_buffer_size stage and the rocsparse_spmv_stage_compute stage support execution in a hipGraph context. The rocsparse_spmv_stage_preprocess stage does not support hipGraph.
Note
The sparse matrix formats currently supported are: rocsparse_format_bsr, rocsparse_format_coo, rocsparse_format_coo_aos, rocsparse_format_csr, rocsparse_format_csc and rocsparse_format_ell.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
beta – [in] scalar \(\beta\).
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpMV computation.
alg – [in] SpMV algorithm for the SpMV computation.
stage – [in] SpMV stage for the SpMV computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer
is nullptr.temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the SpMV operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context
handle
was not initialized.rocsparse_status_invalid_pointer –
alpha
,mat
,x
,beta
,y
orbuffer_size
pointer is invalid.rocsparse_status_invalid_value – the value of
trans
,compute_type
,alg
, orstage
is incorrect.rocsparse_status_not_implemented –
compute_type
oralg
is currently not supported.
rocsparse_spmv_ex()#
-
rocsparse_status rocsparse_spmv_ex(rocsparse_handle handle, rocsparse_operation trans, const void *alpha, const rocsparse_spmat_descr mat, const rocsparse_dnvec_descr x, const void *beta, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spmv_alg alg, rocsparse_spmv_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse matrix vector multiplication.
rocsparse_spmv_ex
multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix and the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]rocsparse_spmv supports multiple different algorithms. These algorithms have different trade offs depending on the sparsity pattern of the matrix, whether or not the results need to be deterministic, and how many times the sparse-vector product will be performed.
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmv_alg_csr_stream
Yes
No
Is best suited for matrices with all rows having a similar number of non-zeros. Can out perform adaptive and LRB algirthms in certain sparsity patterns. Will perform very poorly if some rows have few non-zeros and some rows have many non-zeros.
rocsparse_spmv_alg_csr_adaptive
No
Yes
Generally the fastest algorithm across all matrix sparsity patterns. This includes matrices that have some rows with many non-zeros and some rows with few non-zeros. Requires a lengthy preprocessing that needs to be amortized over many subsequent sparse vector products.
rocsparse_spmv_alg_csr_lrb
No
Yes
Like adaptive algorithm, generally performs well accross all matrix sparsity patterns. Generally not as fast as adaptive algorithm, however uses a much faster pre-processing step. Good for when only a few number of sparse vector products will be performed.
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmv_alg_coo
Yes
Yes
Generally not as fast as atomic algorithm but is deterministic
rocsparse_spmv_alg_coo_atomic
No
No
Generally the fastest COO algorithm
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmv_alg_ell
Yes
No
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmv_alg_bsr
Yes
No
rocsparse_spmv_ex supports multiple combinations of data types and compute types. The tables below indicate the currently supported different data types that can be used for for the sparse matrix A and the dense vectors X and Y and the compute type for \(\alpha\) and \(\beta\). The advantage of using different data types is to save on memory bandwidth and storage when a user application allows while performing the actual computation in a higher precision.
- Uniform Precisions:
A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed precisions:
A / X
Y
compute_type
rocsparse_datatype_i8_r
rocsparse_datatype_i32_r
rocsparse_datatype_i32_r
rocsparse_datatype_i8_r
rocsparse_datatype_f32_r
rocsparse_datatype_f32_r
- Mixed-regular real precisions
A
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed-regular Complex precisions
A
X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_r
rocsparse_datatype_f64_c
Note
None of the algorithms above are deterministic when \(A\) is transposed.
Note
This function writes the required allocation size (in bytes) to
buffer_size
and returns without performing the SpMV operation, when a nullptr is passed fortemp_buffer
.Note
The sparse matrix formats currently supported are: rocsparse_format_bsr, rocsparse_format_coo, rocsparse_format_coo_aos, rocsparse_format_csr, rocsparse_format_csc and rocsparse_format_ell.
Note
SpMV_ex requires three stages to complete. The first stage rocsparse_spmv_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls to rocsparse_spmv_ex. The second stage rocsparse_spmv_stage_preprocess will preprocess data that would be saved in the temporary storage buffer. In the final stage rocsparse_spmv_stage_compute, the actual computation is performed.
Note
Only the rocsparse_spmv_stage_buffer_size stage and the rocsparse_spmv_stage_compute stage are non blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spmv_stage_preprocess stage is blocking with respect to the host.
Note
Only the rocsparse_spmv_stage_buffer_size stage and the rocsparse_spmv_stage_compute stage support execution in a hipGraph context. The rocsparse_spmv_stage_preprocess stage does not support hipGraph.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
beta – [in] scalar \(\beta\).
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpMV computation.
alg – [in] SpMV algorithm for the SpMV computation.
stage – [in] SpMV stage for the SpMV computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer
is nullptr.temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the SpMV operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context
handle
was not initialized.rocsparse_status_invalid_pointer –
alpha
,mat
,x
,beta
,y
orbuffer_size
pointer is invalid.rocsparse_status_invalid_value – the value of
trans
,compute_type
,alg
orstage
is incorrect.rocsparse_status_not_implemented –
compute_type
oralg
is currently not supported.
rocsparse_spsv()#
-
rocsparse_status rocsparse_spsv(rocsparse_handle handle, rocsparse_operation trans, const void *alpha, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spsv_alg alg, rocsparse_spsv_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse triangular solve.
rocsparse_spsv
solves a sparse triangular linear system of a sparse \(m \times m\) matrix, defined in CSR or COO storage format, a dense solution vector \(y\) and the right-hand side \(x\) that is multiplied by \(\alpha\), such that\[ op(A) \cdot y = \alpha \cdot x, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]- Uniform Precisions:
A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
// 1 0 0 0 // A = 4 2 0 0 // 0 3 7 0 // 0 0 0 1 rocsparse_int m = 4; std::vector<int> hcsr_row_ptr = {0, 1, 3, 5, 6}; std::vector<int> hcsr_col_ind = {0, 0, 1, 1, 2, 3}; std::vector<float> hcsr_val = {1, 4, 2, 3, 7, 1}; std::vector<float> hx(m, 1.0f); std::vector<float> hy(m, 0.0f); // Scalar alpha float alpha = 1.0f; rocsparse_int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0]; // Offload data to device int* dcsr_row_ptr; int* dcsr_col_ind; float* dcsr_val; float* dx; float* dy; hipMalloc((void**)&dcsr_row_ptr, sizeof(int) * (m + 1)); hipMalloc((void**)&dcsr_col_ind, sizeof(int) * nnz); hipMalloc((void**)&dcsr_val, sizeof(float) * nnz); hipMalloc((void**)&dx, sizeof(float) * m); hipMalloc((void**)&dy, sizeof(float) * m); hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice); hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice); hipMemcpy(dx, hx.data(), sizeof(float) * m, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spmat_descr matA; rocsparse_dnvec_descr vecX; rocsparse_dnvec_descr vecY; rocsparse_indextype row_idx_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_datatype compute_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_operation trans = rocsparse_operation_none; rocsparse_create_handle(&handle); // Create sparse matrix A rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_idx_type, col_idx_type, idx_base, data_type); // Create dense vector X rocsparse_create_dnvec_descr(&vecX, m, dx, data_type); // Create dense vector Y rocsparse_create_dnvec_descr(&vecY, m, dy, data_type); // Call spsv to get buffer size size_t buffer_size; rocsparse_spsv(handle, trans, &alpha, matA, vecX, vecY, compute_type, rocsparse_spsv_alg_default, rocsparse_spsv_stage_buffer_size, &buffer_size, nullptr); void* temp_buffer; hipMalloc((void**)&temp_buffer, buffer_size); // Call spsv to perform analysis rocsparse_spsv(handle, trans, &alpha, matA, vecX, vecY, compute_type, rocsparse_spsv_alg_default, rocsparse_spsv_stage_preprocess, &buffer_size, temp_buffer); // Call spsv to perform computation rocsparse_spsv(handle, trans, &alpha, matA, vecX, vecY, compute_type, rocsparse_spsv_alg_default, rocsparse_spsv_stage_compute, &buffer_size, temp_buffer); // Copy result back to host hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost); std::cout << "hy" << std::endl; for(size_t i = 0; i < hy.size(); ++i) { std::cout << hy[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_spmat_descr(matA); rocsparse_destroy_dnvec_descr(vecX); rocsparse_destroy_dnvec_descr(vecY); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dcsr_row_ptr); hipFree(dcsr_col_ind); hipFree(dcsr_val); hipFree(dx); hipFree(dy); hipFree(temp_buffer);
Note
SpSV requires three stages to complete. The first stage rocsparse_spsv_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls. The second stage rocsparse_spsv_stage_preprocess will preprocess data that would be saved in the temporary storage buffer. In the final stage rocsparse_spsv_stage_compute, the actual computation is performed.
Note
Only the rocsparse_spsv_stage_buffer_size stage and the rocsparse_spsv_stage_compute stage are non blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spsv_stage_preprocess stage is blocking with respect to the host.
Note
Currently, only
trans
== rocsparse_operation_none andtrans
== rocsparse_operation_transpose is supported.Note
Only the rocsparse_spsv_stage_buffer_size stage and the rocsparse_spsv_stage_compute stage support execution in a hipGraph context. The rocsparse_spsv_stage_preprocess stage does not support hipGraph.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpSV computation.
alg – [in] SpSV algorithm for the SpSV computation.
stage – [in] SpSV stage for the SpSV computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the SpSV operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
,mat
,x
,y
orbuffer_size
pointer is invalid.rocsparse_status_not_implemented –
trans
,compute_type
,stage
oralg
is currently not supported.
rocsparse_spsm()#
-
rocsparse_status rocsparse_spsm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr matA, rocsparse_const_dnmat_descr matB, const rocsparse_dnmat_descr matC, rocsparse_datatype compute_type, rocsparse_spsm_alg alg, rocsparse_spsm_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse triangular system solve.
rocsparse_spsm
solves a sparse triangular linear system of a sparse \(m \times m\) matrix, defined in CSR or COO storage format, a dense solution matrix \(C\) and the right-hand side \(B\) that is multiplied by \(\alpha\), such that\[ op(A) \cdot C = \alpha \cdot op(B), \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]and\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \\ B^T, & \text{if trans_B == rocsparse_operation_transpose} \\ B^H, & \text{if trans_B == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]- Uniform Precisions:
A / B / C / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
// 1 0 0 0 // A = 4 2 0 0 // 0 3 7 0 // 0 0 0 1 rocsparse_int m = 4; rocsparse_int n = 2; std::vector<int> hcsr_row_ptr = {0, 1, 3, 5, 6}; std::vector<int> hcsr_col_ind = {0, 0, 1, 1, 2, 3}; std::vector<float> hcsr_val = {1, 4, 2, 3, 7, 1}; std::vector<float> hB(m * n); std::vector<float> hC(m * n); for(int i = 0; i < n; i++) { for(int j = 0; j < m; j++) { hB[m * i + j] = static_cast<float>(i + 1); } } // Scalar alpha float alpha = 1.0f; rocsparse_int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0]; // Offload data to device int* dcsr_row_ptr; int* dcsr_col_ind; float* dcsr_val; float* dB; float* dC; hipMalloc((void**)&dcsr_row_ptr, sizeof(int) * (m + 1)); hipMalloc((void**)&dcsr_col_ind, sizeof(int) * nnz); hipMalloc((void**)&dcsr_val, sizeof(float) * nnz); hipMalloc((void**)&dB, sizeof(float) * m * n); hipMalloc((void**)&dC, sizeof(float) * m * n); hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice); hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice); hipMemcpy(dB, hB.data(), sizeof(float) * m * n, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spmat_descr matA; rocsparse_dnmat_descr matB; rocsparse_dnmat_descr matC; rocsparse_indextype row_idx_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_datatype compute_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_operation trans_A = rocsparse_operation_none; rocsparse_operation trans_B = rocsparse_operation_none; rocsparse_create_handle(&handle); // Create sparse matrix A rocsparse_create_csr_descr(&matA, m, m, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_idx_type, col_idx_type, idx_base, data_type); // Create dense matrix B rocsparse_create_dnmat_descr(&matB, m, n, m, dB, data_type, rocsparse_order_column); // Create dense matrix C rocsparse_create_dnmat_descr(&matC, m, n, m, dC, data_type, rocsparse_order_column); // Call spsv to get buffer size size_t buffer_size; rocsparse_spsm(handle, trans_A, trans_B, &alpha, matA, matB, matC, compute_type, rocsparse_spsm_alg_default, rocsparse_spsm_stage_buffer_size, &buffer_size, nullptr); void* temp_buffer; hipMalloc((void**)&temp_buffer, buffer_size); // Call spsv to perform analysis rocsparse_spsm(handle, trans_A, trans_B, &alpha, matA, matB, matC, compute_type, rocsparse_spsm_alg_default, rocsparse_spsm_stage_preprocess, &buffer_size, temp_buffer); // Call spsv to perform computation rocsparse_spsm(handle, trans_A, trans_B, &alpha, matA, matB, matC, compute_type, rocsparse_spsm_alg_default, rocsparse_spsm_stage_compute, &buffer_size, temp_buffer); // Copy result back to host hipMemcpy(hC.data(), dC, sizeof(float) * m * n, hipMemcpyDeviceToHost); std::cout << "hC" << std::endl; for(size_t i = 0; i < hC.size(); ++i) { std::cout << hC[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_spmat_descr(matA); rocsparse_destroy_dnmat_descr(matB); rocsparse_destroy_dnmat_descr(matC); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dcsr_row_ptr); hipFree(dcsr_col_ind); hipFree(dcsr_val); hipFree(dB); hipFree(dC); hipFree(temp_buffer);
Note
SpSM requires three stages to complete. The first stage rocsparse_spsm_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls. The second stage rocsparse_spsm_stage_preprocess will preprocess data that would be saved in the temporary storage buffer. In the final stage rocsparse_spsm_stage_compute, the actual computation is performed.
Note
Only the rocsparse_spsm_stage_buffer_size stage and the rocsparse_spsm_stage_compute stage are non blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spsm_stage_preprocess stage is blocking with respect to the host.
Note
Currently, only
trans_A
== rocsparse_operation_none andtrans_A
== rocsparse_operation_transpose is supported. Currently, onlytrans_B
== rocsparse_operation_none andtrans_B
== rocsparse_operation_transpose is supported.Note
Only the rocsparse_spsm_stage_buffer_size stage and the rocsparse_spsm_stage_compute stage support execution in a hipGraph context. The rocsparse_spsm_stage_preprocess stage does not support hipGraph.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
trans_A – [in] matrix operation type for the sparse matrix A.
trans_B – [in] matrix operation type for the dense matrix B.
alpha – [in] scalar \(\alpha\).
matA – [in] sparse matrix descriptor.
matB – [in] dense matrix descriptor.
matC – [inout] dense matrix descriptor.
compute_type – [in] floating point precision for the SpSM computation.
alg – [in] SpSM algorithm for the SpSM computation.
stage – [in] SpSM stage for the SpSM computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the SpSM operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
,matA
,matB
,matC
,descr
orbuffer_size
pointer is invalid.rocsparse_status_not_implemented –
trans_A
,trans_B
,compute_type
,stage
oralg
is currently not supported.
rocsparse_spmm()#
-
rocsparse_status rocsparse_spmm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, const rocsparse_dnmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_spmm_alg alg, rocsparse_spmm_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse matrix dense matrix multiplication, extension routine.
rocsparse_spmm
multiplies the scalar \(\alpha\) with a sparse \(m \times k\) matrix \(A\), defined in CSR or COO or Blocked ELL storage format, and the dense \(k \times n\) matrix \(B\) and adds the result to the dense \(m \times n\) matrix \(C\) that is multiplied by the scalar \(\beta\), such that\[ C := \alpha \cdot op(A) \cdot op(B) + \beta \cdot C, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \\ A^T, & \text{if trans_A == rocsparse_operation_transpose} \\ A^H, & \text{if trans_A == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]and\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \\ B^T, & \text{if trans_B == rocsparse_operation_transpose} \\ B^H, & \text{if trans_B == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]rocsparse_spmm supports multiple different algorithms. These algorithms have different trade offs depending on the sparsity pattern of the matrix, whether or not the results need to be deterministic, and how many times the sparse-matrix product will be performed.
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmm_alg_csr
Yes
No
Default algorithm.
rocsparse_spmm_alg_csr_row_split
Yes
No
Assigns a fixed number of threads per row regardless of the number of non-zeros in each row. This can perform well when each row in the matrix has roughly the same number of non-zeros
rocsparse_spmm_alg_csr_nnz_split
No
Yes
Distributes work by having each thread block work on a fixed number of non-zeros regardless of the number of rows that might be involved. This can perform well when the matrix has some rows with few non-zeros and some rows with many non-zeros
rocsparse_spmm_alg_csr_merge_path
No
Yes
Attempts to combine the approaches of row-split and non-zero split by having each block work on a fixed amount of work which can be either non-zeros or rows
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmm_alg_coo_segmented
Yes
No
Generally not as fast as atomic algorithm but is deterministic
rocsparse_spmm_alg_coo_atomic
No
No
Generally the fastest COO algorithm. Is the default algorithm
rocsparse_spmm_alg_coo_segmented_atomic
No
No
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmm_alg_bell
Yes
No
Algorithm
Deterministic
Preprocessing
Notes
rocsparse_spmm_alg_bsr
Yes
No
SpMM also supports batched computation for CSR and COO matrices. There are three supported batch modes:
\[\begin{split} C_i = A \times B_i \\ C_i = A_i \times B \\ C_i = A_i \times B_i \end{split}\]- Uniform Precisions:
A / B / C / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Mixed precisions:
A / B
C
compute_type
rocsparse_datatype_i8_r
rocsparse_datatype_i32_r
rocsparse_datatype_i32_r
rocsparse_datatype_i8_r
rocsparse_datatype_f32_r
rocsparse_datatype_f32_r
The batch mode is determined by the batch count and stride passed for each matrix. For example to use the first batch mode ( \(C_i = A \times B_i\)) with 100 batches for non-transposed \(A\), \(B\), and \(C\), one passes:
\[\begin{split} batch\_count\_A=1 \\ batch\_count\_B=100 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=0 \\ columns\_values\_batch\_stride\_A=0 \\ batch\_stride\_B=k*n \\ batch\_stride\_C=m*n \end{split}\]To use the second batch mode ( \(C_i = A_i \times B\)) one could use:\[\begin{split} batch\_count\_A=100 \\ batch\_count\_B=1 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=m+1 \\ columns\_values\_batch\_stride\_A=nnz \\ batch\_stride\_B=0 \\ batch\_stride\_C=m*n \end{split}\]And to use the third batch mode ( \(C_i = A_i \times B_i\)) one could use:\[\begin{split} batch\_count\_A=100 \\ batch\_count\_B=100 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=m+1 \\ columns\_values\_batch\_stride_A=nnz \\ batch\_stride_B=k*n \\ batch\_stride_C=m*n \end{split}\]See examples below.- Example
This example performs sparse matrix-dense matrix multiplication, \(C := \alpha \cdot A \cdot B + \beta \cdot C\)
// 1 4 0 0 0 0 // A = 0 2 3 0 0 0 // 5 0 0 7 8 0 // 0 0 9 0 6 0 // 1 4 2 // 1 2 3 // B = 5 4 0 // 3 1 9 // 1 2 2 // 0 3 0 // 1 1 5 // C = 1 2 1 // 1 3 1 // 6 2 4 rocsparse_int m = 4; rocsparse_int k = 6; rocsparse_int n = 3; csr_row_ptr[m + 1] = {0, 1, 3}; // device memory csr_col_ind[nnz] = {0, 0, 1}; // device memory csr_val[nnz] = {1, 0, 4, 2, 0, 3, 5, 0, 0, 0, 0, 9, 7, 0, 8, 6, 0, 0}; // device memory B[k * n] = {1, 1, 5, 3, 1, 0, 4, 2, 4, 1, 2, 3, 2, 3, 0, 9, 2, 0}; // device memory C[m * n] = {1, 1, 1, 6, 1, 2, 3, 2, 5, 1, 1, 4}; // device memory rocsparse_int nnz = csr_row_ptr[m] - csr_row_ptr[0]; float alpha = 1.0f; float beta = 0.0f; // Create CSR arrays on device rocsparse_int* csr_row_ptr; rocsparse_int* csr_col_ind; float* csr_val; float* B; float* C; hipMalloc((void**)&csr_row_ptr, sizeof(rocsparse_int) * (m + 1)); hipMalloc((void**)&csr_col_ind, sizeof(rocsparse_int) * nnz); hipMalloc((void**)&csr_val, sizeof(float) * nnz); hipMalloc((void**)&B, sizeof(float) * k * n); hipMalloc((void**)&C, sizeof(float) * m * n); // Create rocsparse handle rocsparse_local_handle handle; // Types rocsparse_indextype itype = rocsparse_indextype_i32; rocsparse_indextype jtype = rocsparse_indextype_i32; rocsparse_datatype ttype = rocsparse_datatype_f32_r; // Create descriptors rocsparse_spmat_descr mat_A; rocsparse_dnmat_descr mat_B; rocsparse_dnmat_descr mat_C; rocsparse_create_csr_descr(&mat_A, m, k, nnz, csr_row_ptr, csr_col_ind, csr_val, itype, jtype, rocsparse_index_base_zero, ttype); rocsparse_create_dnmat_descr(&mat_B, k, n, k, B, ttype, rocsparse_order_column); rocsparse_create_dnmat_descr(&mat_C, m, n, m, C, ttype, rocsparse_order_column); // Query SpMM buffer size_t buffer_size; rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none, &alpha, mat_A, mat_B, &beta, mat_C, ttype, rocsparse_spmm_alg_default, rocsparse_spmm_stage_buffer_size, &buffer_size, nullptr)); // Allocate buffer void* buffer; hipMalloc(&buffer, buffer_size); rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none, &alpha, mat_A, mat_B, &beta, mat_C, ttype, rocsparse_spmm_alg_default, rocsparse_spmm_stage_preprocess, &buffer_size, buffer)); // Pointer mode host rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none, &alpha, mat_A, mat_B, &beta, mat_C, ttype, rocsparse_spmm_alg_default, rocsparse_spmm_stage_compute, &buffer_size, buffer)); // Clear up on device hipFree(csr_row_ptr); hipFree(csr_col_ind); hipFree(csr_val); hipFree(B); hipFree(C); hipFree(temp_buffer); rocsparse_destroy_spmat_descr(mat_A); rocsparse_destroy_dnmat_descr(mat_B); rocsparse_destroy_dnmat_descr(mat_C);
- Example
An example of the first batch mode ( \(C_i = A \times B_i\)) is provided below.
// 1 4 0 0 0 0 // A = 0 2 3 0 0 0 // 5 0 0 7 8 0 // 0 0 9 0 6 0 rocsparse_int m = 4; rocsparse_int k = 6; rocsparse_int n = 3; csr_row_ptr[m + 1] = {0, 1, 3}; // device memory csr_col_ind[nnz] = {0, 0, 1}; // device memory csr_val[nnz] = {1, 0, 4, 2, 0, 3, 5, 0, 0, 0, 0, 9, 7, 0, 8, 6, 0, 0}; // device memory B[batch_count_B * k * n] = {...}; // device memory C[batch_count_C * m * n] = {...}; // device memory rocsparse_int nnz = csr_row_ptr[m] - csr_row_ptr[0]; rocsparse_int batch_count_A = 1; rocsparse_int batch_count_B = 100; rocsparse_int batch_count_C = 100; rocsparse_int offsets_batch_stride_A = 0; rocsparse_int columns_values_batch_stride_A = 0; rocsparse_int batch_stride_B = k * n; rocsparse_int batch_stride_C = m * n; float alpha = 1.0f; float beta = 0.0f; // Create CSR arrays on device rocsparse_int* csr_row_ptr; rocsparse_int* csr_col_ind; float* csr_val; float* B; float* C; hipMalloc((void**)&csr_row_ptr, sizeof(rocsparse_int) * (m + 1)); hipMalloc((void**)&csr_col_ind, sizeof(rocsparse_int) * nnz); hipMalloc((void**)&csr_val, sizeof(float) * nnz); hipMalloc((void**)&B, sizeof(float) * batch_count_B * k * n); hipMalloc((void**)&C, sizeof(float) * batch_count_C * m * n); // Create rocsparse handle rocsparse_local_handle handle; // Types rocsparse_indextype itype = rocsparse_indextype_i32; rocsparse_indextype jtype = rocsparse_indextype_i32; rocsparse_datatype ttype = rocsparse_datatype_f32_r; // Create descriptors rocsparse_spmat_descr mat_A; rocsparse_dnmat_descr mat_B; rocsparse_dnmat_descr mat_C; rocsparse_create_csr_descr(&mat_A, m, k, nnz, csr_row_ptr, csr_col_ind, csr_val, itype, jtype, rocsparse_index_base_zero, ttype); rocsparse_create_dnmat_descr(&mat_B, k, n, k, B, ttype, rocsparse_order_column); rocsparse_create_dnmat_descr(&mat_C, m, n, m, C, ttype, rocsparse_order_column); rocsparse_csr_set_strided_batch(mat_A, batch_count_A, offsets_batch_stride_A, columns_values_batch_stride_A); rocsparse_dnmat_set_strided_batch(B, batch_count_B, batch_stride_B); rocsparse_dnmat_set_strided_batch(C, batch_count_C, batch_stride_C); // Query SpMM buffer size_t buffer_size; rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none, &alpha, mat_A, mat_B, &beta, mat_C, ttype, rocsparse_spmm_alg_default, rocsparse_spmm_stage_buffer_size, &buffer_size, nullptr)); // Allocate buffer void* buffer; hipMalloc(&buffer, buffer_size); rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none, &alpha, mat_A, mat_B, &beta, mat_C, ttype, rocsparse_spmm_alg_default, rocsparse_spmm_stage_preprocess, &buffer_size, buffer)); // Pointer mode host rocsparse_spmm(handle, rocsparse_operation_none, rocsparse_operation_none, &alpha, mat_A, mat_B, &beta, mat_C, ttype, rocsparse_spmm_alg_default, rocsparse_spmm_stage_compute, &buffer_size, buffer)); // Clear up on device hipFree(csr_row_ptr); hipFree(csr_col_ind); hipFree(csr_val); hipFree(B); hipFree(C); hipFree(temp_buffer); rocsparse_destroy_spmat_descr(mat_A); rocsparse_destroy_dnmat_descr(mat_B); rocsparse_destroy_dnmat_descr(mat_C);
Note
None of the algorithms above are deterministic when \(A\) is transposed.
Note
All algorithms perform best when using row ordering for the dense \(B\) and \(C\) matrices
Note
Mixed precisions only supported for CSR, CSC, and COO matrix formats.
Note
Only the rocsparse_spmm_stage_buffer_size stage and the rocsparse_spmm_stage_compute stage are non blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spmm_stage_preprocess stage is blocking with respect to the host.
Note
Currently, only
trans_A
== rocsparse_operation_none is supported for COO and Blocked ELL formats.Note
Only the rocsparse_spmm_stage_buffer_size stage and the rocsparse_spmm_stage_compute stage support execution in a hipGraph context. The rocsparse_spmm_stage_preprocess stage does not support hipGraph.
Note
Currently, only CSR, COO, BSR and Blocked ELL sparse formats are supported.
Note
Different algorithms are available which can provide better performance for different matrices. Currently, the available algorithms are rocsparse_spmm_alg_csr, rocsparse_spmm_alg_csr_row_split, rocsparse_spmm_alg_csr_nnz_split (also named rocsparse_spmm_alg_csr_merge) or rocsparse_spmm_alg_csr_merge_path for CSR matrices, rocsparse_spmm_alg_bell for Blocked ELL matrices and rocsparse_spmm_alg_coo_segmented or rocsparse_spmm_alg_coo_atomic for COO matrices. Additionally, one can specify the algorithm to be rocsparse_spmm_alg_default. In the case of CSR matrices this will set the algorithm to be rocsparse_spmm_alg_csr, in the case of Blocked ELL matrices this will set the algorithm to be rocsparse_spmm_alg_bell and for COO matrices it will set the algorithm to be rocsparse_spmm_alg_coo_atomic. When A is transposed, rocsparse_spmm will revert to using rocsparse_spmm_alg_csr for CSR format and rocsparse_spmm_alg_coo_atomic for COO format regardless of algorithm selected.
Note
This function writes the required allocation size (in bytes) to
buffer_size
and returns without performing the SpMM operation, when a nullptr is passed fortemp_buffer
.Note
SpMM requires three stages to complete. The first stage rocsparse_spmm_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls to rocsparse_spmm. The second stage rocsparse_spmm_stage_preprocess will preprocess data that would be saved in the temporary storage buffer. In the final stage rocsparse_spmm_stage_compute, the actual computation is performed.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
trans_A – [in] matrix operation type.
trans_B – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] matrix descriptor.
mat_B – [in] matrix descriptor.
beta – [in] scalar \(\beta\).
mat_C – [in] matrix descriptor.
compute_type – [in] floating point precision for the SpMM computation.
alg – [in] SpMM algorithm for the SpMM computation.
stage – [in] SpMM stage for the SpMM computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer
is nullptr.temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the SpMM operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
,mat_A
,mat_B
,mat_C
,beta
, orbuffer_size
pointer is invalid.rocsparse_status_not_implemented –
trans_A
,trans_B
,compute_type
oralg
is currently not supported.
rocsparse_spgemm()#
-
rocsparse_status rocsparse_spgemm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr A, rocsparse_const_spmat_descr B, const void *beta, rocsparse_const_spmat_descr D, rocsparse_spmat_descr C, rocsparse_datatype compute_type, rocsparse_spgemm_alg alg, rocsparse_spgemm_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse matrix sparse matrix multiplication.
rocsparse_spgemm
multiplies the scalar \(\alpha\) with the sparse \(m \times k\) matrix \(A\) and the sparse \(k \times n\) matrix \(B\) and adds the result to the sparse \(m \times n\) matrix \(D\) that is multiplied by \(\beta\). The final result is stored in the sparse \(m \times n\) matrix \(C\), such that\[ C := \alpha \cdot op(A) \cdot op(B) + \beta \cdot D, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \\ A^T, & \text{if trans_A == rocsparse_operation_transpose} \\ A^H, & \text{if trans_A == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]and\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \\ B^T, & \text{if trans_B == rocsparse_operation_transpose} \\ B^H, & \text{if trans_B == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]- Uniform Precisions:
A / B / C / D / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
- Example
// A - m x k // B - k x n // C - m x n int m = 400; int n = 400; int k = 300; std::vector<int> hcsr_row_ptr_A = {...}; // host A m x k matrix std::vector<int> hcsr_col_ind_A = {...}; // host A m x k matrix std::vector<float> hcsr_val_A = {...}; // host A m x k matrix std::vector<int> hcsr_row_ptr_B = {...}; // host B k x n matrix std::vector<int> hcsr_col_ind_B = {...}; // host B k x n matrix std::vector<float> hcsr_val_B = {...}; // host B k x n matrix int nnz_A = hcsr_val_A.size(); int nnz_B = hcsr_val_B.size(); float alpha = 1.0f; float beta = 0.0f; int* dcsr_row_ptr_A = nullptr; int* dcsr_col_ind_A = nullptr; float* dcsr_val_A = nullptr; int* dcsr_row_ptr_B = nullptr; int* dcsr_col_ind_B = nullptr; float* dcsr_val_B = nullptr; int* dcsr_row_ptr_C = nullptr; hipMalloc((void**)&dcsr_row_ptr_A, (m + 1) * sizeof(int)); hipMalloc((void**)&dcsr_col_ind_A, nnz_A * sizeof(int)); hipMalloc((void**)&dcsr_val_A, nnz_A * sizeof(float)); hipMalloc((void**)&dcsr_row_ptr_B, (k + 1) * sizeof(int)); hipMalloc((void**)&dcsr_col_ind_B, nnz_B * sizeof(int)); hipMalloc((void**)&dcsr_val_B, nnz_B * sizeof(float)); hipMalloc((void**)&dcsr_row_ptr_C, (m + 1) * sizeof(int)); hipMemcpy(dcsr_row_ptr_A, hcsr_row_ptr_A.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice); hipMemcpy(dcsr_col_ind_A, hcsr_col_ind_A.data(), nnz_A * sizeof(int), hipMemcpyHostToDevice); hipMemcpy(dcsr_val_A, hcsr_val_A.data(), nnz_A * sizeof(float), hipMemcpyHostToDevice); hipMemcpy(dcsr_row_ptr_B, hcsr_row_ptr_B.data(), (k + 1) * sizeof(int), hipMemcpyHostToDevice); hipMemcpy(dcsr_col_ind_B, hcsr_col_ind_B.data(), nnz_B * sizeof(int), hipMemcpyHostToDevice); hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spmat_descr matA, matB, matC, matD; void* temp_buffer = NULL; size_t buffer_size = 0; rocsparse_operation trans_A = rocsparse_operation_none; rocsparse_operation trans_B = rocsparse_operation_none; rocsparse_index_base index_base = rocsparse_index_base_zero; rocsparse_indextype itype = rocsparse_indextype_i32; rocsparse_indextype jtype = rocsparse_indextype_i32; rocsparse_datatype ttype = rocsparse_datatype_f32_r; rocsparse_create_handle(&handle); // Create sparse matrix A in CSR format rocsparse_create_csr_descr(&matA, m, k, nnz_A, dcsr_row_ptr_A, dcsr_col_ind_A, dcsr_val_A, itype, jtype, index_base, ttype); // Create sparse matrix B in CSR format rocsparse_create_csr_descr(&matB, k, n, nnz_B, dcsr_row_ptr_B, dcsr_col_ind_B, dcsr_val_B, itype, jtype, index_base, ttype); // Create sparse matrix C in CSR format rocsparse_create_csr_descr(&matC, m, n, 0, dcsr_row_ptr_C, nullptr, nullptr, itype, jtype, index_base, ttype); // Create sparse matrix D in CSR format rocsparse_create_csr_descr(&matD, 0, 0, 0, nullptr, nullptr, nullptr, itype, jtype, index_base, ttype); // Determine buffer size rocsparse_spgemm(handle, trans_A, trans_B, &alpha, matA, matB, &beta, matD, matC, ttype, rocsparse_spgemm_alg_default, rocsparse_spgemm_stage_buffer_size, &buffer_size, nullptr); hipMalloc(&temp_buffer, buffer_size); // Determine number of non-zeros in C matrix rocsparse_spgemm(handle, trans_A, trans_B, &alpha, matA, matB, &beta, matD, matC, ttype, rocsparse_spgemm_alg_default, rocsparse_spgemm_stage_nnz, &buffer_size, temp_buffer); int64_t rows_C; int64_t cols_C; int64_t nnz_C; Extract number of non-zeros in C matrix so we can allocate the column indices and values arrays rocsparse_spmat_get_size(matC, &rows_C, &cols_C, &nnz_C); int* dcsr_col_ind_C; float* dcsr_val_C; hipMalloc((void**)&dcsr_col_ind_C, sizeof(int) * nnz_C); hipMalloc((void**)&dcsr_val_C, sizeof(float) * nnz_C); // Set C matrix pointers rocsparse_csr_set_pointers(matC, dcsr_row_ptr_C, dcsr_col_ind_C, dcsr_val_C); // SpGEMM computation rocsparse_spgemm(handle, trans_A, trans_B, &alpha, matA, matB, &beta, matD, matC, ttype, rocsparse_spgemm_alg_default, rocsparse_spgemm_stage_compute, &buffer_size, temp_buffer); // Copy C matrix result back to host std::vector<int> hcsr_row_ptr_C(m + 1); std::vector<int> hcsr_col_ind_C(nnz_C); std::vector<float> hcsr_val_C(nnz_C); hipMemcpy(hcsr_row_ptr_C.data(), dcsr_row_ptr_C, sizeof(int) * (m + 1), hipMemcpyDeviceToHost); hipMemcpy(hcsr_col_ind_C.data(), dcsr_col_ind_C, sizeof(int) * nnz_C, hipMemcpyDeviceToHost); hipMemcpy(hcsr_val_C.data(), dcsr_val_C, sizeof(float) * nnz_C, hipMemcpyDeviceToHost); // Destroy matrix descriptors rocsparse_destroy_spmat_descr(matA); rocsparse_destroy_spmat_descr(matB); rocsparse_destroy_spmat_descr(matC); rocsparse_destroy_spmat_descr(matD); rocsparse_destroy_handle(handle); // Free device arrays hipFree(temp_buffer); hipFree(dcsr_row_ptr_A); hipFree(dcsr_col_ind_A); hipFree(dcsr_val_A); hipFree(dcsr_row_ptr_B); hipFree(dcsr_col_ind_B); hipFree(dcsr_val_B); hipFree(dcsr_row_ptr_C); hipFree(dcsr_col_ind_C); hipFree(dcsr_val_C);
Note
This function does not produce deterministic results.
Note
SpGEMM requires three stages to complete. The first stage rocsparse_spgemm_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls to rocsparse_spgemm. The second stage rocsparse_spgemm_stage_nnz will determine the number of non-zero elements of the resulting \(C\) matrix. If the sparsity pattern of \(C\) is already known, this stage can be skipped. In the final stage rocsparse_spgemm_stage_compute, the actual computation is performed.
Note
If \(\alpha == 0\), then \(C = \beta \cdot D\) will be computed.
Note
If \(\beta == 0\), then \(C = \alpha \cdot op(A) \cdot op(B)\) will be computed.
Note
Currently only CSR and BSR formats are supported.
Note
If rocsparse_spgemm_stage_symbolic is selected then the symbolic computation is performed only.
Note
If rocsparse_spgemm_stage_numeric is selected then the numeric computation is performed only.
Note
For the rocsparse_spgemm_stage_symbolic and rocsparse_spgemm_stage_numeric stages, only CSR matrix format is currently supported.
Note
\(\alpha == beta == 0\) is invalid.
Note
It is allowed to pass the same sparse matrix for \(C\) and \(D\), if both matrices have the same sparsity pattern.
Note
Currently, only
trans_A
== rocsparse_operation_none is supported.Note
Currently, only
trans_B
== rocsparse_operation_none is supported.Note
This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.
Note
Please note, that for rare matrix products with more than 4096 non-zero entries per row, additional temporary storage buffer is allocated by the algorithm.
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
trans_A – [in] sparse matrix \(A\) operation type.
trans_B – [in] sparse matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
A – [in] sparse matrix \(A\) descriptor.
B – [in] sparse matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
D – [in] sparse matrix \(D\) descriptor.
C – [out] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SpGEMM computation.
alg – [in] SpGEMM algorithm for the SpGEMM computation.
stage – [in] SpGEMM stage for the SpGEMM computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer
is nullptr.temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the SpGEMM operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
andbeta
are invalid,A
,B
,D
,C
orbuffer_size
pointer is invalid.rocsparse_status_memory_error – additional buffer for long rows could not be allocated.
rocsparse_status_not_implemented –
trans_A
!= rocsparse_operation_none ortrans_B
!= rocsparse_operation_none.
rocsparse_sddmm_buffer_size()#
-
rocsparse_status rocsparse_sddmm_buffer_size(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr A, rocsparse_const_dnmat_descr B, const void *beta, rocsparse_spmat_descr C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, size_t *buffer_size)#
Calculate the size in bytes of the required buffer for the use of rocsparse_sddmm and rocsparse_sddmm_preprocess.
rocsparse_sddmm_buffer_size
returns the size of the required buffer to execute the SDDMM operation from a given configuration.Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
A – [in] dense matrix \(A\) descriptor.
B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
buffer_size – [out] number of bytes of the temporary storage buffer.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of
opA
oropB
is incorrect.rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
andbeta
are invalid,A
,B
,D
,C
orbuffer_size
pointer is invalid.rocsparse_status_not_implemented –
opA
== rocsparse_operation_conjugate_transpose oropB
== rocsparse_operation_conjugate_transpose.
rocsparse_sddmm_preprocess()#
-
rocsparse_status rocsparse_sddmm_preprocess(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr A, rocsparse_const_dnmat_descr B, const void *beta, rocsparse_spmat_descr C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, void *temp_buffer)#
Preprocess data before the use of rocsparse_sddmm.
rocsparse_sddmm_preprocess
executes a part of the algorithm that can be calculated once in the context of multiple calls of the rocsparse_sddmm with the same sparsity pattern.Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
A – [in] dense matrix \(A\) descriptor.
B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
temp_buffer – [in] temporary storage buffer allocated by the user. The size must be greater or equal to the size obtained with rocsparse_sddmm_buffer_size.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of
opA
oropB
is incorrect.rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
andbeta
are invalid,A
,B
,D
,C
ortemp_buffer
pointer is invalid.rocsparse_status_not_implemented –
opA
== rocsparse_operation_conjugate_transpose oropB
== rocsparse_operation_conjugate_transpose.
rocsparse_sddmm()#
-
rocsparse_status rocsparse_sddmm(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr A, rocsparse_const_dnmat_descr B, const void *beta, rocsparse_spmat_descr C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, void *temp_buffer)#
Sampled Dense-Dense Matrix Multiplication.
rocsparse_sddmm
multiplies the scalar \(\alpha\) with the dense \(m \times k\) matrix \(A\), the dense \(k \times n\) matrix \(B\), filtered by the sparsity pattern of the \(m \times n\) sparse matrix \(C\) and adds the result to \(C\) scaled by \(\beta\). The final result is stored in the sparse \(m \times n\) matrix \(C\), such that\[ C := \alpha ( op(A) \cdot op(B) ) \circ spy(C) + \beta C, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if op(A) == rocsparse_operation_none} \\ A^T, & \text{if op(A) == rocsparse_operation_transpose} \\ \end{array} \right. \end{split}\],\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if op(B) == rocsparse_operation_none} \\ B^T, & \text{if op(B) == rocsparse_operation_transpose} \\ \end{array} \right. \end{split}\]and\[\begin{split} spy(C)_{ij} = \left\{ \begin{array}{ll} 1, & \text{ if C_{ij} != 0} \\ 0, & \text{ otherwise} \\ \end{array} \right. \end{split}\]- Example
This example performs sampled dense-dense matrix product, \(C := \alpha ( A \cdot B ) \circ spy(C) + \beta C\) where \(\circ\) is the hadamard product
// rocSPARSE handle rocsparse_handle handle; rocsparse_create_handle(&handle); float halpha = 1.0f; float hbeta = -1.0f; // A, B, and C are mxk, kxn, and mxn int m = 4; int k = 3; int n = 2; int nnzC = 5; // 2 3 -1 // A = 0 2 1 // 0 0 5 // 0 -2 0.5 // 0 4 // B = 1 0 // -2 0.5 // 1 0 1 0 // C = 2 3 spy(C) = 1 1 // 0 0 0 0 // 4 5 1 1 std::vector<float> hA = {2.0f, 3.0f, -1.0f, 0.0, 2.0f, 1.0f, 0.0f, 0.0f, 5.0f, 0.0f, -2.0f, 0.5f}; std::vector<float> hB = {0.0f, 4.0f, 1.0f, 0.0, -2.0f, 0.5f}; std::vector<int> hcsr_row_ptrC = {0, 1, 3, 3, 5}; std::vector<int> hcsr_col_indC = {0, 0, 1, 0, 1}; std::vector<float> hcsr_valC = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f}; float* dA = nullptr; float* dB = nullptr; hipMalloc((void**)&dA, sizeof(float) * m * k); hipMalloc((void**)&dB, sizeof(float) * k * n); int* dcsr_row_ptrC = nullptr; int* dcsr_col_indC = nullptr; float* dcsr_valC = nullptr; hipMalloc((void**)&dcsr_row_ptrC, sizeof(int) * (m + 1)); hipMalloc((void**)&dcsr_col_indC, sizeof(int) * nnzC); hipMalloc((void**)&dcsr_valC, sizeof(float) * nnzC); hipMemcpy(dA, hA.data(), sizeof(float) * m * k, hipMemcpyHostToDevice); hipMemcpy(dB, hB.data(), sizeof(float) * k * n, hipMemcpyHostToDevice); hipMemcpy(dcsr_row_ptrC, hcsr_row_ptrC.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice); hipMemcpy(dcsr_col_indC, hcsr_col_indC.data(), sizeof(int) * nnzC, hipMemcpyHostToDevice); hipMemcpy(dcsr_valC, hcsr_valC.data(), sizeof(float) * nnzC, hipMemcpyHostToDevice); rocsparse_dnmat_descr matA; rocsparse_create_dnmat_descr(&matA, m, k, k, dA, rocsparse_datatype_f32_r, rocsparse_order_row); rocsparse_dnmat_descr matB; rocsparse_create_dnmat_descr(&matB, k, n, n, dB, rocsparse_datatype_f32_r, rocsparse_order_row); rocsparse_spmat_descr matC; rocsparse_create_csr_descr(&matC, m, n, nnzC, dcsr_row_ptrC, dcsr_col_indC, dcsr_valC, rocsparse_indextype_i32, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f32_r); size_t buffer_size = 0; rocsparse_sddmm_buffer_size(handle, rocsparse_operation_none, rocsparse_operation_none, &halpha, matA, matB, &hbeta, matC, rocsparse_datatype_f32_r, rocsparse_sddmm_alg_default, &buffer_size); void* dbuffer = nullptr; hipMalloc((void**) &dbuffer, buffer_size); rocsparse_sddmm_preprocess(handle, rocsparse_operation_none, rocsparse_operation_none, &halpha, matA, matB, &hbeta, matC, rocsparse_datatype_f32_r, rocsparse_sddmm_alg_default, dbuffer); rocsparse_sddmm(handle, rocsparse_operation_none, rocsparse_operation_none, &halpha, matA, matB, &hbeta, matC, rocsparse_datatype_f32_r, rocsparse_sddmm_alg_default, dbuffer); hipMemcpy(hcsr_row_ptrC.data(), dcsr_row_ptrC, sizeof(int) * (m + 1), hipMemcpyDeviceToHost); hipMemcpy(hcsr_col_indC.data(), dcsr_col_indC, sizeof(int) * nnzC, hipMemcpyDeviceToHost); hipMemcpy(hcsr_valC.data(), dcsr_valC, sizeof(float) * nnzC, hipMemcpyDeviceToHost); rocsparse_destroy_dnmat_descr(matA); rocsparse_destroy_dnmat_descr(matB); rocsparse_destroy_spmat_descr(matC); rocsparse_destroy_handle(handle); hipFree(dA); hipFree(dB); hipFree(dcsr_row_ptrC); hipFree(dcsr_col_indC); hipFree(dcsr_valC); hipFree(dbuffer);
Note
opA
== rocsparse_operation_conjugate_transpose is not supported.Note
opB
== rocsparse_operation_conjugate_transpose is not supported.Note
This routine supports execution in a hipGraph context only when
alg
== rocsparse_sddmm_alg_default.Note
Different algorithms are available which can provide better performance for different matrices. Currently, the available algorithms are rocsparse_sddmm_alg_default or rocsparse_sddmm_alg_dense. The algorithm rocsparse_sddmm_alg_default uses the sparsity pattern of matrix C to perform a limited set of dot products. On the other hand, rocsparse_sddmm_alg_dense explicitly converts the matrix C into a dense matrix to perform a dense matrix multiply and add.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
A – [in] dense matrix \(A\) descriptor.
B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
temp_buffer – [in] temporary storage buffer allocated by the user. The size must be greater or equal to the size obtained with rocsparse_sddmm_buffer_size.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of
opA
,opB
,compute_type
or alg is incorrect.rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
andbeta
are invalid,A
,B
,D
,C
ortemp_buffer
pointer is invalid.rocsparse_status_not_implemented –
opA
== rocsparse_operation_conjugate_transpose oropB
== rocsparse_operation_conjugate_transpose.
rocsparse_dense_to_sparse()#
-
rocsparse_status rocsparse_dense_to_sparse(rocsparse_handle handle, rocsparse_const_dnmat_descr mat_A, rocsparse_spmat_descr mat_B, rocsparse_dense_to_sparse_alg alg, size_t *buffer_size, void *temp_buffer)#
Dense matrix to sparse matrix conversion.
rocsparse_dense_to_sparse
performs the conversion of a dense matrix to a sparse matrix in CSR, CSC, or COO format.- Example
// 1 4 0 0 0 0 // A = 0 2 3 0 0 0 // 5 0 0 7 8 0 // 0 0 9 0 6 0 rocsparse_int m = 4; rocsparse_int n = 6; std::vector<float> hdense = {1, 0, 5, 0, 4, 2, 0, 0, 0, 3, 0, 9, 0, 0, 7, 0, 0, 0, 8, 6, 0, 0, 0, 0}; // Offload data to device int* dcsr_row_ptr; float* ddense; hipMalloc((void**)&dcsr_row_ptr, sizeof(int) * (m + 1)); hipMalloc((void**)&ddense, sizeof(float) * m * n); hipMemcpy(ddense, hdense.data(), sizeof(float) * m * n, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_dnmat_descr matA; rocsparse_spmat_descr matB; rocsparse_indextype row_idx_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_create_handle(&handle); // Create sparse matrix A rocsparse_create_dnmat_descr(&matA, m, n, m, ddense, data_type, rocsparse_order_column); // Create dense matrix B rocsparse_create_csr_descr(&matB, m, n, 0, dcsr_row_ptr, nullptr, nullptr, row_idx_type, col_idx_type, idx_base, data_type); // Call dense_to_sparse to get required buffer size size_t buffer_size = 0; rocsparse_dense_to_sparse(handle, matA, matB, rocsparse_dense_to_sparse_alg_default, &buffer_size, nullptr); void* temp_buffer; hipMalloc((void**)&temp_buffer, buffer_size); // Call dense_to_sparse to perform analysis rocsparse_dense_to_sparse(handle, matA, matB, rocsparse_dense_to_sparse_alg_default, nullptr, temp_buffer); int64_t num_rows_tmp, num_cols_tmp, nnz; rocsparse_spmat_get_size(matB, &num_rows_tmp, &num_cols_tmp, &nnz); int* dcsr_col_ind; float* dcsr_val; hipMalloc((void**)&dcsr_col_ind, sizeof(int) * nnz); hipMalloc((void**)&dcsr_val, sizeof(float) * nnz); rocsparse_csr_set_pointers(matB, dcsr_row_ptr, dcsr_col_ind, dcsr_val); // Call dense_to_sparse to complete conversion rocsparse_dense_to_sparse(handle, matA, matB, rocsparse_dense_to_sparse_alg_default, &buffer_size, temp_buffer); std::vector<int> hcsr_row_ptr(m + 1, 0); std::vector<int> hcsr_col_ind(nnz, 0); std::vector<float> hcsr_val(nnz, 0); // Copy result back to host hipMemcpy(hcsr_row_ptr.data(), dcsr_row_ptr, sizeof(int) * (m + 1), hipMemcpyDeviceToHost); hipMemcpy(hcsr_col_ind.data(), dcsr_col_ind, sizeof(int) * nnz, hipMemcpyDeviceToHost); hipMemcpy(hcsr_val.data(), dcsr_val, sizeof(int) * nnz, hipMemcpyDeviceToHost); std::cout << "hcsr_row_ptr" << std::endl; for(size_t i = 0; i < hcsr_row_ptr.size(); ++i) { std::cout << hcsr_row_ptr[i] << " "; } std::cout << std::endl; std::cout << "hcsr_col_ind" << std::endl; for(size_t i = 0; i < hcsr_col_ind.size(); ++i) { std::cout << hcsr_col_ind[i] << " "; } std::cout << std::endl; std::cout << "hcsr_val" << std::endl; for(size_t i = 0; i < hcsr_val.size(); ++i) { std::cout << hcsr_val[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_dnmat_descr(matA); rocsparse_destroy_spmat_descr(matB); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dcsr_row_ptr); hipFree(dcsr_col_ind); hipFree(dcsr_val); hipFree(ddense);
Note
This function writes the required allocation size (in bytes) to
buffer_size
and returns without performing the dense to sparse operation, when a nullptr is passed fortemp_buffer
.Note
This function is blocking with respect to the host.
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
mat_A – [in] dense matrix descriptor.
mat_B – [in] sparse matrix descriptor.
alg – [in] algorithm for the dense to sparse computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer
is nullptr.temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the dense to sparse operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
mat_A
,mat_B
, orbuffer_size
pointer is invalid.
rocsparse_sparse_to_dense()#
-
rocsparse_status rocsparse_sparse_to_dense(rocsparse_handle handle, rocsparse_const_spmat_descr mat_A, rocsparse_dnmat_descr mat_B, rocsparse_sparse_to_dense_alg alg, size_t *buffer_size, void *temp_buffer)#
Sparse matrix to dense matrix conversion.
rocsparse_sparse_to_dense
performs the conversion of a sparse matrix in CSR, CSC, or COO format to a dense matrix- Example
// 1 4 0 0 0 0 // A = 0 2 3 0 0 0 // 5 0 0 7 8 0 // 0 0 9 0 6 0 rocsparse_int m = 4; rocsparse_int n = 6; std::vector<int> hcsr_row_ptr = {0, 2, 4, 7, 9}; std::vector<int> hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4}; std::vector<float> hcsr_val = {1, 4, 2, 3, 5, 7, 8, 9, 6}; std::vector<float> hdense(m * n, 0.0f); rocsparse_int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0]; // Offload data to device int* dcsr_row_ptr; int* dcsr_col_ind; float* dcsr_val; float* ddense; hipMalloc((void**)&dcsr_row_ptr, sizeof(int) * (m + 1)); hipMalloc((void**)&dcsr_col_ind, sizeof(int) * nnz); hipMalloc((void**)&dcsr_val, sizeof(float) * nnz); hipMalloc((void**)&ddense, sizeof(float) * m * n); hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice); hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice); hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice); hipMemcpy(ddense, hdense.data(), sizeof(float) * m * n, hipMemcpyHostToDevice); rocsparse_handle handle; rocsparse_spmat_descr matA; rocsparse_dnmat_descr matB; rocsparse_indextype row_idx_type = rocsparse_indextype_i32; rocsparse_indextype col_idx_type = rocsparse_indextype_i32; rocsparse_datatype data_type = rocsparse_datatype_f32_r; rocsparse_index_base idx_base = rocsparse_index_base_zero; rocsparse_create_handle(&handle); // Create sparse matrix A rocsparse_create_csr_descr(&matA, m, n, nnz, dcsr_row_ptr, dcsr_col_ind, dcsr_val, row_idx_type, col_idx_type, idx_base, data_type); // Create dense matrix B rocsparse_create_dnmat_descr(&matB, m, n, m, ddense, data_type, rocsparse_order_column); // Call sparse_to_dense size_t buffer_size = 0; rocsparse_sparse_to_dense(handle, matA, matB, rocsparse_sparse_to_dense_alg_default, &buffer_size, nullptr); void* temp_buffer; hipMalloc((void**)&temp_buffer, buffer_size); rocsparse_sparse_to_dense(handle, matA, matB, rocsparse_sparse_to_dense_alg_default, &buffer_size, temp_buffer); // Copy result back to host hipMemcpy(hdense.data(), ddense, sizeof(float) * m * n, hipMemcpyDeviceToHost); std::cout << "hdense" << std::endl; for(size_t i = 0; i < hdense.size(); ++i) { std::cout << hdense[i] << " "; } std::cout << std::endl; // Clear rocSPARSE rocsparse_destroy_spmat_descr(matA); rocsparse_destroy_dnmat_descr(matB); rocsparse_destroy_handle(handle); // Clear device memory hipFree(dcsr_row_ptr); hipFree(dcsr_col_ind); hipFree(dcsr_val); hipFree(ddense);
Note
This function writes the required allocation size (in bytes) to
buffer_size
and returns without performing the sparse to dense operation, when a nullptr is passed fortemp_buffer
.Note
This function is blocking with respect to the host.
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
mat_A – [in] sparse matrix descriptor.
mat_B – [in] dense matrix descriptor.
alg – [in] algorithm for the sparse to dense computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer
is nullptr.temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the sparse to dense operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
mat_A
,mat_B
, orbuffer_size
pointer is invalid.
rocsparse_sparse_to_sparse_buffer_size()#
-
rocsparse_status rocsparse_sparse_to_sparse_buffer_size(rocsparse_handle handle, rocsparse_sparse_to_sparse_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_sparse_to_sparse_stage stage, size_t *buffer_size_in_bytes)#
Sparse matrix to sparse matrix conversion.
rocsparse_sparse_to_sparse_buffer_size
calculates the required buffer size in bytes for a given stagestage
.- Parameters:
handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the sparse_to_sparse algorithm.
source – [in] source sparse matrix descriptor.
target – [in] target sparse matrix descriptor.
stage – [in] stage of the sparse_to_sparse computation.
buffer_size_in_bytes – [out] size in bytes of the
buffer
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – if any required enumeration is invalid.
rocsparse_status_invalid_pointer –
mat_A
,mat_B
, orbuffer_size_in_bytes
pointer is invalid.
rocsparse_sparse_to_sparse()#
-
rocsparse_status rocsparse_sparse_to_sparse(rocsparse_handle handle, rocsparse_sparse_to_sparse_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_sparse_to_sparse_stage stage, size_t buffer_size_in_bytes, void *buffer)#
Sparse matrix to sparse matrix conversion.
rocsparse_sparse_to_sparse
performs the conversion of a sparse matrix to a sparse matrix.- Example
This example converts a CSR matrix into an ELL matrix.
// It assumes the CSR arrays (ptr, ind, val) have already been allocated and filled. // Build Source rocsparse_spmat_descr source; rocsparse_create_csr_descr(&source, M, N, nnz, ptr, ind, val, rocsparse_indextype_i32, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f32_r); // Build target void * ell_ind, * ell_val; int64_t ell_width = 0; rocsparse_spmat_descr target; rocsparse_create_ell_descr(&target, M, N, ell_ind, ell_val, ell_width, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f32_r); // Create descriptor rocsparse_sparse_to_sparse_descr descr; rocsparse_sparse_to_sparse_create_descr(&descr, source, target, rocsparse_sparse_to_sparse_alg_default); // Analysis phase rocsparse_sparse_to_sparse_buffer_size(handle, descr, source, target, rocsparse_sparse_to_sparse_stage_analysis, &buffer_size); hipMalloc(&buffer,buffer_size); rocsparse_sparse_to_sparse(handle, descr, source, target, rocsparse_sparse_to_sparse_stage_analysis, buffer_size, buffer); hipFree(buffer); // // the user is responsible to allocate target arrays after the analysis phase. // { int64_t rows, cols, ell_width; void * ind, * val; rocsparse_indextype idx_type; rocsparse_index_base idx_base; rocsparse_datatype data_type; rocsparse_ell_get(target, &rows, &cols, &ind, &val, &ell_width, &idx_type, &idx_base, &data_type); hipMalloc(&ell_ind,ell_width * M * sizeof(int32_t)); hipMalloc(&ell_val,ell_width * M * sizeof(float))); rocsparse_ell_set_pointers(target, ell_ind, ell_val); } // Calculation phase rocsparse_sparse_to_sparse_buffer_size(handle, descr, source, target, rocsparse_sparse_to_sparse_stage_compute, &buffer_size); hipMalloc(&buffer,buffer_size); rocsparse_sparse_to_sparse(handle, descr, source, target, rocsparse_sparse_to_sparse_stage_compute, buffer_size, buffer); hipFree(buffer);
Note
The required allocation size (in bytes) to
buffer_size_in_bytes
must be obtained from rocsparse_sparse_to_sparse_buffer_size for each stage, indeed the required buffer size can be different between stages.Note
The format rocsparse_format_bell is not supported.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the sparse_to_sparse algorithm.
source – [in] sparse matrix descriptor.
target – [in] sparse matrix descriptor.
stage – [in] stage of the sparse_to_sparse computation.
buffer_size_in_bytes – [in] size in bytes of the
buffer
buffer – [in] temporary storage buffer allocated by the user.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_extract_buffer_size()#
-
rocsparse_status rocsparse_extract_buffer_size(rocsparse_handle handle, rocsparse_extract_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_extract_stage stage, size_t *buffer_size_in_bytes)#
Sparse matrix extraction.
rocsparse_extract_buffer_size
calculates the required buffer size in bytes for a given stagestage
.Note
This routine is asynchronous with respect to the host. This routine does support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the extract algorithm.
source – [in] source sparse matrix descriptor.
target – [in] target sparse matrix descriptor.
stage – [in] stage of the extract computation.
buffer_size_in_bytes – [out] size in bytes of the buffer.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – if
stage
is invalid.rocsparse_status_invalid_pointer –
descr
,source
,target
, orbuffer_size_in_bytes
pointer is invalid.
rocsparse_extract()#
-
rocsparse_status rocsparse_extract(rocsparse_handle handle, rocsparse_extract_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_extract_stage stage, size_t buffer_size_in_bytes, void *buffer)#
Sparse matrix extraction.
rocsparse_extract
performs the extraction of the lower or upper part of a sparse matrix.The source and the target matrices must have the same format rocsparse_format. The source and the target matrices must have the same storage mode rocsparse_storage_mode. The attributes of the target matrix, the fill mode rocsparse_fill_mode and the diagonal type rocsparse_diag_type are used to parametrise the algorithm.
The required allocation size (in bytes) to
buffer_size_in_bytes
must be obtained from rocsparse_extract_buffer_size for each stage, since the required buffer size can be different between stages.The value of the number of non-zeros is available after the analysis phase rocsparse_extract_stage_analysis being executed. This value can be fetched with rocsparse_extract_nnz.
This routine is asynchronous with respect to the host. This routine does support execution in a hipGraph context.
- Example
This example extracts the lower part of CSR matrix into a CSR matrix.
// It assumes the CSR arrays (ptr, ind, val) have already been allocated and filled. // Build Source rocsparse_spmat_descr source; rocsparse_create_csr_descr(&source, M, N, nnz, ptr, ind, val, rocsparse_indextype_i32, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f32_r); // Build target void * target_ptr; hipMalloc(&target_ptr,sizeof(int32_t)*(M+1)); rocsparse_spmat_descr target; rocsparse_create_csr_descr(&target, M, N, 0, target_ptr, nullptr, nullptr, rocsparse_indextype_i32, rocsparse_indextype_i32, rocsparse_index_base_zero, rocsparse_datatype_f32_r); const rocsparse_fill_mode fill_mode = rocsparse_fill_mode_lower; const rocsparse_diag_type diag_type = rocsparse_diag_type_non_unit; rocsparse_spmat_set_attribute(target, rocsparse_spmat_fill_mode, &fill_mode, sizeof(fill_mode)); rocsparse_spmat_set_attribute(target, rocsparse_spmat_diag_type, &diag_type, sizeof(diag_type)); // Create descriptor rocsparse_extract_descr descr; rocsparse_create_extract_descr(&descr, source, target, rocsparse_extract_alg_default); // Analysis phase rocsparse_extract_buffer_size(handle, descr, source, target, rocsparse_extract_stage_analysis, &buffer_size); hipMalloc(&buffer,buffer_size); rocsparse_extract(handle, descr, source, target, rocsparse_extract_stage_analysis, buffer_size, buffer); hipFree(buffer); // // The user is responsible to allocate target arrays after the analysis phase. // { int64_t target_nnz; rocsparse_extract_nnz(handle, descr, &target_nnz); void * target_ind, * target_val; hipMalloc(&target_ind, target_nnz * sizeof(int32_t)); hipMalloc(&target_val, target_nnz* sizeof(float))); rocsparse_csr_set_pointers(target, target_ptr, target_ind, target_val); } // Calculation phase rocsparse_extract_buffer_size(handle, descr, source, target, rocsparse_extract_stage_compute, &buffer_size); hipMalloc(&buffer,buffer_size); rocsparse_extract(handle, descr, source, target, rocsparse_extract_stage_compute, buffer_size, buffer); hipFree(buffer); rocsparse_destroy_extract_descr(descr);
Note
Supported formats are rocsparse_format_csr and rocsparse_format_csc.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the extract algorithm.
source – [in] sparse matrix descriptor.
target – [in] sparse matrix descriptor.
stage – [in] stage of the extract computation.
buffer_size_in_bytes – [in] size in bytes of the
buffer
buffer – [in] temporary storage buffer allocated by the user.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – if
stage
is invalid.rocsparse_status_invalid_pointer –
descr
,source
,target
, orbuffer
pointer is invalid.
rocsparse_extract_nnz#
-
rocsparse_status rocsparse_extract_nnz(rocsparse_handle handle, rocsparse_extract_descr descr, int64_t *nnz)#
Sparse matrix extraction.
rocsparse_extract_nnz
returns the number of non-zeros of the extracted matrix. The value is available after the analysis phase rocsparse_extract_stage_analysis being executed.Note
This routine is asynchronous with respect to the host. This routine does support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the extract algorithm.
nnz – [out] the number of non-zeros.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
descr
ornnz
pointer is invalid.
rocsparse_check_spmat#
-
rocsparse_status rocsparse_check_spmat(rocsparse_handle handle, rocsparse_const_spmat_descr mat, rocsparse_data_status *data_status, rocsparse_check_spmat_stage stage, size_t *buffer_size, void *temp_buffer)#
Check matrix to see if it is valid.
rocsparse_check_spmat
checks if the input matrix is valid.- Uniform Precisions:
A
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c
Note
This function writes the required allocation size (in bytes) to
buffer_size
and returns without performing the checking operation, when stage is equal to rocsparse_check_spmat_stage_buffer_size.Note
The sparse matrix formats currently supported are: rocsparse_format_coo, rocsparse_format_csr, rocsparse_format_csc and rocsparse_format_ell.
Note
check_spmat requires two stages to complete. The first stage rocsparse_check_spmat_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls to rocsparse_check_spmat. In the final stage rocsparse_check_spmat_stage_compute, the actual computation is performed.
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
mat – [in] matrix descriptor.
data_status – [out] modified to indicate the status of the data
stage – [in] check_matrix stage for the matrix computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when
temp_buffer
is nullptr.temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the checking operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
mat
,buffer_size
,temp_buffer
ordata_status
pointer is invalid.rocsparse_status_invalid_value – the value of stage is incorrect.
rocsparse_spitsv#
-
rocsparse_status rocsparse_spitsv(rocsparse_handle handle, rocsparse_int *host_nmaxiter, const void *host_tol, void *host_history, rocsparse_operation trans, const void *alpha, const rocsparse_spmat_descr mat, const rocsparse_dnvec_descr x, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spitsv_alg alg, rocsparse_spitsv_stage stage, size_t *buffer_size, void *temp_buffer)#
Sparse iterative triangular solve.
rocsparse_spitsv
solves, using the Jacobi iterative method, a sparse triangular linear system of a sparse \(m \times m\) matrix, defined in CSR format, a dense solution vector \(y\) and the right-hand side \(x\) that is multiplied by \(\alpha\), such that\[ op(A) y = \alpha x, \]with\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]The Jacobi method applied to the sparse triangular linear system above gives
\[ y_{k+1} = y_{k} + D^{-1} ( \alpha x - (D + T) y_{k} ) \]with \(A = D + T\), \(D\) the diagonal of \(A\) and \(T\) the strict triangular part of \(A\).The above equation can be also written as
\[ y_{k+1} = y_{k} + D^{-1} r_k \]where\[ r_k = \alpha x - (D + T) y_k. \]Starting with \(y_0 = \)y
, the method iterates if \( k \lt \)host_nmaxiter
and until\[ \Vert r_k \Vert_{\infty} \le \epsilon, \]with \(\epsilon\) =host_tol
.Note
SpITSV requires three stages to complete. The first stage rocsparse_spitsv_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls. The second stage rocsparse_spitsv_stage_preprocess will preprocess data that would be saved in the temporary storage buffer. In the final stage rocsparse_spitsv_stage_compute, the actual computation is performed.
Note
Currently, only non-mixed numerical precision is supported.
Note
This routine does not support execution in a hipGraph context.
- Parameters:
handle – [in] handle to the rocsparse library context queue.
host_nmaxiter – [inout] maximum number of iteration on input and number of iteration on output. If the output number of iterations is strictly less than the input maximum number of iterations, then the algorithm converged.
host_tol – [in] if the pointer is null then loop will execute
nmaxiter
[0] iterations. The precision is float for f32 based calculation (including the complex case) and double for f64 based calculation (including the complex case).host_history – [out] Optional array to record the norm of the residual before each iteration. The precision is float for f32 based calculation (including the complex case) and double for f64 based calculation (including the complex case).
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpITSV computation.
alg – [in] SpITSV algorithm for the SpITSV computation.
stage – [in] SpITSV stage for the SpITSV computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to
buffer_size
and function returns without performing the SpITSV operation.
- Return values:
rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer –
alpha
,mat
,x
,y
,descr
orbuffer_size
pointer is invalid.rocsparse_status_not_implemented –
trans
,compute_type
,stage
oralg
is currently not supported.