Sparse generic functions

This module contains all sparse generic routines.

The sparse generic routines describe some of the most common operations that manipulate sparse matrices and vectors. The generic API is more flexible than the other rocSPARSE APIs because it is easy to set different index types, data types, and compute types. For some generic routines, for example, SpMV, the generic API also lets users select different algorithms which have different performance characteristics depending on the sparse matrix being operated on.

rocsparse_axpby()

rocsparse_status rocsparse_axpby(rocsparse_handle handle, const void *alpha, rocsparse_const_spvec_descr x, const void *beta, rocsparse_dnvec_descr y)

Scale a sparse vector and add it to a scaled dense vector.

rocsparse_axpby multiplies the sparse vector \(x\) with scalar \(\alpha\) and adds the result to the dense vector \(y\) that is multiplied with scalar \(\beta\), such that

\[ y := \alpha \cdot x + \beta \cdot y \]

for(i = 0; i < size; ++i)
{
    y[i] = beta * y[i];
}
for(i = 0; i < nnz; ++i)
{
    y[x_ind[i]] += alpha * x_val[i];
}
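The loops above can be mirrored on the host to check device results. The following is a minimal sketch, assuming zero-based indices and single-precision data; the helper name axpby_reference is hypothetical and not part of rocSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for y := alpha * x + beta * y, where the sparse vector x
// is given as (index, value) pairs with zero-based indices.
// Illustrative helper only; it is not a rocSPARSE function.
inline void axpby_reference(float                     alpha,
                            const std::vector<int>&   x_ind,
                            const std::vector<float>& x_val,
                            float                     beta,
                            std::vector<float>&       y)
{
    assert(x_ind.size() == x_val.size());

    // Scale every entry of the dense vector first.
    for(float& yi : y)
    {
        yi *= beta;
    }

    // Then add the scaled non-zeros of x at their positions.
    for(std::size_t i = 0; i < x_ind.size(); ++i)
    {
        assert(x_ind[i] >= 0 && static_cast<std::size_t>(x_ind[i]) < y.size());
        y[x_ind[i]] += alpha * x_val[i];
    }
}
```

Only the nnz positions listed in x_ind receive a contribution from x; all other entries of y are just scaled by beta.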

rocsparse_axpby supports the following uniform precision data types for the sparse and dense vectors x and y and compute types for the scalars \(\alpha\) and \(\beta\).

Uniform Precisions:

X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Mixed precisions:

X / Y	compute_type
rocsparse_datatype_f16_r	rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r	rocsparse_datatype_f32_r

Example

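#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <iostream>
#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-defined error-checking macros.

int main()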
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Sparse value vector
    std::vector<float> hx_val = {1.0f, 2.0f, 3.0f};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Scalar alpha
    float alpha = 3.7f;

    // Scalar beta
    float beta = 1.2f;

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type  = rocsparse_indextype_i32;
    rocsparse_datatype   data_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base  = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Call axpby to perform y = beta * y + alpha * x
    ROCSPARSE_CHECK(rocsparse_axpby(handle, &alpha, vecX, &beta, vecY));

    ROCSPARSE_CHECK(rocsparse_dnvec_get_values(vecY, (void**)&dy));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost));

    std::cout << "y" << std::endl;
    for(size_t i = 0; i < hy.size(); ++i)
    {
        std::cout << hy[i] << " ";
    }
    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));

    return 0;
}

Note

This function is non-blocking and executes asynchronously with respect to the host. It may return before the actual computation has finished.

Note

This routine supports execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
alpha – [in] scalar \(\alpha\).
x – [in] sparse vector descriptor.
beta – [in] scalar \(\beta\).
y – [inout] dense vector descriptor.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, x, beta or y pointer is invalid.

rocsparse_gather()

rocsparse_status rocsparse_gather(rocsparse_handle handle, rocsparse_const_dnvec_descr y, rocsparse_spvec_descr x)

Gather elements from a dense vector and store them into a sparse vector.

rocsparse_gather gathers the elements from the dense vector \(y\) and stores them in the sparse vector \(x\).

for(i = 0; i < nnz; ++i)
{
    x_val[i] = y[x_ind[i]];
}
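A host-side version of this loop can serve as a reference when validating device output. This is a minimal sketch assuming zero-based indices and float data; gather_reference is a hypothetical helper, not a rocSPARSE function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for gather: x_val[i] = y[x_ind[i]] (zero-based indices).
// Illustrative helper only; it is not a rocSPARSE function.
inline std::vector<float> gather_reference(const std::vector<float>& y,
                                           const std::vector<int>&   x_ind)
{
    std::vector<float> x_val(x_ind.size());
    for(std::size_t i = 0; i < x_ind.size(); ++i)
    {
        assert(x_ind[i] >= 0 && static_cast<std::size_t>(x_ind[i]) < y.size());
        x_val[i] = y[x_ind[i]];
    }
    return x_val;
}
```

For a dense vector holding 1 through 9 and indices {0, 3, 5}, this reference returns the values 1, 4, and 6.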

rocsparse_gather supports the following uniform precision data types for the sparse and dense vectors x and y.

Uniform Precisions:

X / Y
rocsparse_datatype_i8_r
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

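#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <iostream>
#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-defined error-checking macros.

int main()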
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type  = rocsparse_indextype_i32;
    rocsparse_datatype   data_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base  = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Call gather to copy the selected entries of y into x
    ROCSPARSE_CHECK(rocsparse_gather(handle, vecY, vecX));

    ROCSPARSE_CHECK(rocsparse_spvec_get_values(vecX, (void**)&dx_val));

    // Copy result back to host
    std::vector<float> hx_val(nnz, 0.0f);
    HIP_CHECK(hipMemcpy(hx_val.data(), dx_val, sizeof(float) * nnz, hipMemcpyDeviceToHost));

    std::cout << "x" << std::endl;
    for(size_t i = 0; i < hx_val.size(); ++i)
    {
        std::cout << hx_val[i] << " ";
    }

    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));

    return 0;
}

Note

This function is non-blocking and executes asynchronously with respect to the host. It may return before the actual computation has finished.

Note

This routine supports execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
y – [in] dense vector \(y\).
x – [out] sparse vector \(x\).

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – x or y pointer is invalid.

rocsparse_scatter()

rocsparse_status rocsparse_scatter(rocsparse_handle handle, rocsparse_const_spvec_descr x, rocsparse_dnvec_descr y)

Scatter elements from a sparse vector into a dense vector.

rocsparse_scatter scatters the elements from the sparse vector \(x\) into the dense vector \(y\).

for(i = 0; i < nnz; ++i)
{
    y[x_ind[i]] = x_val[i];
}
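The same loop can be written on the host as a reference for checking device results. This is a minimal sketch assuming zero-based indices and float data; scatter_reference is a hypothetical helper, not a rocSPARSE function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for scatter: y[x_ind[i]] = x_val[i] (zero-based indices).
// Entries of y that are not referenced by x_ind are left unchanged.
// Illustrative helper only; it is not a rocSPARSE function.
inline void scatter_reference(const std::vector<int>&   x_ind,
                              const std::vector<float>& x_val,
                              std::vector<float>&       y)
{
    assert(x_ind.size() == x_val.size());
    for(std::size_t i = 0; i < x_ind.size(); ++i)
    {
        assert(x_ind[i] >= 0 && static_cast<std::size_t>(x_ind[i]) < y.size());
        y[x_ind[i]] = x_val[i];
    }
}
```

Scattering the values {1, 2, 3} at indices {0, 3, 5} into a dense vector holding 1 through 9 overwrites positions 0, 3, and 5 and gives 1 2 3 2 5 3 7 8 9.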

rocsparse_scatter supports the following uniform precision data types for the sparse and dense vectors x and y.

Uniform Precisions:

X / Y
rocsparse_datatype_i8_r
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

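#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <iostream>
#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-defined error-checking macros.

int main()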
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Sparse value vector
    std::vector<float> hx_val = {1.0f, 2.0f, 3.0f};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type  = rocsparse_indextype_i32;
    rocsparse_datatype   data_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base  = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Call scatter to write the entries of x into y
    ROCSPARSE_CHECK(rocsparse_scatter(handle, vecX, vecY));

    ROCSPARSE_CHECK(rocsparse_dnvec_get_values(vecY, (void**)&dy));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost));

    std::cout << "y" << std::endl;
    for(size_t i = 0; i < hy.size(); ++i)
    {
        std::cout << hy[i] << " ";
    }

    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));

    return 0;
}

Note

This function is non-blocking and executes asynchronously with respect to the host. It may return before the actual computation has finished.

Note

This routine supports execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
x – [in] sparse vector \(x\).
y – [out] dense vector \(y\).

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – x or y pointer is invalid.

rocsparse_rot()

rocsparse_status rocsparse_rot(rocsparse_handle handle, const void *c, const void *s, rocsparse_spvec_descr x, rocsparse_dnvec_descr y)

Apply Givens rotation to a dense and a sparse vector.

rocsparse_rot applies the Givens rotation matrix \(G\) to the sparse vector \(x\) and the dense vector \(y\), where

\[\begin{split} G = \begin{pmatrix} c & s \\ -s & c \end{pmatrix} \end{split}\]

for(i = 0; i < nnz; ++i)
{
    x_tmp = x_val[i];
    y_tmp = y[x_ind[i]];

    x_val[i]    = c * x_tmp + s * y_tmp;
    y[x_ind[i]] = c * y_tmp - s * x_tmp;
}
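The update above can be reproduced on the host for validation. The sketch below assumes zero-based indices and float data, and rot_reference is a hypothetical helper, not a rocSPARSE function. For a true Givens rotation the scalars satisfy \(c^2 + s^2 = 1\); the loop itself applies the formula to whatever values of \(c\) and \(s\) are passed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for the rotation applied to the pairs
// (x_val[i], y[x_ind[i]]), using zero-based indices.
// Illustrative helper only; it is not a rocSPARSE function.
inline void rot_reference(float                   c,
                          float                   s,
                          const std::vector<int>& x_ind,
                          std::vector<float>&     x_val,
                          std::vector<float>&     y)
{
    assert(x_ind.size() == x_val.size());
    for(std::size_t i = 0; i < x_ind.size(); ++i)
    {
        const int j = x_ind[i];
        assert(j >= 0 && static_cast<std::size_t>(j) < y.size());

        // Read both values before overwriting either of them.
        const float x_tmp = x_val[i];
        const float y_tmp = y[j];

        x_val[i] = c * x_tmp + s * y_tmp;
        y[j]     = c * y_tmp - s * x_tmp;
    }
}
```

Only the entries of y addressed by x_ind are modified; the remaining entries of y are untouched.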

rocsparse_rot supports the following uniform precision data types for the sparse and dense vectors x and y and compute types for the scalars \(c\) and \(s\).

Uniform Precisions:

X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

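#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <iostream>
#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-defined error-checking macros.

int main()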
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Sparse value vector
    std::vector<float> hx_val = {1.0f, 2.0f, 3.0f};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Scalar c
    float c = 3.7f;

    // Scalar s
    float s = 1.2f;

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type  = rocsparse_indextype_i32;
    rocsparse_datatype   data_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base  = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Call rot
    ROCSPARSE_CHECK(rocsparse_rot(handle, (void*)&c, (void*)&s, vecX, vecY));

    ROCSPARSE_CHECK(rocsparse_spvec_get_values(vecX, (void**)&dx_val));
    ROCSPARSE_CHECK(rocsparse_dnvec_get_values(vecY, (void**)&dy));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hx_val.data(), dx_val, sizeof(float) * nnz, hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * size, hipMemcpyDeviceToHost));

    std::cout << "x" << std::endl;
    for(size_t i = 0; i < hx_val.size(); ++i)
    {
        std::cout << hx_val[i] << " ";
    }

    std::cout << std::endl;

    std::cout << "y" << std::endl;
    for(size_t i = 0; i < hy.size(); ++i)
    {
        std::cout << hy[i] << " ";
    }

    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));

    return 0;
}

Note

This function is non-blocking and executes asynchronously with respect to the host. It may return before the actual computation has finished.

Note

This routine supports execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
c – [in] pointer to the cosine element of \(G\), can be on host or device.
s – [in] pointer to the sine element of \(G\), can be on host or device.
x – [inout] sparse vector \(x\).
y – [inout] dense vector \(y\).

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – c, s, x or y pointer is invalid.

rocsparse_spvv()

rocsparse_status rocsparse_spvv(rocsparse_handle handle, rocsparse_operation trans, rocsparse_const_spvec_descr x, rocsparse_const_dnvec_descr y, void *result, rocsparse_datatype compute_type, size_t *buffer_size, void *temp_buffer)

Sparse vector inner dot product.

rocsparse_spvv computes the inner dot product of the sparse vector \(x\) with the dense vector \(y\), such that

\[ \text{result} := op(x) \cdot y, \]

with

\[\begin{split} op(x) = \left\{ \begin{array}{ll} x, & \text{if trans == rocsparse_operation_none} \\ \bar{x}, & \text{if trans == rocsparse_operation_conjugate_transpose} \\ \end{array} \right. \end{split}\]

result = 0;
for(i = 0; i < nnz; ++i)
{
    result += x_val[i] * y[x_ind[i]];
}
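For real data types, where \(op(x) = x\), the loop above can be written on the host as a reference. This is a minimal sketch assuming zero-based indices and float data; spvv_reference is a hypothetical helper, not a rocSPARSE function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for the sparse-dense dot product with real values:
// result = sum over i of x_val[i] * y[x_ind[i]] (zero-based indices).
// Illustrative helper only; it is not a rocSPARSE function.
inline float spvv_reference(const std::vector<int>&   x_ind,
                            const std::vector<float>& x_val,
                            const std::vector<float>& y)
{
    assert(x_ind.size() == x_val.size());

    float result = 0.0f;
    for(std::size_t i = 0; i < x_ind.size(); ++i)
    {
        assert(x_ind[i] >= 0 && static_cast<std::size_t>(x_ind[i]) < y.size());
        result += x_val[i] * y[x_ind[i]];
    }
    return result;
}
```

With sparse values {1, 2, 3} at indices {0, 3, 5} and a dense vector holding 1 through 9, the result is 1*1 + 2*4 + 3*6 = 27.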

Performing the above operation involves two steps. First, the user calls rocsparse_spvv with temp_buffer set to nullptr, which returns the required temporary buffer size in the parameter buffer_size. The user then allocates this buffer and completes the computation by calling rocsparse_spvv a second time with the newly allocated buffer. Once the computation is complete, the user is free to deallocate the buffer.

rocsparse_spvv supports the following uniform and mixed precision data types for the sparse and dense vectors \(x\) and \(y\) and compute types for the scalar \(result\).

Uniform Precisions:

X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Mixed precisions:

X / Y	compute_type / result
rocsparse_datatype_i8_r	rocsparse_datatype_i32_r
rocsparse_datatype_i8_r	rocsparse_datatype_f32_r
rocsparse_datatype_f16_r	rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r	rocsparse_datatype_f32_r
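To illustrate why a wider compute type is useful with narrow storage types, the host-side sketch below stores the vectors as 8-bit integers and accumulates in 32-bit integers, corresponding to the i8 / i32 row of the mixed precision table. The template spvv_mixed_reference is a hypothetical helper and says nothing about how rocSPARSE implements this internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side dot product with separate storage type T and compute type C.
// Each operand is converted to C before multiplying, so products and the
// running sum are formed in the wider type.
// Illustrative helper only; it is not a rocSPARSE function.
template <typename T, typename C>
C spvv_mixed_reference(const std::vector<int>& x_ind,
                       const std::vector<T>&   x_val,
                       const std::vector<T>&   y)
{
    assert(x_ind.size() == x_val.size());

    C result = C(0);
    for(std::size_t i = 0; i < x_ind.size(); ++i)
    {
        assert(x_ind[i] >= 0 && static_cast<std::size_t>(x_ind[i]) < y.size());
        result += static_cast<C>(x_val[i]) * static_cast<C>(y[x_ind[i]]);
    }
    return result;
}
```

Three products of 100 * 100 sum to 30000, which does not fit in an 8-bit integer but is represented exactly in the 32-bit compute type.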

Example

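#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-defined error-checking macros.

int main()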
{
    // Number of non-zeros of the sparse vector
    int nnz = 3;

    // Size of sparse and dense vector
    int size = 9;

    // Sparse index vector
    std::vector<int> hx_ind = {0, 3, 5};

    // Sparse value vector
    std::vector<float> hx_val = {1.0f, 2.0f, 3.0f};

    // Dense vector
    std::vector<float> hy = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f};

    // Offload data to device
    int*   dx_ind;
    float* dx_val;
    float* dy;
    HIP_CHECK(hipMalloc(&dx_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dx_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * size));

    HIP_CHECK(hipMemcpy(dx_ind, hx_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx_val, hx_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * size, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  idx_type     = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_operation  trans        = rocsparse_operation_none;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse vector X
    ROCSPARSE_CHECK(rocsparse_create_spvec_descr(
        &vecX, size, nnz, dx_ind, dx_val, idx_type, idx_base, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, size, dy, data_type));

    // Obtain buffer size
    float  hresult = 0.0f;
    size_t buffer_size;
    ROCSPARSE_CHECK(
        rocsparse_spvv(handle, trans, vecX, vecY, &hresult, compute_type, &buffer_size, nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // SpVV
    ROCSPARSE_CHECK(rocsparse_spvv(
        handle, trans, vecX, vecY, &hresult, compute_type, &buffer_size, temp_buffer));

    HIP_CHECK(hipDeviceSynchronize());

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dx_ind));
    HIP_CHECK(hipFree(dx_val));
    HIP_CHECK(hipFree(dy));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}

Note

This function writes the required allocation size (in bytes) to buffer_size and returns without performing the SpVV operation, when a nullptr is passed for temp_buffer.

Note

This function is blocking with respect to the host.

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
trans – [in] sparse vector operation type.
x – [in] sparse vector descriptor.
y – [in] dense vector descriptor.
result – [out] pointer to the result, can be host or device memory.
compute_type – [in] floating point precision for the SpVV computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpVV operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – x, y, result or buffer_size pointer is invalid.
rocsparse_status_not_implemented – compute_type is currently not supported.

rocsparse_spmv()

rocsparse_status rocsparse_spmv(rocsparse_handle handle, rocsparse_operation trans, const void *alpha, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, const void *beta, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spmv_alg alg, rocsparse_spmv_stage stage, size_t *buffer_size, void *temp_buffer)

Sparse matrix vector multiplication.

rocsparse_spmv multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix \(op(A)\), defined in CSR, CSC, COO, COO (AoS), BSR, or ELL format, with the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that

\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]

with

\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]

Performing the above operation involves multiple steps. First, the user calls rocsparse_spmv with the stage parameter set to rocsparse_spmv_stage_buffer_size to determine the size of the required temporary storage buffer. The user then allocates this buffer and calls rocsparse_spmv with the stage parameter set to rocsparse_spmv_stage_preprocess. Depending on the algorithm and sparse matrix format, this will perform analysis on the sparsity pattern of \(op(A)\). Finally, the user completes the operation by calling rocsparse_spmv with the stage parameter set to rocsparse_spmv_stage_compute. The buffer size, buffer allocation, and preprocess stages only need to be performed once for a given sparse matrix \(op(A)\), while the computation stage can be repeated with different \(x\) and \(y\) vectors. Once all calls to rocsparse_spmv are complete, the temporary buffer can be deallocated.

rocsparse_spmv supports multiple algorithms. These algorithms have different trade-offs depending on the sparsity pattern of the matrix, whether or not the results need to be deterministic, and how many times the sparse matrix-vector product will be performed.

CSR Algorithms	Deterministic	Preprocessing	Notes
rocsparse_spmv_alg_csr_rowsplit	Yes	No	Best suited for matrices in which all rows have a similar number of non-zeros. Can outperform the adaptive and LRB algorithms for certain sparsity patterns. Performs very poorly if some rows have few non-zeros and some rows have many non-zeros.
rocsparse_spmv_alg_csr_stream	Yes	No	[Deprecated] Old name for rocsparse_spmv_alg_csr_rowsplit.
rocsparse_spmv_alg_csr_adaptive	No	Yes	Generally the fastest algorithm across all matrix sparsity patterns. This includes matrices that have some rows with many non-zeros and some rows with few non-zeros. Requires a lengthy preprocessing step that needs to be amortized over many subsequent sparse matrix-vector products.
rocsparse_spmv_alg_csr_lrb	No	Yes	Like the adaptive algorithm, generally performs well across all matrix sparsity patterns. Generally not as fast as the adaptive algorithm, but uses a much faster preprocessing step. Good when only a few sparse matrix-vector products will be performed.
rocsparse_spmv_alg_csr_nnzsplit	No	Yes	Like the adaptive algorithm, generally performs well across all matrix sparsity patterns. Generally not as fast as the adaptive algorithm but faster than the LRB algorithm. It uses a much faster preprocessing step than LRB. Good when fewer than one hundred sparse matrix-vector products will be performed. If more products need to be computed, the adaptive algorithm is probably faster.

COO Algorithms	Deterministic	Preprocessing	Notes
rocsparse_spmv_alg_coo	Yes	Yes	Generally not as fast as atomic algorithm but is deterministic
rocsparse_spmv_alg_coo_atomic	No	No	Generally the fastest COO algorithm

ELL Algorithms	Deterministic	Preprocessing	Notes
rocsparse_spmv_alg_ell	Yes	No

BSR Algorithm	Deterministic	Preprocessing	Notes
rocsparse_spmv_alg_bsr	Yes	No
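The tables above do not say why some algorithms are non-deterministic. One general reason results can vary between runs of a parallel reduction is that floating-point addition is not associative, so the order in which partial sums are combined changes the rounded result. The host-side sketch below only illustrates that general effect, under the assumption of IEEE 754 single-precision arithmetic; it makes no claim about the internals of any rocSPARSE algorithm, and sum_in_order is a hypothetical helper.

```cpp
#include <vector>

// Sum the terms strictly from left to right in single precision.
// Illustrative helper only; it is not a rocSPARSE function.
inline float sum_in_order(const std::vector<float>& terms)
{
    float sum = 0.0f;
    for(float t : terms)
    {
        sum += t;
    }
    return sum;
}
```

On a typical IEEE 754 single-precision implementation, summing {1.0e8f, 1.0f, -1.0e8f} gives 0 because the 1 is lost when added to 1.0e8f, while summing the same terms as {1.0e8f, -1.0e8f, 1.0f} gives 1.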

rocsparse_spmv supports multiple combinations of data types and compute types. The tables below list the currently supported data types for the sparse matrix \(op(A)\) and the dense vectors \(x\) and \(y\), and the compute type for \(\alpha\) and \(\beta\). Mixing data types lets an application save memory bandwidth and storage while still performing the actual computation in a higher precision.

rocsparse_spmv supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.

Uniform Precisions:

A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Mixed precisions:

A / X	Y	compute_type
rocsparse_datatype_i8_r	rocsparse_datatype_i32_r	rocsparse_datatype_i32_r
rocsparse_datatype_i8_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r
rocsparse_datatype_f16_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r

Mixed-regular real precisions:

A	X / Y / compute_type
rocsparse_datatype_f32_r	rocsparse_datatype_f64_r
rocsparse_datatype_f32_c	rocsparse_datatype_f64_c

Mixed-regular complex precisions:

A	X / Y / compute_type
rocsparse_datatype_f32_r	rocsparse_datatype_f32_c
rocsparse_datatype_f64_r	rocsparse_datatype_f64_c

Example

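#include <hip/hip_runtime.h>
#include <rocsparse/rocsparse.h>

#include <vector>

// HIP_CHECK and ROCSPARSE_CHECK are assumed to be user-defined error-checking macros.

int main()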
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    int m = 4;
    int n = 6;

    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>   hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};
    std::vector<float> hx(n, 1.0f);
    std::vector<float> hy(m, 0.0f);

    // Scalar alpha
    float alpha = 3.7f;

    // Scalar beta
    float beta = 0.0f;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dx;
    float* dy;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dx, sizeof(float) * n));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * m));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(float) * n, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;
    rocsparse_operation  trans        = rocsparse_operation_none;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               n,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               row_idx_type,
                                               col_idx_type,
                                               idx_base,
                                               data_type));

    // Create dense vector X
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, n, dx, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, data_type));

    // Call spmv to get buffer size
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spmv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   &beta,
                                   vecY,
                                   compute_type,
                                   rocsparse_spmv_alg_csr_adaptive,
                                   rocsparse_spmv_stage_buffer_size,
                                   &buffer_size,
                                   nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // Call spmv to perform analysis
    ROCSPARSE_CHECK(rocsparse_spmv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   &beta,
                                   vecY,
                                   compute_type,
                                   rocsparse_spmv_alg_csr_adaptive,
                                   rocsparse_spmv_stage_preprocess,
                                   &buffer_size,
                                   temp_buffer));

    // Call spmv to perform computation
    ROCSPARSE_CHECK(rocsparse_spmv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   &beta,
                                   vecY,
                                   compute_type,
                                   rocsparse_spmv_alg_csr_adaptive,
                                   rocsparse_spmv_stage_compute,
                                   &buffer_size,
                                   temp_buffer));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost));

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dx));
    HIP_CHECK(hipFree(dy));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}

Note

None of the algorithms above are deterministic when \(A\) is transposed.

Note

The sparse matrix formats currently supported are: rocsparse_format_bsr, rocsparse_format_coo, rocsparse_format_coo_aos, rocsparse_format_csr, rocsparse_format_csc and rocsparse_format_ell.

Note

Only the rocsparse_spmv_stage_buffer_size stage and the rocsparse_spmv_stage_compute stage are non-blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spmv_stage_preprocess stage is blocking with respect to the host.

Note

Only the rocsparse_spmv_stage_buffer_size stage and the rocsparse_spmv_stage_compute stage support execution in a hipGraph context. The rocsparse_spmv_stage_preprocess stage does not support hipGraph.

Parameters:

handle – [in] handle to the rocsparse library context queue.
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
beta – [in] scalar \(\beta\).
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpMV computation.
alg – [in] SpMV algorithm for the SpMV computation.
stage – [in] SpMV stage for the SpMV computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When the rocsparse_spmv_stage_buffer_size stage is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpMV operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context handle was not initialized.
rocsparse_status_invalid_pointer – alpha, mat, x, beta, y or buffer_size pointer is invalid.
rocsparse_status_invalid_value – the value of trans, compute_type, alg, or stage is incorrect.
rocsparse_status_not_implemented – compute_type or alg is currently not supported.

rocsparse_v2_spmv_buffer_size()#

rocsparse_status rocsparse_v2_spmv_buffer_size(rocsparse_handle handle, rocsparse_spmv_descr descr, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, rocsparse_const_dnvec_descr y, rocsparse_v2_spmv_stage stage, size_t *buffer_size_in_bytes, rocsparse_error *error)#

rocsparse_v2_spmv_buffer_size returns the size of the buffer required to execute the given stage of the Version 2 SpMV operation. This routine is used in conjunction with rocsparse_v2_spmv(). See rocsparse_v2_spmv for a full description and an example.

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] SpMV descriptor.
mat – [in] sparse matrix descriptor.
x – [in] dense vector descriptor.
y – [in] dense vector descriptor.
stage – [in] Version 2 SpMV stage for the SpMV computation.
buffer_size_in_bytes – [out] number of bytes of the buffer.
error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if the user is not interested in obtaining an error descriptor.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – the stage value is invalid.
rocsparse_status_invalid_pointer – mat, x, y, descr or buffer_size_in_bytes pointer is invalid.

rocsparse_v2_spmv()#

rocsparse_status rocsparse_v2_spmv(rocsparse_handle handle, rocsparse_spmv_descr descr, const void *alpha, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, const void *beta, rocsparse_dnvec_descr y, rocsparse_v2_spmv_stage stage, size_t buffer_size_in_bytes, void *buffer, rocsparse_error *error)#

Sparse matrix vector multiplication.

rocsparse_v2_spmv multiplies the scalar \(\alpha\) with a sparse \(m \times n\) matrix \(op(A)\) and the dense vector \(x\) and adds the result to the dense vector \(y\) that is multiplied by the scalar \(\beta\), such that

\[ y := \alpha \cdot op(A) \cdot x + \beta \cdot y, \]

with

\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]

Performing the above operation involves two stages. The first stage is rocsparse_v2_spmv_stage_analysis, which performs an analysis of the symbolic information of \(op(A)\). The second stage is rocsparse_v2_spmv_stage_compute, which corresponds to the actual calculation. The size of the buffer required for each stage is determined by calling the routine rocsparse_v2_spmv_buffer_size. The stage rocsparse_v2_spmv_stage_analysis only needs to be called once for a given sparse matrix \(op(A)\), while the computation stage can be used repeatedly with different \(x\) and \(y\) vectors.

rocsparse_v2_spmv supports multiple algorithms. These algorithms have different trade-offs depending on the sparsity pattern of the matrix, whether or not the results need to be deterministic, and how many times the sparse matrix-vector product will be performed.

Algorithm	Deterministic	Notes
rocsparse_spmv_alg_csr_rowsplit	Yes	Best suited for matrices in which all rows have a similar number of non-zeros. Can outperform the adaptive and LRB algorithms for certain sparsity patterns. Performs very poorly if some rows have few non-zeros and some rows have many non-zeros.
rocsparse_spmv_alg_csr_stream	Yes	[Deprecated] old name for rocsparse_spmv_alg_csr_rowsplit.
rocsparse_spmv_alg_csr_adaptive	No	Generally the fastest algorithm across all matrix sparsity patterns. This includes matrices that have some rows with many non-zeros and some rows with few non-zeros. Requires a lengthy preprocessing that needs to be amortized over many subsequent sparse vector products.
rocsparse_spmv_alg_csr_lrb	No	Like the adaptive algorithm, generally performs well across all matrix sparsity patterns. Generally not as fast as the adaptive algorithm, but uses a much faster preprocessing step. Good when only a small number of sparse matrix-vector products will be performed.
rocsparse_spmv_alg_csr_nnzsplit	No	Like adaptive algorithm, generally performs well across all matrix sparsity patterns. Generally not as fast as adaptive algorithm but faster than LRB algorithm. It uses a much faster pre-processing step than LRB. Good for when the number of sparse vector products that will be performed is less than one hundred. If more products need to be computed, the adaptive algorithm is probably faster.

COO Algorithms	Deterministic	Notes
rocsparse_spmv_alg_coo	Yes	Generally not as fast as atomic algorithm but is deterministic
rocsparse_spmv_alg_coo_atomic	No	Generally the fastest COO algorithm

ELL Algorithms	Deterministic	Notes
rocsparse_spmv_alg_ell	Yes

BSR Algorithm	Deterministic	Notes
rocsparse_spmv_alg_bsr	Yes
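
The Deterministic column matters because floating-point addition is not associative. The text above does not say why a given algorithm is non-deterministic; a common cause, assumed here, is that partial sums are accumulated in an order that can change from run to run (for example with atomic additions). The following host-only sketch uses a hypothetical helper, sum_in_order, to show that the same three values summed in two different orders give different float results.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of rocSPARSE): sum the values in the order
// given, rounding to float after every addition. The volatile accumulator
// keeps the compiler from reordering or widening the sum.
float sum_in_order(const std::vector<float>& values)
{
    assert(!values.empty());
    volatile float acc = 0.0f;
    for(float v : values)
    {
        acc = acc + v;
    }
    return acc;
}
```

Summing {1e8f, -1e8f, 1.0f} in that order keeps the final 1.0f, while summing {1.0f, 1e8f, -1e8f} loses it, because 1e8f + 1.0f rounds back to 1e8f in single precision.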

rocsparse_v2_spmv supports multiple combinations of data types and compute types. The tables below list the currently supported data types for the sparse matrix \(op(A)\) and the dense vectors \(x\) and \(y\), and the compute type for \(\alpha\) and \(\beta\). Using different data types saves memory bandwidth and storage, where the user application allows it, while the actual computation is performed in a higher precision.

rocsparse_v2_spmv supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.
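
To make the storage argument concrete, the following host-only sketch estimates the memory footprint of a CSR matrix for a given index width and value width. The helper csr_storage_bytes is hypothetical and not part of rocSPARSE; it only counts the three CSR arrays and ignores any descriptor or temporary buffer overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of rocSPARSE): bytes needed to store an
// m-row CSR matrix with nnz non-zeros, that is (m + 1) row pointers,
// nnz column indices, and nnz values.
std::size_t csr_storage_bytes(std::size_t m,
                              std::size_t nnz,
                              std::size_t index_bytes,
                              std::size_t value_bytes)
{
    assert(index_bytes > 0 && value_bytes > 0);
    return (m + 1) * index_bytes + nnz * index_bytes + nnz * value_bytes;
}
```

For a matrix with 4 rows and 9 non-zeros, 32-bit indices with float values need 92 bytes, while 64-bit indices with double values need 184 bytes.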

Uniform Precisions:

A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Mixed precisions:

A / X	Y	compute_type
rocsparse_datatype_i8_r	rocsparse_datatype_i32_r	rocsparse_datatype_i32_r
rocsparse_datatype_i8_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r
rocsparse_datatype_f16_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r

Mixed-regular real precisions:

A	X / Y / compute_type
rocsparse_datatype_f32_r	rocsparse_datatype_f64_r
rocsparse_datatype_f32_c	rocsparse_datatype_f64_c

Mixed-regular complex precisions:

A	X / Y / compute_type
rocsparse_datatype_f32_r	rocsparse_datatype_f32_c
rocsparse_datatype_f64_r	rocsparse_datatype_f64_c

Example

int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    int m = 4;
    int n = 6;

    std::vector<int>    hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>    hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float>  hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};
    std::vector<double> hx(n, 1.0);
    std::vector<double> hy(m, 0.0);

    // Scalar alpha
    double alpha = 3.7;

    // Scalar beta
    double beta = 0.0;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*    dcsr_row_ptr;
    int*    dcsr_col_ind;
    float*  dcsr_val;
    double* dx;
    double* dy;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dx, sizeof(double) * n));
    HIP_CHECK(hipMalloc(&dy, sizeof(double) * m));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(double) * n, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_error       p_error[1] = {};
    rocsparse_spmat_descr matA;
    rocsparse_dnvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               n,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               row_idx_type,
                                               col_idx_type,
                                               idx_base,
                                               data_type));

    // Create dense vector X
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, n, dx, rocsparse_datatype_f64_r));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, rocsparse_datatype_f64_r));

    rocsparse_spmv_descr spmv_descr;
    ROCSPARSE_CHECK(rocsparse_create_spmv_descr(&spmv_descr));

    const rocsparse_spmv_alg spmv_alg = rocsparse_spmv_alg_csr_adaptive;
    ROCSPARSE_CHECK(rocsparse_spmv_set_input(
        handle, spmv_descr, rocsparse_spmv_input_alg, &spmv_alg, sizeof(spmv_alg), p_error));

    const rocsparse_operation spmv_operation = rocsparse_operation_none;
    ROCSPARSE_CHECK(rocsparse_spmv_set_input(handle,
                                             spmv_descr,
                                             rocsparse_spmv_input_operation,
                                             &spmv_operation,
                                             sizeof(spmv_operation),
                                             p_error));

    const rocsparse_datatype spmv_scalar_datatype = rocsparse_datatype_f64_r;
    ROCSPARSE_CHECK(rocsparse_spmv_set_input(handle,
                                             spmv_descr,
                                             rocsparse_spmv_input_scalar_datatype,
                                             &spmv_scalar_datatype,
                                             sizeof(spmv_scalar_datatype),
                                             p_error));

    const rocsparse_datatype spmv_compute_datatype = rocsparse_datatype_f64_r;
    ROCSPARSE_CHECK(rocsparse_spmv_set_input(handle,
                                             spmv_descr,
                                             rocsparse_spmv_input_compute_datatype,
                                             &spmv_compute_datatype,
                                             sizeof(spmv_compute_datatype),
                                             p_error));

    // Call spmv to get buffer size
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_v2_spmv_buffer_size(handle,
                                                  spmv_descr,
                                                  matA,
                                                  vecX,
                                                  vecY,
                                                  rocsparse_v2_spmv_stage_analysis,
                                                  &buffer_size,
                                                  p_error));

    void* buffer;
    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    // Call spmv to perform analysis
    ROCSPARSE_CHECK(rocsparse_v2_spmv(handle,
                                      spmv_descr,
                                      &alpha,
                                      matA,
                                      vecX,
                                      &beta,
                                      vecY,
                                      rocsparse_v2_spmv_stage_analysis,
                                      buffer_size,
                                      buffer,
                                      p_error));

    HIP_CHECK(hipFree(buffer));

    ROCSPARSE_CHECK(rocsparse_v2_spmv_buffer_size(handle,
                                                  spmv_descr,
                                                  matA,
                                                  vecX,
                                                  vecY,
                                                  rocsparse_v2_spmv_stage_compute,
                                                  &buffer_size,
                                                  p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    // Call spmv to perform computation
    ROCSPARSE_CHECK(rocsparse_v2_spmv(handle,
                                      spmv_descr,
                                      &alpha,
                                      matA,
                                      vecX,
                                      &beta,
                                      vecY,
                                      rocsparse_v2_spmv_stage_compute,
                                      buffer_size,
                                      buffer,
                                      p_error));

    HIP_CHECK(hipFree(buffer));

    ROCSPARSE_CHECK(rocsparse_destroy_error(p_error[0]));
    ROCSPARSE_CHECK(rocsparse_destroy_spmv_descr(spmv_descr));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(double) * m, hipMemcpyDeviceToHost));

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dx));
    HIP_CHECK(hipFree(dy));

    return 0;
}
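
To check the result copied back into hy, a host reference can be run on the same CSR arrays. The sketch below is not part of rocSPARSE; host_csr_spmv is a hypothetical helper that assumes zero-based indices and no transposition, and it stores the matrix values in float while accumulating in double, mirroring the data types used in the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host reference (not part of rocSPARSE) for
// y := alpha * A * x + beta * y with A in zero-based CSR format.
// A is stored in float; x, y and the scalars are double.
void host_csr_spmv(int                        m,
                   const std::vector<int>&    row_ptr,
                   const std::vector<int>&    col_ind,
                   const std::vector<float>&  val,
                   double                     alpha,
                   const std::vector<double>& x,
                   double                     beta,
                   std::vector<double>&       y)
{
    assert(static_cast<int>(row_ptr.size()) == m + 1);
    assert(static_cast<int>(y.size()) == m);
    for(int i = 0; i < m; ++i)
    {
        // Accumulate the dot product of row i with x in double precision
        double sum = 0.0;
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            sum += static_cast<double>(val[k]) * x[col_ind[k]];
        }
        y[i] = alpha * sum + beta * y[i];
    }
}
```

For the matrix in the example, the row sums are 5, 5, 20 and 15, so with x set to all ones, alpha of about 3.7 and beta equal to 0, the expected result is approximately {18.5, 18.5, 74, 55.5}.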

Note

The sparse matrix format rocsparse_format_bell is not supported.

Note

The stage rocsparse_v2_spmv_stage_analysis is mandatory. An error is returned if that stage was not executed before the stage rocsparse_v2_spmv_stage_compute.

Note

None of the algorithms above are deterministic when \(A\) is transposed.

Note

All the sparse matrix formats are supported except rocsparse_format_bell.

Note

The rocsparse_v2_spmv_stage_compute stage is non-blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished. The rocsparse_v2_spmv_stage_analysis stage is blocking with respect to the host.

Note

Only the stage rocsparse_v2_spmv_stage_compute supports execution in a hipGraph context. The rocsparse_v2_spmv_stage_analysis stage does not support hipGraph.

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] spmv descriptor.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
beta – [in] scalar \(\beta\).
y – [inout] vector descriptor.
stage – [in] SpMV stage of the SpMV algorithm.
buffer_size_in_bytes – [in] size in bytes of the buffer, must be greater than or equal to the buffer size obtained from rocsparse_v2_spmv_buffer_size.
buffer – [in] temporary buffer allocated by the user.
error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if the user is not interested in obtaining an error descriptor.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context handle was not initialized.
rocsparse_status_invalid_pointer – alpha, mat, x, beta, y or buffer pointer is invalid.
rocsparse_status_invalid_value – the value of stage is invalid.
rocsparse_status_not_implemented – if alg is not supported or if the mixed precision configuration is not supported.

rocsparse_spsv()#

rocsparse_status rocsparse_spsv(rocsparse_handle handle, rocsparse_operation trans, const void *alpha, rocsparse_const_spmat_descr mat, rocsparse_const_dnvec_descr x, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spsv_alg alg, rocsparse_spsv_stage stage, size_t *buffer_size, void *temp_buffer)#

Sparse triangular system solve.

rocsparse_spsv solves a triangular linear system of equations defined by a sparse \(m \times m\) square matrix \(op(A)\), given in CSR or COO storage format, such that

\[ op(A) \cdot y = \alpha \cdot x, \]

with

\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \end{array} \right. \end{split}\]

and where \(y\) is the dense solution vector and \(x\) is the dense right-hand side vector.
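
The transposed case can be illustrated on the host. The sketch below is not rocSPARSE code; host_csr_lower_transpose_solve is a hypothetical helper that assumes \(A\) is lower triangular in zero-based CSR format with a non-zero diagonal, so that \(A^T\) is upper triangular and can be solved by backward substitution without forming the transpose explicitly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host reference (not part of rocSPARSE) for solving
// A^T * y = alpha * x, where A is lower triangular in zero-based CSR format.
// Entry A[i][j] of row i acts as entry (j, i) of the upper triangular A^T.
std::vector<float> host_csr_lower_transpose_solve(int                       m,
                                                  const std::vector<int>&   row_ptr,
                                                  const std::vector<int>&   col_ind,
                                                  const std::vector<float>& val,
                                                  float                     alpha,
                                                  const std::vector<float>& x)
{
    assert(static_cast<int>(row_ptr.size()) == m + 1);
    std::vector<float> y(m);
    for(int i = 0; i < m; ++i)
    {
        y[i] = alpha * x[i];
    }
    for(int i = m - 1; i >= 0; --i)
    {
        // Locate the diagonal entry of row i
        float diag = 0.0f;
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            if(col_ind[k] == i)
            {
                diag = val[k];
            }
        }
        assert(diag != 0.0f);
        y[i] /= diag;
        // Remove the contribution of y[i] from the equations above it
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            if(col_ind[k] < i)
            {
                y[col_ind[k]] -= val[k] * y[i];
            }
        }
    }
    return y;
}
```

For the 4 x 4 lower triangular matrix with rows (1 0 0 0), (4 2 0 0), (0 3 7 0), (0 0 0 1), a right-hand side of all ones and alpha equal to 1, this yields approximately {-1/7, 2/7, 1/7, 1}.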

Performing the above operation requires three stages. First, rocsparse_spsv must be called with the stage rocsparse_spsv_stage_buffer_size, which determines the size of the required temporary storage buffer. The user then allocates this buffer and calls rocsparse_spsv with the stage rocsparse_spsv_stage_preprocess, which performs analysis on the sparse matrix \(op(A)\). Finally, the user completes the computation by calling rocsparse_spsv with the stage rocsparse_spsv_stage_compute. The buffer size, buffer allocation, and preprocess stages only need to be called once for a given sparse matrix \(op(A)\), while the computation stage can be used repeatedly with different \(x\) and \(y\) vectors.

rocsparse_spsv supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index types for storing the row pointer and column indices arrays of the sparse matrices. rocsparse_spsv supports the following data types for \(op(A)\), \(x\), \(y\) and compute types for \(\alpha\):

Uniform Precisions:

A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

int main()
{
    //     1 0 0 0
    // A = 4 2 0 0
    //     0 3 7 0
    //     0 0 0 1
    int m = 4;

    std::vector<int>   hcsr_row_ptr = {0, 1, 3, 5, 6};
    std::vector<int>   hcsr_col_ind = {0, 0, 1, 1, 2, 3};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 7, 1};
    std::vector<float> hx(m, 1.0f);
    std::vector<float> hy(m, 0.0f);

    // Scalar alpha
    float alpha = 1.0f;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dx;
    float* dy;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dx, sizeof(float) * m));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * m));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(float) * m, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;
    rocsparse_operation  trans        = rocsparse_operation_none;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               m,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               row_idx_type,
                                               col_idx_type,
                                               idx_base,
                                               data_type));

    // Create dense vector X
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, m, dx, data_type));

    // Create dense vector Y
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, data_type));

    // Call spsv to get buffer size
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spsv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   vecY,
                                   compute_type,
                                   rocsparse_spsv_alg_default,
                                   rocsparse_spsv_stage_buffer_size,
                                   &buffer_size,
                                   nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // Call spsv to perform analysis
    ROCSPARSE_CHECK(rocsparse_spsv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   vecY,
                                   compute_type,
                                   rocsparse_spsv_alg_default,
                                   rocsparse_spsv_stage_preprocess,
                                   &buffer_size,
                                   temp_buffer));

    // Call spsv to perform computation
    ROCSPARSE_CHECK(rocsparse_spsv(handle,
                                   trans,
                                   &alpha,
                                   matA,
                                   vecX,
                                   vecY,
                                   compute_type,
                                   rocsparse_spsv_alg_default,
                                   rocsparse_spsv_stage_compute,
                                   &buffer_size,
                                   temp_buffer));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost));

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dx));
    HIP_CHECK(hipFree(dy));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
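
The result in hy can be compared against a host-side forward substitution. The sketch below is not part of rocSPARSE; host_csr_lower_solve is a hypothetical helper that assumes a lower triangular matrix in zero-based CSR format with a non-zero diagonal entry in every row and no transposition.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host reference (not part of rocSPARSE) for solving
// A * y = alpha * x by forward substitution, where A is lower triangular
// in zero-based CSR format.
std::vector<float> host_csr_lower_solve(int                       m,
                                        const std::vector<int>&   row_ptr,
                                        const std::vector<int>&   col_ind,
                                        const std::vector<float>& val,
                                        float                     alpha,
                                        const std::vector<float>& x)
{
    assert(static_cast<int>(row_ptr.size()) == m + 1);
    std::vector<float> y(m, 0.0f);
    for(int i = 0; i < m; ++i)
    {
        float sum  = alpha * x[i];
        float diag = 0.0f;
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            const int j = col_ind[k];
            if(j < i)
            {
                // Subtract contributions of already computed unknowns
                sum -= val[k] * y[j];
            }
            else if(j == i)
            {
                diag = val[k];
            }
        }
        assert(diag != 0.0f);
        y[i] = sum / diag;
    }
    return y;
}
```

For the matrix and right-hand side of the example, the expected solution is approximately {1, -1.5, 0.785714, 1}.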

Note

The sparse matrix formats currently supported are: rocsparse_format_coo and rocsparse_format_csr.

Note

Only the rocsparse_spsv_stage_buffer_size stage and the rocsparse_spsv_stage_compute stage are non-blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spsv_stage_preprocess stage is blocking with respect to the host.

Note

Currently, only trans == rocsparse_operation_none and trans == rocsparse_operation_transpose are supported.

Note

Only the rocsparse_spsv_stage_buffer_size stage and the rocsparse_spsv_stage_compute stage support execution in a hipGraph context. The rocsparse_spsv_stage_preprocess stage does not support hipGraph.

Parameters:

handle – [in] handle to the rocsparse library context queue.
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpSV computation.
alg – [in] SpSV algorithm for the SpSV computation.
stage – [in] SpSV stage for the SpSV computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When the rocsparse_spsv_stage_buffer_size stage is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpSV operation. This buffer is non-persistent: no data are stored in it, so this memory can be freed or reused for other tasks between the analysis phase and the compute phase.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, mat, x, y or buffer_size pointer is invalid.
rocsparse_status_not_implemented – trans, compute_type, stage or alg is currently not supported.

rocsparse_spsm()#

rocsparse_status rocsparse_spsm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr matA, rocsparse_const_dnmat_descr matB, const rocsparse_dnmat_descr matC, rocsparse_datatype compute_type, rocsparse_spsm_alg alg, rocsparse_spsm_stage stage, size_t *buffer_size, void *temp_buffer)#

Sparse triangular system solve with multiple right-hand sides.

rocsparse_spsm solves a triangular linear system of equations defined by a sparse \(m \times m\) square matrix \(op(A)\), given in CSR or COO storage format, such that

\[ op(A) \cdot C = \alpha \cdot op(B), \]

with

\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \\ A^T, & \text{if trans_A == rocsparse_operation_transpose} \end{array} \right. \end{split}\]

and

\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \\ B^T, & \text{if trans_B == rocsparse_operation_transpose} \end{array} \right. \end{split}\]

and where \(C\) is the dense solution matrix and \(B\) is the dense right-hand side matrix. Both \(B\) and \(C\) can be in row or column order.

Performing the above operation requires three stages. First, rocsparse_spsm must be called with the stage rocsparse_spsm_stage_buffer_size, which determines the size of the required temporary storage buffer. The user then allocates this buffer and calls rocsparse_spsm with the stage rocsparse_spsm_stage_preprocess, which performs analysis on the sparse matrix \(op(A)\). Finally, the user completes the computation by calling rocsparse_spsm with the stage rocsparse_spsm_stage_compute. The buffer size, buffer allocation, and preprocess stages only need to be called once for a given sparse triangular matrix \(op(A)\), while the computation stage can be used repeatedly with different \(B\) and \(C\) matrices. Once all calls to rocsparse_spsm are complete, the temporary buffer can be deallocated.

As noted above, both \(B\) and \(C\) can be in row or column order (this includes mixing the order so that \(B\) is in row order and \(C\) is in column order, and vice versa). Internally, however, rocSPARSE kernels solve the system assuming the matrices \(B\) and \(C\) are in row order, as this provides the best memory access. This means that if the matrix \(C\) is not in row order and/or the matrix \(B\) is not in row order (or \(B^{T}\) is not in column order, as this is equivalent to being in row order), then memory copies and/or transposing of data may be performed internally to get them into the correct order (possibly using extra buffer size). Once computation is completed, additional memory copies and/or transposing of data may be performed to get them back into the user arrays. For best performance and smallest required temporary storage buffers, use row order for the matrix \(C\) and row order for the matrix \(B\) (or column order if \(B\) is being transposed).
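
The difference between the two orders comes down to how element \((i, j)\) is addressed. The following host-only sketch is purely illustrative: dense_offset and column_to_row_order are hypothetical helpers, and the copy loop only shows what a reordering step could look like, not how rocSPARSE implements it internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of rocSPARSE): linear offset of element
// (i, j) in a dense matrix with leading dimension ld.
std::size_t dense_offset(std::size_t i, std::size_t j, std::size_t ld, bool column_order)
{
    return column_order ? j * ld + i : i * ld + j;
}

// Hypothetical helper: copy an m x n column-order matrix (leading dimension m)
// into a new row-order matrix (leading dimension n).
std::vector<float> column_to_row_order(const std::vector<float>& src, std::size_t m, std::size_t n)
{
    assert(src.size() == m * n);
    std::vector<float> dst(m * n);
    for(std::size_t i = 0; i < m; ++i)
    {
        for(std::size_t j = 0; j < n; ++j)
        {
            dst[dense_offset(i, j, n, false)] = src[dense_offset(i, j, m, true)];
        }
    }
    return dst;
}
```

A 4 x 2 matrix whose first column is all ones and whose second column is all twos is stored as {1, 1, 1, 1, 2, 2, 2, 2} in column order and as {1, 2, 1, 2, 1, 2, 1, 2} in row order. Supplying the data in row order up front avoids the need for this kind of extra copy.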

rocsparse_spsm supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices. rocsparse_spsm supports the following data types for \(op(A)\), \(op(B)\), \(C\) and compute types for \(\alpha\):

Uniform Precisions:

A / B / C / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

int main()
{
    //     1 0 0 0
    // A = 4 2 0 0
    //     0 3 7 0
    //     0 0 0 1
    int m = 4;
    int n = 2;

    std::vector<int>   hcsr_row_ptr = {0, 1, 3, 5, 6};
    std::vector<int>   hcsr_col_ind = {0, 0, 1, 1, 2, 3};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 7, 1};
    std::vector<float> hB(m * n);
    std::vector<float> hC(m * n);

    for(int i = 0; i < n; i++)
    {
        for(int j = 0; j < m; j++)
        {
            hB[m * i + j] = static_cast<float>(i + 1);
        }
    }

    // Scalar alpha
    float alpha = 1.0f;

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dB;
    float* dC;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dB, sizeof(float) * m * n));
    HIP_CHECK(hipMalloc(&dC, sizeof(float) * m * n));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dB, hB.data(), sizeof(float) * m * n, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnmat_descr matB;
    rocsparse_dnmat_descr matC;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;
    rocsparse_operation  trans_A      = rocsparse_operation_none;
    rocsparse_operation  trans_B      = rocsparse_operation_none;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               m,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               row_idx_type,
                                               col_idx_type,
                                               idx_base,
                                               data_type));

    // Create dense matrix B
    ROCSPARSE_CHECK(
        rocsparse_create_dnmat_descr(&matB, m, n, m, dB, data_type, rocsparse_order_column));

    // Create dense matrix C
    ROCSPARSE_CHECK(
        rocsparse_create_dnmat_descr(&matC, m, n, m, dC, data_type, rocsparse_order_column));

    // Call spsm to get buffer size
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spsm(handle,
                                   trans_A,
                                   trans_B,
                                   &alpha,
                                   matA,
                                   matB,
                                   matC,
                                   compute_type,
                                   rocsparse_spsm_alg_default,
                                   rocsparse_spsm_stage_buffer_size,
                                   &buffer_size,
                                   nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // Call spsm to perform analysis
    ROCSPARSE_CHECK(rocsparse_spsm(handle,
                                   trans_A,
                                   trans_B,
                                   &alpha,
                                   matA,
                                   matB,
                                   matC,
                                   compute_type,
                                   rocsparse_spsm_alg_default,
                                   rocsparse_spsm_stage_preprocess,
                                   &buffer_size,
                                   temp_buffer));

    // Call spsm to perform computation
    ROCSPARSE_CHECK(rocsparse_spsm(handle,
                                   trans_A,
                                   trans_B,
                                   &alpha,
                                   matA,
                                   matB,
                                   matC,
                                   compute_type,
                                   rocsparse_spsm_alg_default,
                                   rocsparse_spsm_stage_compute,
                                   &buffer_size,
                                   temp_buffer));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hC.data(), dC, sizeof(float) * m * n, hipMemcpyDeviceToHost));

    std::cout << "hC" << std::endl;
    for(size_t i = 0; i < hC.size(); ++i)
    {
        std::cout << hC[i] << " ";
    }
    std::cout << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dB));
    HIP_CHECK(hipFree(dC));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
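One way to check the device result of the example above is a host-side reference solve. The sketch below assumes that rocsparse_spsm solves \(op(A) \cdot C = \alpha \cdot op(B)\) with no transposition, that \(A\) is lower triangular with a non-unit diagonal stored in zero-based CSR, and that \(B\) and \(C\) are in column order with leading dimension m. The function name host_lower_spsm is hypothetical and is not part of rocSPARSE.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host reference: forward substitution for a lower triangular
// CSR matrix A, solving A * C = alpha * B one column at a time.
// B and C are m x n, column order, leading dimension m.
std::vector<float> host_lower_spsm(int                       m,
                                   int                       n,
                                   float                     alpha,
                                   const std::vector<int>&   row_ptr,
                                   const std::vector<int>&   col_ind,
                                   const std::vector<float>& val,
                                   const std::vector<float>& B)
{
    assert(static_cast<int>(row_ptr.size()) == m + 1);
    assert(static_cast<int>(B.size()) == m * n);

    std::vector<float> C(m * n, 0.0f);
    for(int j = 0; j < n; ++j)
    {
        for(int i = 0; i < m; ++i)
        {
            float sum  = alpha * B[i + j * m];
            float diag = 1.0f;
            for(int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
            {
                int c = col_ind[p];
                if(c < i)
                {
                    // Rows above i of this column have already been solved
                    sum -= val[p] * C[c + j * m];
                }
                else if(c == i)
                {
                    diag = val[p];
                }
            }
            C[i + j * m] = sum / diag;
        }
    }
    return C;
}
```

Calling host_lower_spsm(m, n, alpha, hcsr_row_ptr, hcsr_col_ind, hcsr_val, hB) with the arrays from the example gives a reference that can be compared entry by entry against hC, using a small tolerance for floating point differences.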

Note

The sparse matrix formats currently supported are: rocsparse_format_coo and rocsparse_format_csr.

Note

Only the rocsparse_spsm_stage_buffer_size stage and the rocsparse_spsm_stage_compute stage are non-blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spsm_stage_preprocess stage is blocking with respect to the host.

Note

Currently, only trans_A == rocsparse_operation_none and trans_A == rocsparse_operation_transpose are supported. Likewise, only trans_B == rocsparse_operation_none and trans_B == rocsparse_operation_transpose are supported.

Note

Only the rocsparse_spsm_stage_buffer_size stage and the rocsparse_spsm_stage_compute stage support execution in a hipGraph context. The rocsparse_spsm_stage_preprocess stage does not support hipGraph.

Parameters:

handle – [in] handle to the rocsparse library context queue.
trans_A – [in] matrix operation type for the sparse matrix \(op(A)\).
trans_B – [in] matrix operation type for the dense matrix \(op(B)\).
alpha – [in] scalar \(\alpha\).
matA – [in] sparse matrix descriptor.
matB – [in] dense matrix descriptor.
matC – [inout] dense matrix descriptor.
compute_type – [in] floating point precision for the SpSM computation.
alg – [in] SpSM algorithm for the SpSM computation.
stage – [in] SpSM stage for the SpSM computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When the rocsparse_spsm_stage_buffer_size stage is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpSM operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, matA, matB, matC, descr or buffer_size pointer is invalid.
rocsparse_status_not_implemented – trans_A, trans_B, compute_type, stage or alg is currently not supported.

rocsparse_spmm()#

rocsparse_status rocsparse_spmm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, const rocsparse_dnmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_spmm_alg alg, rocsparse_spmm_stage stage, size_t *buffer_size, void *temp_buffer)#

Sparse matrix dense matrix multiplication.

rocsparse_spmm multiplies the scalar \(\alpha\) with a sparse \(m \times k\) matrix \(op(A)\), defined in CSR, COO, BSR or Blocked ELL storage format, and the dense \(k \times n\) matrix \(op(B)\) and adds the result to the dense \(m \times n\) matrix \(C\) that is multiplied by the scalar \(\beta\), such that

\[ C := \alpha \cdot op(A) \cdot op(B) + \beta \cdot C, \]

with

\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \\ A^T, & \text{if trans_A == rocsparse_operation_transpose} \\ A^H, & \text{if trans_A == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]

and

\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \\ B^T, & \text{if trans_B == rocsparse_operation_transpose} \\ B^H, & \text{if trans_B == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]

Both \(B\) and \(C\) can be in row or column order.
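The operation defined above can be written as a short host-side reference, which is useful for validating device results on small inputs. The sketch below covers only the non-transposed case with a zero-based CSR matrix and column-order \(B\) and \(C\); the function name host_csr_spmm and its parameter list are hypothetical and are not part of rocSPARSE.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host reference for C := alpha * A * B + beta * C, where A is an
// m x k CSR matrix (zero-based) and B (k x n) and C (m x n) are in column
// order with leading dimensions ldb and ldc.
void host_csr_spmm(int                       m,
                   int                       k,
                   int                       n,
                   float                     alpha,
                   const std::vector<int>&   row_ptr,
                   const std::vector<int>&   col_ind,
                   const std::vector<float>& val,
                   const std::vector<float>& B,
                   int                       ldb,
                   float                     beta,
                   std::vector<float>&       C,
                   int                       ldc)
{
    assert(static_cast<int>(row_ptr.size()) == m + 1);
    assert(ldb >= k && ldc >= m);
    assert(static_cast<int>(B.size()) >= ldb * n);
    assert(static_cast<int>(C.size()) >= ldc * n);

    for(int j = 0; j < n; ++j)
    {
        for(int i = 0; i < m; ++i)
        {
            // Dot product of sparse row i of A with dense column j of B
            float sum = 0.0f;
            for(int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
            {
                sum += val[p] * B[col_ind[p] + j * ldb];
            }
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}
```

For row-order matrices the only change is the indexing, for example B[col_ind[p] * ldb + j] with ldb at least n.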

rocsparse_spmm requires three stages to complete. First, the user passes the rocsparse_spmm_stage_buffer_size stage to determine the size of the required temporary storage buffer. Next, the user allocates this buffer and calls rocsparse_spmm again with the rocsparse_spmm_stage_preprocess stage, which performs analysis on the sparse matrix \(op(A)\). Finally, the user calls rocsparse_spmm with the rocsparse_spmm_stage_compute stage to perform the actual computation. The buffer size, buffer allocation, and preprocess stages only need to be called once for a given sparse matrix \(op(A)\), while the computation stage can be repeatedly used with different \(B\) and \(C\) matrices. Once all calls to rocsparse_spmm are complete, the temporary buffer can be deallocated.

As noted above, both \(B\) and \(C\) can be in row or column order (this includes mixing the order so that \(B\) is row order and \(C\) is column order and vice versa). For best performance, use row order for both \(B\) and \(C\) as this provides the best memory access.

rocsparse_spmm supports multiple algorithms. These algorithms have different trade-offs depending on the sparsity pattern of the matrix, whether or not the results need to be deterministic, and how many times the sparse-matrix product will be performed.

CSR Algorithms	Deterministic	Preprocessing	Notes
rocsparse_spmm_alg_csr	Yes	No	Default algorithm.
rocsparse_spmm_alg_csr_row_split	Yes	No	Assigns a fixed number of threads per row regardless of the number of non-zeros in each row. This can perform well when each row in the matrix has roughly the same number of non-zeros
rocsparse_spmm_alg_csr_nnz_split	No	Yes	Distributes work by having each thread block work on a fixed number of non-zeros regardless of the number of rows that might be involved. This can perform well when the matrix has some rows with few non-zeros and some rows with many non-zeros
rocsparse_spmm_alg_csr_merge_path	No	Yes	Attempts to combine the approaches of row-split and non-zero split by having each block work on a fixed amount of work which can be either non-zeros or rows
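To illustrate why the sparsity pattern matters for the CSR choices above, the following host-side sketch compares the two work-partitioning ideas on a CSR row pointer array. It is a conceptual model only and does not reflect the actual rocSPARSE kernels; the names row_split_work and nnz_split_rows and the chunk size are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Row-split view: the work attached to each row is its number of non-zeros.
std::vector<int> row_split_work(const std::vector<int>& row_ptr)
{
    assert(!row_ptr.empty());
    std::vector<int> work(row_ptr.size() - 1);
    for(std::size_t i = 0; i + 1 < row_ptr.size(); ++i)
    {
        work[i] = row_ptr[i + 1] - row_ptr[i];
    }
    return work;
}

// Nnz-split view: every chunk owns at most `chunk` consecutive non-zeros.
// Returns the first and last row touched by each chunk.
std::vector<std::pair<int, int>> nnz_split_rows(const std::vector<int>& row_ptr, int chunk)
{
    assert(!row_ptr.empty() && chunk > 0);
    const int base = row_ptr.front();
    const int nnz  = row_ptr.back() - base;

    // Row that contains the non-zero stored at position p of the CSR arrays
    auto row_of = [&](int p) {
        auto it = std::upper_bound(row_ptr.begin(), row_ptr.end(), p + base);
        return static_cast<int>(it - row_ptr.begin()) - 1;
    };

    std::vector<std::pair<int, int>> rows;
    for(int begin = 0; begin < nnz; begin += chunk)
    {
        const int end = std::min(begin + chunk, nnz);
        rows.emplace_back(row_of(begin), row_of(end - 1));
    }
    return rows;
}
```

For a row pointer array such as {0, 1, 2, 100}, the row-split view assigns 1, 1 and 98 non-zeros to the three rows, while a non-zero split with a chunk size of 25 produces four chunks of equal size, three of which fall entirely inside the last row. Because several chunks then contribute to the same row, their partial sums have to be combined, which is one plausible reason such schemes are listed as non-deterministic.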

COO Algorithms	Deterministic	Preprocessing	Notes
rocsparse_spmm_alg_coo_segmented	Yes	No	Generally not as fast as atomic algorithm but is deterministic
rocsparse_spmm_alg_coo_atomic	No	No	Generally the fastest COO algorithm. Is the default algorithm
rocsparse_spmm_alg_coo_segmented_atomic	No	No

ELL Algorithms	Deterministic	Preprocessing	Notes
rocsparse_spmm_alg_bell	Yes	No

BSR Algorithms	Deterministic	Preprocessing	Notes
rocsparse_spmm_alg_bsr	Yes	No

Passing rocsparse_spmm_alg_default automatically selects one of the algorithms listed above based on the sparse matrix format: rocsparse_spmm_alg_csr for CSR matrices, rocsparse_spmm_alg_bell for Blocked ELL matrices, rocsparse_spmm_alg_bsr for BSR matrices, and rocsparse_spmm_alg_coo_atomic for COO matrices.

When A is transposed, rocsparse_spmm will revert to using rocsparse_spmm_alg_csr for CSR format and rocsparse_spmm_alg_coo_atomic for COO format regardless of algorithm selected.

rocsparse_spmm supports multiple combinations of data types and compute types. The tables below list the currently supported data types for the sparse matrix \(op(A)\) and the dense matrices \(op(B)\) and \(C\), and the compute type for \(\alpha\) and \(\beta\). Using different data types saves memory bandwidth and storage, when the user application allows it, while the actual computation is performed in a higher precision.

rocsparse_spmm supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.

Uniform Precisions:

A / B / C / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Mixed precisions:

A / B	C	compute_type
rocsparse_datatype_i8_r	rocsparse_datatype_i32_r	rocsparse_datatype_i32_r
rocsparse_datatype_i8_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r
rocsparse_datatype_f16_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r
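The benefit of a wider compute type can be seen with a small host-side sketch that mirrors the first mixed-precision row (8-bit integer inputs, 32-bit integer accumulation). The function name dot_i8_i32 is hypothetical, and the code only illustrates the arithmetic, not how rocSPARSE implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: inputs are stored as 8-bit integers (one byte per
// element) while products are accumulated in 32 bits.
std::int32_t dot_i8_i32(const std::vector<std::int8_t>& a, const std::vector<std::int8_t>& b)
{
    assert(a.size() == b.size());
    std::int32_t acc = 0;
    for(std::size_t i = 0; i < a.size(); ++i)
    {
        // Widen before multiplying so the product is formed in 32 bits
        acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    }
    return acc;
}
```

With a = {100, 100} and b = {100, 100} the result is 20000, far outside the range of an 8-bit integer, even though the inputs occupy a quarter of the storage that 32-bit values would need.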

rocsparse_spmm also supports batched computation for CSR and COO matrices. There are three supported batch modes:

\[\begin{split} C_i = A \times B_i \\ C_i = A_i \times B \\ C_i = A_i \times B_i \end{split}\]

The batch mode is determined by the batch count and stride passed for each matrix. For example, to use the first batch mode ( \(C_i = A \times B_i\)) with 100 batches for non-transposed \(A\), \(B\), and \(C\), one passes:

\[\begin{split} batch\_count\_A=1 \\ batch\_count\_B=100 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=0 \\ columns\_values\_batch\_stride\_A=0 \\ batch\_stride\_B=k*n \\ batch\_stride\_C=m*n \end{split}\]

To use the second batch mode ( \(C_i = A_i \times B\)) one could use:

\[\begin{split} batch\_count\_A=100 \\ batch\_count\_B=1 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=m+1 \\ columns\_values\_batch\_stride\_A=nnz \\ batch\_stride\_B=0 \\ batch\_stride\_C=m*n \end{split}\]

And to use the third batch mode ( \(C_i = A_i \times B_i\)) one could use:

\[\begin{split} batch\_count\_A=100 \\ batch\_count\_B=100 \\ batch\_count\_C=100 \\ offsets\_batch\_stride\_A=m+1 \\ columns\_values\_batch\_stride\_A=nnz \\ batch\_stride\_B=k*n \\ batch\_stride\_C=m*n \end{split}\]
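The effect of these counts and strides can be sketched on the host. The code below assumes the usual strided-batch convention, in which the data of batch i starts at the base pointer plus i times the stride (in elements), so a stride of 0 means that every batch reuses the same data. The struct spmm_batch_offsets and the function offsets_for_batch are hypothetical and are not part of rocSPARSE.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: element offsets at which the data of one batch
// starts inside each array.
struct spmm_batch_offsets
{
    std::size_t row_ptr_A; // offset into the CSR row pointer array of A
    std::size_t col_val_A; // offset into the CSR column index and value arrays of A
    std::size_t B; // offset into the dense array of B
    std::size_t C; // offset into the dense array of C
};

spmm_batch_offsets offsets_for_batch(int         batch,
                                     std::size_t offsets_batch_stride_A,
                                     std::size_t columns_values_batch_stride_A,
                                     std::size_t batch_stride_B,
                                     std::size_t batch_stride_C)
{
    assert(batch >= 0);
    const std::size_t b = static_cast<std::size_t>(batch);
    return {b * offsets_batch_stride_A,
            b * columns_values_batch_stride_A,
            b * batch_stride_B,
            b * batch_stride_C};
}
```

In the first batch mode with m = 4, k = 6 and n = 3, batch 2 reads \(A\) at offset 0 (shared by all batches), \(B\) at element offset 36 and writes \(C\) at element offset 24.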

See examples below.

Example

This example performs sparse matrix-dense matrix multiplication, \(C := \alpha \cdot A \cdot B + \beta \cdot C\).

int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0

    //     1 4 2
    //     1 2 3
    // B = 5 4 0
    //     3 1 9
    //     1 2 2
    //     0 3 0

    //     1 1 5
    // C = 1 2 1
    //     1 3 1
    //     6 2 4

    int m = 4;
    int k = 6;
    int n = 3;

    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>   hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};

    std::vector<float> hB(k * n, 1.0f);
    std::vector<float> hC(m * n, 1.0f);

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    float alpha = 1.0f;
    float beta  = 0.0f;

    // Create CSR arrays on device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dB;
    float* dC;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dB, sizeof(float) * k * n));
    HIP_CHECK(hipMalloc(&dC, sizeof(float) * m * n));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dB, hB.data(), sizeof(float) * k * n, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dC, hC.data(), sizeof(float) * m * n, hipMemcpyHostToDevice));

    // Create rocsparse handle
    rocsparse_handle handle;
    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Types
    rocsparse_indextype itype = rocsparse_indextype_i32;
    rocsparse_indextype jtype = rocsparse_indextype_i32;
    rocsparse_datatype  ttype = rocsparse_datatype_f32_r;

    // Create descriptors
    rocsparse_spmat_descr mat_A;
    rocsparse_dnmat_descr mat_B;
    rocsparse_dnmat_descr mat_C;

    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&mat_A,
                                               m,
                                               k,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               itype,
                                               jtype,
                                               rocsparse_index_base_zero,
                                               ttype));
    ROCSPARSE_CHECK(
        rocsparse_create_dnmat_descr(&mat_B, k, n, k, dB, ttype, rocsparse_order_column));
    ROCSPARSE_CHECK(
        rocsparse_create_dnmat_descr(&mat_C, m, n, m, dC, ttype, rocsparse_order_column));

    // Query SpMM buffer
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spmm(handle,
                                   rocsparse_operation_none,
                                   rocsparse_operation_none,
                                   &alpha,
                                   mat_A,
                                   mat_B,
                                   &beta,
                                   mat_C,
                                   ttype,
                                   rocsparse_spmm_alg_default,
                                   rocsparse_spmm_stage_buffer_size,
                                   &buffer_size,
                                   nullptr));

    // Allocate buffer
    void* buffer;
    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_spmm(handle,
                                   rocsparse_operation_none,
                                   rocsparse_operation_none,
                                   &alpha,
                                   mat_A,
                                   mat_B,
                                   &beta,
                                   mat_C,
                                   ttype,
                                   rocsparse_spmm_alg_default,
                                   rocsparse_spmm_stage_preprocess,
                                   &buffer_size,
                                   buffer));

    // Perform the computation (alpha and beta are passed as host pointers)
    ROCSPARSE_CHECK(rocsparse_spmm(handle,
                                   rocsparse_operation_none,
                                   rocsparse_operation_none,
                                   &alpha,
                                   mat_A,
                                   mat_B,
                                   &beta,
                                   mat_C,
                                   ttype,
                                   rocsparse_spmm_alg_default,
                                   rocsparse_spmm_stage_compute,
                                   &buffer_size,
                                   buffer));

    // Clear up on device
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dB));
    HIP_CHECK(hipFree(dC));
    HIP_CHECK(hipFree(buffer));

    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(mat_A));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(mat_B));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(mat_C));

    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    return 0;
}

Example

An example of the first batch mode ( \(C_i = A \times B_i\)) is provided below.

int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    int m             = 4;
    int k             = 6;
    int n             = 3;
    int batch_count_A = 1;
    int batch_count_B = 100;
    int batch_count_C = 100;

    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>   hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};

    std::vector<float> hB(batch_count_B * k * n, 1.0f);
    std::vector<float> hC(batch_count_C * m * n, 1.0f);

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    int offsets_batch_stride_A        = 0;
    int columns_values_batch_stride_A = 0;
    int batch_stride_B                = k * n;
    int batch_stride_C                = m * n;

    float alpha = 1.0f;
    float beta  = 0.0f;

    // Create CSR arrays on device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dB;
    float* dC;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dB, sizeof(float) * batch_count_B * k * n));
    HIP_CHECK(hipMalloc(&dC, sizeof(float) * batch_count_C * m * n));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dB, hB.data(), sizeof(float) * batch_count_B * k * n, hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dC, hC.data(), sizeof(float) * batch_count_C * m * n, hipMemcpyHostToDevice));

    // Create rocsparse handle
    rocsparse_handle handle;
    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Types
    rocsparse_indextype itype = rocsparse_indextype_i32;
    rocsparse_indextype jtype = rocsparse_indextype_i32;
    rocsparse_datatype  ttype = rocsparse_datatype_f32_r;

    // Create descriptors
    rocsparse_spmat_descr mat_A;
    rocsparse_dnmat_descr mat_B;
    rocsparse_dnmat_descr mat_C;

    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&mat_A,
                                               m,
                                               k,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               itype,
                                               jtype,
                                               rocsparse_index_base_zero,
                                               ttype));
    ROCSPARSE_CHECK(
        rocsparse_create_dnmat_descr(&mat_B, k, n, k, dB, ttype, rocsparse_order_column));
    ROCSPARSE_CHECK(
        rocsparse_create_dnmat_descr(&mat_C, m, n, m, dC, ttype, rocsparse_order_column));

    ROCSPARSE_CHECK(rocsparse_csr_set_strided_batch(
        mat_A, batch_count_A, offsets_batch_stride_A, columns_values_batch_stride_A));
    ROCSPARSE_CHECK(rocsparse_dnmat_set_strided_batch(mat_B, batch_count_B, batch_stride_B));
    ROCSPARSE_CHECK(rocsparse_dnmat_set_strided_batch(mat_C, batch_count_C, batch_stride_C));

    // Query SpMM buffer
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_spmm(handle,
                                   rocsparse_operation_none,
                                   rocsparse_operation_none,
                                   &alpha,
                                   mat_A,
                                   mat_B,
                                   &beta,
                                   mat_C,
                                   ttype,
                                   rocsparse_spmm_alg_default,
                                   rocsparse_spmm_stage_buffer_size,
                                   &buffer_size,
                                   nullptr));

    // Allocate buffer
    void* buffer;
    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_spmm(handle,
                                   rocsparse_operation_none,
                                   rocsparse_operation_none,
                                   &alpha,
                                   mat_A,
                                   mat_B,
                                   &beta,
                                   mat_C,
                                   ttype,
                                   rocsparse_spmm_alg_default,
                                   rocsparse_spmm_stage_preprocess,
                                   &buffer_size,
                                   buffer));

    // Perform the computation (alpha and beta are passed as host pointers)
    ROCSPARSE_CHECK(rocsparse_spmm(handle,
                                   rocsparse_operation_none,
                                   rocsparse_operation_none,
                                   &alpha,
                                   mat_A,
                                   mat_B,
                                   &beta,
                                   mat_C,
                                   ttype,
                                   rocsparse_spmm_alg_default,
                                   rocsparse_spmm_stage_compute,
                                   &buffer_size,
                                   buffer));

    // Clear up on device
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dB));
    HIP_CHECK(hipFree(dC));
    HIP_CHECK(hipFree(buffer));

    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(mat_A));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(mat_B));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(mat_C));

    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    return 0;
}

Note

None of the algorithms above are deterministic when \(A\) is transposed or conjugate transposed.

Note

All algorithms perform best when using row ordering for the dense \(B\) and \(C\) matrices.

Note

The sparse matrix formats currently supported are: rocsparse_format_coo, rocsparse_format_csr, rocsparse_format_csc, rocsparse_format_bsr, and rocsparse_format_bell.

Note

Mixed precisions are only supported for the BSR, CSR, CSC, and COO matrix formats.

Note

Only the rocsparse_spmm_stage_buffer_size stage and the rocsparse_spmm_stage_compute stage are non-blocking and executed asynchronously with respect to the host. They may return before the actual computation has finished. The rocsparse_spmm_stage_preprocess stage is blocking with respect to the host.

Note

Currently, only trans_A == rocsparse_operation_none is supported for COO and Blocked ELL formats.

Note

Only the rocsparse_spmm_stage_buffer_size stage and the rocsparse_spmm_stage_compute stage support execution in a hipGraph context. The rocsparse_spmm_stage_preprocess stage does not support hipGraph.

Note

Currently, only CSR, COO, BSR and Blocked ELL sparse formats are supported.

Parameters:

handle – [in] handle to the rocsparse library context queue.
trans_A – [in] matrix operation type.
trans_B – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] matrix descriptor.
mat_B – [in] matrix descriptor.
beta – [in] scalar \(\beta\).
mat_C – [inout] matrix descriptor.
compute_type – [in] floating point precision for the SpMM computation.
alg – [in] SpMM algorithm for the SpMM computation.
stage – [in] SpMM stage for the SpMM computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When the rocsparse_spmm_stage_buffer_size stage is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpMM operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, mat_A, mat_B, mat_C, beta, or buffer_size pointer is invalid.
rocsparse_status_not_implemented – trans_A, trans_B, compute_type or alg is currently not supported.

rocsparse_spgemm()#

rocsparse_status rocsparse_spgemm(rocsparse_handle handle, rocsparse_operation trans_A, rocsparse_operation trans_B, const void *alpha, rocsparse_const_spmat_descr A, rocsparse_const_spmat_descr B, const void *beta, rocsparse_const_spmat_descr D, rocsparse_spmat_descr C, rocsparse_datatype compute_type, rocsparse_spgemm_alg alg, rocsparse_spgemm_stage stage, size_t *buffer_size, void *temp_buffer)#

Sparse matrix sparse matrix multiplication.

rocsparse_spgemm multiplies the scalar \(\alpha\) with the sparse \(m \times k\) matrix \(op(A)\) and the sparse \(k \times n\) matrix \(op(B)\) and adds the result to the sparse \(m \times n\) matrix \(D\) that is multiplied by \(\beta\). The final result is stored in the sparse \(m \times n\) matrix \(C\), such that

\[ C := \alpha \cdot op(A) \cdot op(B) + \beta \cdot D, \]

with

\[ op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \end{array} \right. \]

and

\[ op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \end{array} \right. \]

rocsparse_spgemm requires three stages to complete. First, the user passes the rocsparse_spgemm_stage_buffer_size stage to determine the size of the required temporary storage buffer. Next, the user allocates this buffer and calls rocsparse_spgemm again with the rocsparse_spgemm_stage_nnz stage which will determine the number of non-zeros in \(C\). This stage will also fill in the row pointer array of \(C\). Now that the number of non-zeros in \(C\) is known, the user allocates space for the column indices and values arrays of \(C\). Finally, the user calls rocsparse_spgemm with the rocsparse_spgemm_stage_compute stage to perform the actual computation which fills in the column indices and values arrays of \(C\). Once all calls to rocsparse_spgemm are complete, the temporary buffer can be deallocated.

Alternatively, the user may want to perform sparse matrix products multiple times with matrices having the same sparsity pattern, but whose values differ. In this scenario, the process begins as before. First, the user calls rocsparse_spgemm with stage rocsparse_spgemm_stage_buffer_size to determine the required buffer size. The user again allocates this buffer and calls rocsparse_spgemm with the stage rocsparse_spgemm_stage_nnz to determine the number of non-zeros in \(C\). The user allocates the \(C\) column indices and values arrays. Now, however, the user calls rocsparse_spgemm with the rocsparse_spgemm_stage_symbolic stage, which fills in the column indices array of \(C\) but not the values array. The user is then free to repeatedly change the values of \(A\), \(B\), and \(D\) and call rocsparse_spgemm with the rocsparse_spgemm_stage_numeric stage, which fills the values array of \(C\). The use of the extra rocsparse_spgemm_stage_symbolic and rocsparse_spgemm_stage_numeric stages allows the user to compute the sparsity pattern of \(C\) once, but compute the values multiple times.
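The split between computing the pattern once and the values many times can be mimicked on the host. The following sketch is a plain CPU model of that idea for zero-based CSR matrices with \(\beta = 0\) (so \(D\) is ignored); it is not how rocSPARSE implements these stages, and the names csr_matrix, host_spgemm_symbolic and host_spgemm_numeric are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side CSR container (zero-based indices).
struct csr_matrix
{
    int                rows;
    int                cols;
    std::vector<int>   row_ptr;
    std::vector<int>   col_ind;
    std::vector<float> val;
};

// Pattern phase: row pointer and sorted column indices of C = A * B.
csr_matrix host_spgemm_symbolic(const csr_matrix& A, const csr_matrix& B)
{
    assert(A.cols == B.rows);
    csr_matrix       C{A.rows, B.cols, std::vector<int>(A.rows + 1, 0), {}, {}};
    std::vector<int> marker(B.cols, -1); // last row in which a column was seen
    for(int i = 0; i < A.rows; ++i)
    {
        for(int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p)
        {
            const int a_col = A.col_ind[p];
            for(int q = B.row_ptr[a_col]; q < B.row_ptr[a_col + 1]; ++q)
            {
                const int c = B.col_ind[q];
                if(marker[c] != i)
                {
                    marker[c] = i;
                    C.col_ind.push_back(c);
                }
            }
        }
        std::sort(C.col_ind.begin() + C.row_ptr[i], C.col_ind.end());
        C.row_ptr[i + 1] = static_cast<int>(C.col_ind.size());
    }
    C.val.assign(C.col_ind.size(), 0.0f);
    return C;
}

// Values phase: fill C.val with alpha * A * B, reusing the existing pattern.
void host_spgemm_numeric(float alpha, const csr_matrix& A, const csr_matrix& B, csr_matrix& C)
{
    std::vector<float> acc(B.cols, 0.0f); // dense accumulator for one row of C
    for(int i = 0; i < A.rows; ++i)
    {
        for(int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p)
        {
            const int a_col = A.col_ind[p];
            for(int q = B.row_ptr[a_col]; q < B.row_ptr[a_col + 1]; ++q)
            {
                acc[B.col_ind[q]] += A.val[p] * B.val[q];
            }
        }
        for(int r = C.row_ptr[i]; r < C.row_ptr[i + 1]; ++r)
        {
            C.val[r]          = alpha * acc[C.col_ind[r]];
            acc[C.col_ind[r]] = 0.0f; // reset for the next row
        }
    }
}
```

After one call to host_spgemm_symbolic, host_spgemm_numeric can be called again whenever the values of A or B change, as long as their sparsity patterns stay the same.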

rocsparse_spgemm supports multiple combinations of data types and compute types. The tables below list the currently supported data types for the sparse matrices \(op(A)\), \(op(B)\), \(C\), and \(D\), and the compute type for \(\alpha\) and \(\beta\). Using different data types saves memory bandwidth and storage, when the user application allows it, while the actual computation is performed in a higher precision.

rocsparse_spgemm supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrices.

Uniform Precisions:

A / B / C / D / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

In general, when multiplying two sparse matrices together, it is entirely possible that the resulting matrix will require a larger index representation to store correctly. For example, when multiplying \(A \times B\) using rocsparse_indextype_i32 index types for the row pointer and column indices arrays, it may be the case that the row pointer of the resulting \(C\) matrix would require index precision rocsparse_indextype_i64. This is currently not supported. In this scenario, the user would need to store the \(A\) and \(B\) matrices using the higher index precision.
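A conservative way to anticipate this situation is to bound the number of non-zeros of \(C\) on the host using 64-bit arithmetic before choosing an index type. The sketch below counts the intermediate products of \(A \times B\), which is an upper bound on the number of non-zeros of \(C\); it assumes zero-based CSR indices, and the function names spgemm_nnz_upper_bound and fits_in_i32 are hypothetical and not part of rocSPARSE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: every non-zero A(i, j) contributes at most nnz(row j of B)
// entries to row i of C, so the sum over all non-zeros of A bounds nnz(C).
std::int64_t spgemm_nnz_upper_bound(const std::vector<std::int32_t>& col_ind_A,
                                    const std::vector<std::int32_t>& row_ptr_B)
{
    std::int64_t bound = 0;
    for(std::size_t p = 0; p < col_ind_A.size(); ++p)
    {
        const std::size_t j = static_cast<std::size_t>(col_ind_A[p]);
        assert(j + 1 < row_ptr_B.size());
        bound += static_cast<std::int64_t>(row_ptr_B[j + 1]) - row_ptr_B[j];
    }
    return bound;
}

// True if a count can be stored in a 32-bit signed index.
bool fits_in_i32(std::int64_t count)
{
    return count <= std::numeric_limits<std::int32_t>::max();
}
```

If the bound fits in 32 bits, the row pointer of \(C\) cannot overflow a 32-bit index. If it does not fit, the actual number of non-zeros may still be smaller because products that land on the same entry are merged, but storing \(A\) and \(B\) with rocsparse_indextype_i64 is the safe choice.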

Example

int main()
{
    // A - m x k
    // B - k x n
    // C - m x n
    int m = 4;
    int n = 4;
    int k = 3;

    // A
    // 1 2 3
    // 0 1 0
    // 2 0 0
    // 0 0 3

    // B
    // 0 1 2 0
    // 0 0 0 1
    // 1 2 3 4

    std::vector<int>   hcsr_row_ptr_A = {0, 3, 4, 5, 6};
    std::vector<int>   hcsr_col_ind_A = {0, 1, 2, 1, 0, 2};
    std::vector<float> hcsr_val_A     = {1.0f, 2.0f, 3.0f, 1.0f, 2.0f, 3.0f};

    std::vector<int>   hcsr_row_ptr_B = {0, 2, 3, 7};
    std::vector<int>   hcsr_col_ind_B = {1, 2, 3, 0, 1, 2, 3};
    std::vector<float> hcsr_val_B     = {1.0f, 2.0f, 1.0f, 1.0f, 2.0f, 3.0f, 4.0f};

    int nnz_A = hcsr_val_A.size();
    int nnz_B = hcsr_val_B.size();

    float alpha = 1.0f;
    float beta  = 0.0f;

    int*   dcsr_row_ptr_A;
    int*   dcsr_col_ind_A;
    float* dcsr_val_A;

    int*   dcsr_row_ptr_B;
    int*   dcsr_col_ind_B;
    float* dcsr_val_B;

    int* dcsr_row_ptr_C;

    HIP_CHECK(hipMalloc(&dcsr_row_ptr_A, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_A, nnz_A * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_A, nnz_A * sizeof(float)));

    HIP_CHECK(hipMalloc(&dcsr_row_ptr_B, (k + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_B, nnz_B * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_B, nnz_B * sizeof(float)));

    HIP_CHECK(hipMalloc(&dcsr_row_ptr_C, (m + 1) * sizeof(int)));

    HIP_CHECK(hipMemcpy(
        dcsr_row_ptr_A, hcsr_row_ptr_A.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(
        dcsr_col_ind_A, hcsr_col_ind_A.data(), nnz_A * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_val_A, hcsr_val_A.data(), nnz_A * sizeof(float), hipMemcpyHostToDevice));

    HIP_CHECK(hipMemcpy(
        dcsr_row_ptr_B, hcsr_row_ptr_B.data(), (k + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(
        dcsr_col_ind_B, hcsr_col_ind_B.data(), nnz_B * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA, matB, matC, matD;
    void*                 temp_buffer;
    size_t                buffer_size = 0;

    rocsparse_operation  trans_A    = rocsparse_operation_none;
    rocsparse_operation  trans_B    = rocsparse_operation_none;
    rocsparse_index_base index_base = rocsparse_index_base_zero;
    rocsparse_indextype  itype      = rocsparse_indextype_i32;
    rocsparse_indextype  jtype      = rocsparse_indextype_i32;
    rocsparse_datatype   ttype      = rocsparse_datatype_f32_r;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               k,
                                               nnz_A,
                                               dcsr_row_ptr_A,
                                               dcsr_col_ind_A,
                                               dcsr_val_A,
                                               itype,
                                               jtype,
                                               index_base,
                                               ttype));

    // Create sparse matrix B in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matB,
                                               k,
                                               n,
                                               nnz_B,
                                               dcsr_row_ptr_B,
                                               dcsr_col_ind_B,
                                               dcsr_val_B,
                                               itype,
                                               jtype,
                                               index_base,
                                               ttype));

    // Create sparse matrix C in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(
        &matC, m, n, 0, dcsr_row_ptr_C, nullptr, nullptr, itype, jtype, index_base, ttype));

    // Create sparse matrix D in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(
        &matD, 0, 0, 0, nullptr, nullptr, nullptr, itype, jtype, index_base, ttype));

    // Determine buffer size
    ROCSPARSE_CHECK(rocsparse_spgemm(handle,
                                     trans_A,
                                     trans_B,
                                     &alpha,
                                     matA,
                                     matB,
                                     &beta,
                                     matD,
                                     matC,
                                     ttype,
                                     rocsparse_spgemm_alg_default,
                                     rocsparse_spgemm_stage_buffer_size,
                                     &buffer_size,
                                     nullptr));

    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // Determine number of non-zeros in C matrix
    ROCSPARSE_CHECK(rocsparse_spgemm(handle,
                                     trans_A,
                                     trans_B,
                                     &alpha,
                                     matA,
                                     matB,
                                     &beta,
                                     matD,
                                     matC,
                                     ttype,
                                     rocsparse_spgemm_alg_default,
                                     rocsparse_spgemm_stage_nnz,
                                     &buffer_size,
                                     temp_buffer));

    int64_t rows_C;
    int64_t cols_C;
    int64_t nnz_C;

    // Extract number of non-zeros in C matrix so we can allocate the column indices and values arrays
    ROCSPARSE_CHECK(rocsparse_spmat_get_size(matC, &rows_C, &cols_C, &nnz_C));

    std::cout << "rows_C: " << rows_C << " cols_C: " << cols_C << " nnz_C: " << nnz_C << std::endl;
    int*   dcsr_col_ind_C;
    float* dcsr_val_C;
    HIP_CHECK(hipMalloc(&dcsr_col_ind_C, sizeof(int) * nnz_C));
    HIP_CHECK(hipMalloc(&dcsr_val_C, sizeof(float) * nnz_C));

    // Set C matrix pointers
    ROCSPARSE_CHECK(rocsparse_csr_set_pointers(matC, dcsr_row_ptr_C, dcsr_col_ind_C, dcsr_val_C));

    // SpGEMM computation
    ROCSPARSE_CHECK(rocsparse_spgemm(handle,
                                     trans_A,
                                     trans_B,
                                     &alpha,
                                     matA,
                                     matB,
                                     &beta,
                                     matD,
                                     matC,
                                     ttype,
                                     rocsparse_spgemm_alg_default,
                                     rocsparse_spgemm_stage_compute,
                                     &buffer_size,
                                     temp_buffer));

    // Copy C matrix result back to host
    std::vector<int>   hcsr_row_ptr_C(m + 1);
    std::vector<int>   hcsr_col_ind_C(nnz_C);
    std::vector<float> hcsr_val_C(nnz_C);

    HIP_CHECK(hipMemcpy(
        hcsr_row_ptr_C.data(), dcsr_row_ptr_C, sizeof(int) * (m + 1), hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(
        hcsr_col_ind_C.data(), dcsr_col_ind_C, sizeof(int) * nnz_C, hipMemcpyDeviceToHost));
    HIP_CHECK(
        hipMemcpy(hcsr_val_C.data(), dcsr_val_C, sizeof(float) * nnz_C, hipMemcpyDeviceToHost));

    // Destroy matrix descriptors
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matD));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Free device arrays
    HIP_CHECK(hipFree(temp_buffer));
    HIP_CHECK(hipFree(dcsr_row_ptr_A));
    HIP_CHECK(hipFree(dcsr_col_ind_A));
    HIP_CHECK(hipFree(dcsr_val_A));

    HIP_CHECK(hipFree(dcsr_row_ptr_B));
    HIP_CHECK(hipFree(dcsr_col_ind_B));
    HIP_CHECK(hipFree(dcsr_val_B));

    HIP_CHECK(hipFree(dcsr_row_ptr_C));
    HIP_CHECK(hipFree(dcsr_col_ind_C));
    HIP_CHECK(hipFree(dcsr_val_C));

    return 0;
}

Note

This function does not produce deterministic results.

Note

SpGEMM requires three stages to complete. The first stage rocsparse_spgemm_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls to rocsparse_spgemm. The second stage rocsparse_spgemm_stage_nnz will determine the number of non-zero elements of the resulting \(C\) matrix. If the sparsity pattern of \(C\) is already known, this stage can be skipped. In the final stage rocsparse_spgemm_stage_compute, the actual computation is performed.

Note

If \(\alpha == 0\), then \(C = \beta \cdot D\) will be computed.

Note

If \(\beta == 0\), then \(C = \alpha \cdot op(A) \cdot op(B)\) will be computed.

Note

Currently only CSR and BSR formats are supported.

Note

If rocsparse_spgemm_stage_symbolic is selected then the symbolic computation is performed only.

Note

If rocsparse_spgemm_stage_numeric is selected then the numeric computation is performed only.

Note

For the rocsparse_spgemm_stage_symbolic and rocsparse_spgemm_stage_numeric stages, only CSR matrix format is currently supported.

Note

\(\alpha == \beta == 0\) is invalid.

Note

It is allowed to pass the same sparse matrix for \(C\) and \(D\), if both matrices have the same sparsity pattern.

Note

Currently, only trans_A == rocsparse_operation_none is supported.

Note

Currently, only trans_B == rocsparse_operation_none is supported.

Note

This function is non blocking and executed asynchronously with respect to the host. It may return before the actual computation has finished.

Note

Note that for the rare matrix products with more than 4096 non-zero entries per row, an additional temporary storage buffer is allocated by the algorithm.

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
trans_A – [in] sparse matrix \(A\) operation type.
trans_B – [in] sparse matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
A – [in] sparse matrix \(A\) descriptor.
B – [in] sparse matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
D – [in] sparse matrix \(D\) descriptor.
C – [out] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SpGEMM computation.
alg – [in] SpGEMM algorithm for the SpGEMM computation.
stage – [in] SpGEMM stage for the SpGEMM computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and function returns without performing the SpGEMM operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha and beta are invalid, A, B, D, C or buffer_size pointer is invalid.
rocsparse_status_memory_error – additional buffer for long rows could not be allocated.
rocsparse_status_not_implemented – trans_A != rocsparse_operation_none or trans_B != rocsparse_operation_none.

rocsparse_spgeam_buffer_size()

rocsparse_status rocsparse_spgeam_buffer_size(rocsparse_handle handle, rocsparse_spgeam_descr descr, rocsparse_const_spmat_descr mat_A, rocsparse_const_spmat_descr mat_B, rocsparse_const_spmat_descr mat_C, rocsparse_spgeam_stage stage, size_t *buffer_size, rocsparse_error *error)

rocsparse_spgeam_buffer_size returns the size of the buffer required to execute the given stage of the SpGEAM operation. This routine is used in conjunction with rocsparse_spgeam(). See rocsparse_spgeam for a full description and example.

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] SpGEAM descriptor
mat_A – [in] sparse matrix \(A\) descriptor.
mat_B – [in] sparse matrix \(B\) descriptor.
mat_C – [in] sparse matrix \(C\) descriptor.
stage – [in] SpGEAM stage for the SpGEAM computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if the user is not interested in obtaining an error descriptor.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – mat_A, mat_B, descr or buffer_size pointer is invalid.

rocsparse_spgeam()

rocsparse_status rocsparse_spgeam(rocsparse_handle handle, rocsparse_spgeam_descr descr, rocsparse_const_spmat_descr mat_A, rocsparse_const_spmat_descr mat_B, rocsparse_spmat_descr mat_C, rocsparse_spgeam_stage stage, size_t buffer_size, void *temp_buffer, rocsparse_error *error)

Sparse matrix sparse matrix addition.

rocsparse_spgeam multiplies the scalar \(\alpha\) with the sparse \(m \times n\) CSR matrix \(op(A)\) and adds it to \(\beta\) multiplied by the sparse \(m \times n\) matrix \(op(B)\). The final result is stored in the sparse \(m \times n\) matrix \(C\), such that

\[ C := \alpha op(A) + \beta op(B), \]

with

\[ op(A) = \left\{ \begin{array}{ll} A, & \text{if trans_A == rocsparse_operation_none} \end{array} \right. \]

and

\[ op(B) = \left\{ \begin{array}{ll} B, & \text{if trans_B == rocsparse_operation_none} \end{array} \right. \]

rocsparse_spgeam requires multiple steps to complete. First, the user must create a rocsparse_spgeam_descr by calling rocsparse_create_spgeam_descr. The user sets the SpGEAM algorithm (currently only rocsparse_spgeam_alg_default is supported) as well as the compute type and the transpose operation type for the sparse matrices \(op(A)\) and \(op(B)\) using rocsparse_spgeam_set_input. Next, the user must calculate the total number of nonzeros that will exist in the sparse matrix \(C\). To do so, the user calls rocsparse_spgeam_buffer_size with the stage set to rocsparse_spgeam_stage_analysis. This fills the buffer_size parameter, allowing the user to then allocate this buffer. Once the buffer has been allocated, the user calls rocsparse_spgeam with the same stage rocsparse_spgeam_stage_analysis. The total number of nonzeros and the row offset array for \(C\) have now been calculated and are stored internally in the rocsparse_spgeam_descr. The user now needs to retrieve the nonzero count using rocsparse_spgeam_get_output and then allocate the \(C\) matrix. To complete the computation, the user repeats the process (this time passing the stage rocsparse_spgeam_stage_compute) by calling rocsparse_spgeam_buffer_size to determine the required buffer size, then allocating the buffer, and finally calling rocsparse_spgeam. The user-allocated buffers can be freed after each call to rocsparse_spgeam. Once the computation is complete and the SpGEAM descriptor is no longer needed, the user must call rocsparse_destroy_spgeam_descr. See the full code example below.

The stage rocsparse_spgeam_stage_compute computes both the symbolic part and the numeric part of the resulting matrix \(C\). If the user wants to perform multiple operations involving matrices with the same sparsity patterns but different numerical values, then the symbolic stages (rocsparse_spgeam_stage_symbolic_analysis and rocsparse_spgeam_stage_symbolic_compute) and the numeric stages (rocsparse_spgeam_stage_numeric_analysis and rocsparse_spgeam_stage_numeric_compute) can be used to separate the symbolic calculation from the numeric calculation.
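
The analysis-then-compute flow can be mirrored by a small host-side reference, which is also handy for checking device results. The following is a minimal sketch, not the rocSPARSE implementation: HostCsr, spgeam_analysis, and spgeam_compute are hypothetical names introduced here, and the sketch assumes zero-based CSR matrices whose column indices are sorted within each row.

```cpp
#include <cassert>
#include <vector>

// Host-side CSR matrix with zero-based, per-row sorted column indices
// (hypothetical helper type, not a rocSPARSE type).
struct HostCsr
{
    int                rows;
    int                cols;
    std::vector<int>   row_ptr;
    std::vector<int>   col_ind;
    std::vector<float> val;
};

// Analysis: builds the row offsets of C from the union of the patterns of A and B.
// The last entry of the returned array is nnz(C).
std::vector<int> spgeam_analysis(const HostCsr& A, const HostCsr& B)
{
    assert(A.rows == B.rows && A.cols == B.cols);

    std::vector<int> row_ptr_C(A.rows + 1, 0);
    for(int i = 0; i < A.rows; ++i)
    {
        int       p     = A.row_ptr[i];
        int       q     = B.row_ptr[i];
        const int p_end = A.row_ptr[i + 1];
        const int q_end = B.row_ptr[i + 1];
        int       count = 0;

        // Merge the two sorted index lists, counting shared columns once
        while(p < p_end && q < q_end)
        {
            const int ca = A.col_ind[p];
            const int cb = B.col_ind[q];
            if(ca <= cb) ++p;
            if(cb <= ca) ++q;
            ++count;
        }
        count += (p_end - p) + (q_end - q); // leftover entries of A or B
        row_ptr_C[i + 1] = row_ptr_C[i] + count;
    }
    return row_ptr_C;
}

// Compute: fills the column indices and values of C = alpha * A + beta * B,
// using the row offsets produced by spgeam_analysis.
HostCsr spgeam_compute(float                   alpha,
                       const HostCsr&          A,
                       float                   beta,
                       const HostCsr&          B,
                       const std::vector<int>& row_ptr_C)
{
    const int nnz_C = row_ptr_C.back();
    HostCsr   C{A.rows, A.cols, row_ptr_C, std::vector<int>(nnz_C), std::vector<float>(nnz_C)};

    for(int i = 0; i < A.rows; ++i)
    {
        int       p     = A.row_ptr[i];
        int       q     = B.row_ptr[i];
        const int p_end = A.row_ptr[i + 1];
        const int q_end = B.row_ptr[i + 1];
        int       out   = row_ptr_C[i];

        while(p < p_end || q < q_end)
        {
            // Take from A, from B, or from both when the column indices match
            const bool take_a = p < p_end && (q == q_end || A.col_ind[p] <= B.col_ind[q]);
            const bool take_b = q < q_end && (p == p_end || B.col_ind[q] <= A.col_ind[p]);

            C.col_ind[out] = take_a ? A.col_ind[p] : B.col_ind[q];
            C.val[out] = (take_a ? alpha * A.val[p] : 0.0f) + (take_b ? beta * B.val[q] : 0.0f);

            if(take_a) ++p;
            if(take_b) ++q;
            ++out;
        }
    }
    return C;
}
```

As in the device workflow, the caller first obtains the non-zero count from the analysis step (the last entry of the returned row offsets) and only then allocates and fills the column indices and values of \(C\).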

rocsparse_spgeam supports multiple combinations of index types, data types, and compute types. The tables below list the index and data types currently supported for the sparse matrices \(op(A)\), \(op(B)\), and \(C\), and the compute type for \(\alpha\) and \(\beta\). Using different index and data types saves memory bandwidth and storage, when the user application allows it, while the actual computation is performed in a higher precision.

In general, when adding two sparse matrices together, it is entirely possible that the resulting matrix will require a larger index representation to store correctly. For example, when adding \(A + B\) using rocsparse_indextype_i32 index types for the row pointer and column indices arrays, it may be the case that the row pointer of the resulting \(C\) matrix would require index type rocsparse_indextype_i64. This is currently not supported. In this scenario, the user would need to store the \(A\), \(B\), and \(C\) matrices using the higher index precision.

Uniform Precisions:

A / B / C / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Uniform Index Types:

CSR Row offset	CSR Column indices
rocsparse_indextype_i32	rocsparse_indextype_i32
rocsparse_indextype_i64	rocsparse_indextype_i64

Mixed Index Types:

CSR Row offset	CSR Column indices
rocsparse_indextype_i64	rocsparse_indextype_i32

Additionally, all three matrices \(A\), \(B\), and \(C\) must use the same index types. For example, if \(A\) uses the index type rocsparse_indextype_i32 for the row offset array and the index type rocsparse_indextype_i32 for the column indices array, then both \(B\) and \(C\) must also use these same index types for their respective row offset and column indices arrays. In the scenario where \(C\) requires a larger index type for the row offset array, the user would need to store all three matrices using the larger index type rocsparse_indextype_i64 for the row offsets array.
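
Because the pattern of \(C\) is the union of the patterns of \(A\) and \(B\), its non-zero count can never exceed the sum of theirs, which gives a simple host-side check before choosing the row offset index type. The following is a minimal sketch: RowOffsetWidth and spgeam_row_offset_width are hypothetical names introduced here (not rocSPARSE types), and zero-based indexing is assumed.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical tag for the row offset index width to use for A, B, and C.
enum class RowOffsetWidth
{
    i32,
    i64
};

// Picks a row offset width that is guaranteed to hold nnz(C) for C = A + B.
// The bound nnz(A) + nnz(B) is conservative: shared entries are counted twice.
RowOffsetWidth spgeam_row_offset_width(std::int64_t nnz_A, std::int64_t nnz_B)
{
    const std::int64_t bound = nnz_A + nnz_B;
    return bound <= std::numeric_limits<std::int32_t>::max() ? RowOffsetWidth::i32
                                                             : RowOffsetWidth::i64;
}
```

When the helper returns RowOffsetWidth::i64, all three matrices would be created with 64-bit row offsets, in line with the requirement that they share the same index types.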

First Example

int main()
{
    // A - m x n
    // B - m x n
    // C - m x n
    int m = 4;
    int n = 6;

    // 1 2 0 0 3 7
    // 0 0 1 4 6 8
    // 0 2 0 4 0 0
    // 9 8 0 0 2 0
    std::vector<int> hcsr_row_ptr_A = {0, 4, 8, 10, 13}; // host A m x n matrix
    std::vector<int> hcsr_col_ind_A
        = {0, 1, 4, 5, 2, 3, 4, 5, 1, 3, 0, 1, 4}; // host A m x n matrix
    std::vector<float> hcsr_val_A = {1, 2, 3, 7, 1, 4, 6, 8, 2, 4, 9, 8, 2}; // host A m x n matrix

    // 0 2 1 0 0 5
    // 0 1 1 3 0 2
    // 0 0 0 0 0 0
    // 1 2 3 4 5 6
    std::vector<int> hcsr_row_ptr_B = {0, 3, 7, 7, 13}; // host B m x n matrix
    std::vector<int> hcsr_col_ind_B
        = {1, 2, 5, 1, 2, 3, 5, 0, 1, 2, 3, 4, 5}; // host B m x n matrix
    std::vector<float> hcsr_val_B = {2, 1, 5, 1, 1, 3, 2, 1, 2, 3, 4, 5, 6}; // host B m x n matrix

    int nnz_A = hcsr_val_A.size();
    int nnz_B = hcsr_val_B.size();

    float alpha = 1.0f;
    float beta  = 1.0f;

    int*   dcsr_row_ptr_A;
    int*   dcsr_col_ind_A;
    float* dcsr_val_A;

    int*   dcsr_row_ptr_B;
    int*   dcsr_col_ind_B;
    float* dcsr_val_B;

    HIP_CHECK(hipMalloc(&dcsr_row_ptr_A, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_A, nnz_A * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_A, nnz_A * sizeof(float)));

    HIP_CHECK(hipMalloc(&dcsr_row_ptr_B, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_B, nnz_B * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_B, nnz_B * sizeof(float)));

    HIP_CHECK(hipMemcpy(
        dcsr_row_ptr_A, hcsr_row_ptr_A.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(
        dcsr_col_ind_A, hcsr_col_ind_A.data(), nnz_A * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_val_A, hcsr_val_A.data(), nnz_A * sizeof(float), hipMemcpyHostToDevice));

    HIP_CHECK(hipMemcpy(
        dcsr_row_ptr_B, hcsr_row_ptr_B.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(
        dcsr_col_ind_B, hcsr_col_ind_B.data(), nnz_B * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_error       p_error[1] = {};
    rocsparse_spmat_descr matA, matB, matC;
    rocsparse_index_base  index_base = rocsparse_index_base_zero;
    rocsparse_indextype   itype      = rocsparse_indextype_i32;
    rocsparse_indextype   jtype      = rocsparse_indextype_i32;
    rocsparse_datatype    ttype      = rocsparse_datatype_f32_r;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    hipStream_t stream;
    ROCSPARSE_CHECK(rocsparse_get_stream(handle, &stream));

    // Create sparse matrix A in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               n,
                                               nnz_A,
                                               dcsr_row_ptr_A,
                                               dcsr_col_ind_A,
                                               dcsr_val_A,
                                               itype,
                                               jtype,
                                               index_base,
                                               ttype));

    // Create sparse matrix B in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matB,
                                               m,
                                               n,
                                               nnz_B,
                                               dcsr_row_ptr_B,
                                               dcsr_col_ind_B,
                                               dcsr_val_B,
                                               itype,
                                               jtype,
                                               index_base,
                                               ttype));

    // Create SpGEAM descriptor.
    rocsparse_spgeam_descr descr;
    ROCSPARSE_CHECK(rocsparse_create_spgeam_descr(&descr));

    // Set the algorithm on the descriptor
    const rocsparse_spgeam_alg alg = rocsparse_spgeam_alg_default;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_alg, &alg, sizeof(alg), p_error));

    // Set the transpose operation for sparse matrices A and B on the descriptor
    const rocsparse_operation trans_A = rocsparse_operation_none;
    const rocsparse_operation trans_B = rocsparse_operation_none;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_operation_A, &trans_A, sizeof(trans_A), p_error));
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_operation_B, &trans_B, sizeof(trans_B), p_error));

    // Set the scalar type on the descriptor
    const rocsparse_datatype scalar_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(handle,
                                               descr,
                                               rocsparse_spgeam_input_scalar_datatype,
                                               &scalar_datatype,
                                               sizeof(scalar_datatype),
                                               p_error));

    // Set the compute type on the descriptor
    const rocsparse_datatype compute_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(handle,
                                               descr,
                                               rocsparse_spgeam_input_compute_datatype,
                                               &compute_datatype,
                                               sizeof(compute_datatype),
                                               p_error));

    // Calculate NNZ phase
    size_t buffer_size_in_bytes;
    void*  buffer;
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle,
                                                 descr,
                                                 matA,
                                                 matB,
                                                 nullptr,
                                                 rocsparse_spgeam_stage_analysis,
                                                 &buffer_size_in_bytes,
                                                 p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle,
                                     descr,
                                     matA,
                                     matB,
                                     nullptr,
                                     rocsparse_spgeam_stage_analysis,
                                     buffer_size_in_bytes,
                                     buffer,
                                     p_error));
    HIP_CHECK(hipFree(buffer));

    // Ensure analysis stage is complete before grabbing C non-zero count
    HIP_CHECK(hipStreamSynchronize(stream));

    int64_t nnz_C;
    ROCSPARSE_CHECK(rocsparse_spgeam_get_output(
        handle, descr, rocsparse_spgeam_output_nnz, &nnz_C, sizeof(int64_t), p_error));

    // Allocate the row pointer, column indices and values arrays of C
    int*   dcsr_row_ptr_C;
    int*   dcsr_col_ind_C;
    float* dcsr_val_C;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_C, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_C, sizeof(int32_t) * nnz_C));
    HIP_CHECK(hipMalloc(&dcsr_val_C, sizeof(float) * nnz_C));

    // Create sparse matrix C in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matC,
                                               m,
                                               n,
                                               nnz_C,
                                               dcsr_row_ptr_C,
                                               dcsr_col_ind_C,
                                               dcsr_val_C,
                                               itype,
                                               jtype,
                                               index_base,
                                               ttype));

    // Compute phase
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle,
                                                 descr,
                                                 matA,
                                                 matB,
                                                 matC,
                                                 rocsparse_spgeam_stage_compute,
                                                 &buffer_size_in_bytes,
                                                 p_error));

    // Set alpha and beta
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_scalar_alpha, &alpha, sizeof(&alpha), p_error));
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_scalar_beta, &beta, sizeof(&beta), p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle,
                                     descr,
                                     matA,
                                     matB,
                                     matC,
                                     rocsparse_spgeam_stage_compute,
                                     buffer_size_in_bytes,
                                     buffer,
                                     p_error));
    HIP_CHECK(hipFree(buffer));

    // Copy C matrix result back to host
    std::vector<int>   hcsr_row_ptr_C(m + 1);
    std::vector<int>   hcsr_col_ind_C(nnz_C);
    std::vector<float> hcsr_val_C(nnz_C);

    HIP_CHECK(hipMemcpy(
        hcsr_row_ptr_C.data(), dcsr_row_ptr_C, sizeof(int) * (m + 1), hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(
        hcsr_col_ind_C.data(), dcsr_col_ind_C, sizeof(int) * nnz_C, hipMemcpyDeviceToHost));
    HIP_CHECK(
        hipMemcpy(hcsr_val_C.data(), dcsr_val_C, sizeof(float) * nnz_C, hipMemcpyDeviceToHost));

    std::cout << "C" << std::endl;
    for(int i = 0; i < m; i++)
    {
        int start = hcsr_row_ptr_C[i];
        int end   = hcsr_row_ptr_C[i + 1];

        std::vector<float> htemp(n, 0.0f);
        for(int j = start; j < end; j++)
        {
            htemp[hcsr_col_ind_C[j]] = hcsr_val_C[j];
        }

        for(int j = 0; j < n; j++)
        {
            std::cout << htemp[j] << " ";
        }
        std::cout << "" << std::endl;
    }
    std::cout << "" << std::endl;

    // Destroy the SpGEAM descriptor and the matrix descriptors
    ROCSPARSE_CHECK(rocsparse_destroy_spgeam_descr(descr));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));
    ROCSPARSE_CHECK(rocsparse_destroy_error(p_error[0]));

    // Free device arrays
    HIP_CHECK(hipFree(dcsr_row_ptr_A));
    HIP_CHECK(hipFree(dcsr_col_ind_A));
    HIP_CHECK(hipFree(dcsr_val_A));

    HIP_CHECK(hipFree(dcsr_row_ptr_B));
    HIP_CHECK(hipFree(dcsr_col_ind_B));
    HIP_CHECK(hipFree(dcsr_val_B));

    HIP_CHECK(hipFree(dcsr_row_ptr_C));
    HIP_CHECK(hipFree(dcsr_col_ind_C));
    HIP_CHECK(hipFree(dcsr_val_C));
    return 0;
}

Second Example

int main()
{
    // A - m x n
    // B - m x n
    // C - m x n
    int m = 4;
    int n = 6;

    // 1 2 0 0 3 7
    // 0 0 1 4 6 8
    // 0 2 0 4 0 0
    // 9 8 0 0 2 0
    std::vector<int> hcsr_row_ptr_A = {0, 4, 8, 10, 13}; // host A m x n matrix
    std::vector<int> hcsr_col_ind_A
        = {0, 1, 4, 5, 2, 3, 4, 5, 1, 3, 0, 1, 4}; // host A m x n matrix
    std::vector<float> hcsr_val_A = {1, 2, 3, 7, 1, 4, 6, 8, 2, 4, 9, 8, 2}; // host A m x n matrix

    // 0 2 1 0 0 5
    // 0 1 1 3 0 2
    // 0 0 0 0 0 0
    // 1 2 3 4 5 6
    std::vector<int> hcsr_row_ptr_B = {0, 3, 7, 7, 13}; // host B m x n matrix
    std::vector<int> hcsr_col_ind_B
        = {1, 2, 5, 1, 2, 3, 5, 0, 1, 2, 3, 4, 5}; // host B m x n matrix
    std::vector<float> hcsr_val_B = {2, 1, 5, 1, 1, 3, 2, 1, 2, 3, 4, 5, 6}; // host B m x n matrix

    int nnz_A = hcsr_val_A.size();
    int nnz_B = hcsr_val_B.size();

    float alpha = 1.0f;
    float beta  = 1.0f;

    int*   dcsr_row_ptr_A;
    int*   dcsr_col_ind_A;
    float* dcsr_val_A;

    int*   dcsr_row_ptr_B;
    int*   dcsr_col_ind_B;
    float* dcsr_val_B;

    HIP_CHECK(hipMalloc(&dcsr_row_ptr_A, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_A, nnz_A * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_A, nnz_A * sizeof(float)));

    HIP_CHECK(hipMalloc(&dcsr_row_ptr_B, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_B, nnz_B * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_val_B, nnz_B * sizeof(float)));

    HIP_CHECK(hipMemcpy(
        dcsr_row_ptr_A, hcsr_row_ptr_A.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(
        dcsr_col_ind_A, hcsr_col_ind_A.data(), nnz_A * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_val_A, hcsr_val_A.data(), nnz_A * sizeof(float), hipMemcpyHostToDevice));

    HIP_CHECK(hipMemcpy(
        dcsr_row_ptr_B, hcsr_row_ptr_B.data(), (m + 1) * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(
        dcsr_col_ind_B, hcsr_col_ind_B.data(), nnz_B * sizeof(int), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_error       p_error[1] = {};
    rocsparse_spmat_descr matA, matB, matC;
    rocsparse_index_base  index_base = rocsparse_index_base_zero;
    rocsparse_indextype   itype      = rocsparse_indextype_i32;
    rocsparse_indextype   jtype      = rocsparse_indextype_i32;
    rocsparse_datatype    ttype      = rocsparse_datatype_f32_r;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    hipStream_t stream;
    ROCSPARSE_CHECK(rocsparse_get_stream(handle, &stream));

    // Create sparse matrix A in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               n,
                                               nnz_A,
                                               dcsr_row_ptr_A,
                                               dcsr_col_ind_A,
                                               dcsr_val_A,
                                               itype,
                                               jtype,
                                               index_base,
                                               ttype));

    // Create sparse matrix B in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matB,
                                               m,
                                               n,
                                               nnz_B,
                                               dcsr_row_ptr_B,
                                               dcsr_col_ind_B,
                                               dcsr_val_B,
                                               itype,
                                               jtype,
                                               index_base,
                                               ttype));

    // Create SpGEAM descriptor.
    rocsparse_spgeam_descr descr;
    ROCSPARSE_CHECK(rocsparse_create_spgeam_descr(&descr));

    // Set the algorithm on the descriptor
    const rocsparse_spgeam_alg alg = rocsparse_spgeam_alg_default;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_alg, &alg, sizeof(alg), p_error));

    // Set the transpose operation for sparse matrices A and B on the descriptor
    const rocsparse_operation trans_A = rocsparse_operation_none;
    const rocsparse_operation trans_B = rocsparse_operation_none;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_operation_A, &trans_A, sizeof(trans_A), p_error));
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_operation_B, &trans_B, sizeof(trans_B), p_error));

    // Set the scalar type on the descriptor
    const rocsparse_datatype scalar_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(handle,
                                               descr,
                                               rocsparse_spgeam_input_scalar_datatype,
                                               &scalar_datatype,
                                               sizeof(scalar_datatype),
                                               p_error));

    // Set alpha and beta.
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_scalar_alpha, &alpha, sizeof(&alpha), p_error));
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(
        handle, descr, rocsparse_spgeam_input_scalar_beta, &beta, sizeof(&beta), p_error));

    // Set the compute type on the descriptor
    const rocsparse_datatype compute_datatype = rocsparse_datatype_f32_r;
    ROCSPARSE_CHECK(rocsparse_spgeam_set_input(handle,
                                               descr,
                                               rocsparse_spgeam_input_compute_datatype,
                                               &compute_datatype,
                                               sizeof(compute_datatype),
                                               p_error));

    // Calculate NNZ phase
    size_t buffer_size_in_bytes;
    void*  buffer;
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle,
                                                 descr,
                                                 matA,
                                                 matB,
                                                 nullptr,
                                                 rocsparse_spgeam_stage_symbolic_analysis,
                                                 &buffer_size_in_bytes,
                                                 p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle,
                                     descr,
                                     matA,
                                     matB,
                                     nullptr,
                                     rocsparse_spgeam_stage_symbolic_analysis,
                                     buffer_size_in_bytes,
                                     buffer,
                                     p_error));
    HIP_CHECK(hipFree(buffer));

    // Ensure analysis stage is complete before grabbing C non-zero count
    HIP_CHECK(hipStreamSynchronize(stream));

    int64_t nnz_C;
    ROCSPARSE_CHECK(rocsparse_spgeam_get_output(
        handle, descr, rocsparse_spgeam_output_nnz, &nnz_C, sizeof(int64_t), p_error));

    // Compute column indices and values of C
    int*   dcsr_row_ptr_C;
    int*   dcsr_col_ind_C;
    float* dcsr_val_C;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr_C, (m + 1) * sizeof(int)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind_C, sizeof(int32_t) * nnz_C));
    HIP_CHECK(hipMalloc(&dcsr_val_C, sizeof(float) * nnz_C));

    // Create sparse matrix C in CSR format
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matC,
                                               m,
                                               n,
                                               nnz_C,
                                               dcsr_row_ptr_C,
                                               dcsr_col_ind_C,
                                               dcsr_val_C,
                                               itype,
                                               jtype,
                                               index_base,
                                               ttype));

    // Symbolic compute phase
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle,
                                                 descr,
                                                 matA,
                                                 matB,
                                                 matC,
                                                 rocsparse_spgeam_stage_symbolic_compute,
                                                 &buffer_size_in_bytes,
                                                 p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle,
                                     descr,
                                     matA,
                                     matB,
                                     matC,
                                     rocsparse_spgeam_stage_symbolic_compute,
                                     buffer_size_in_bytes,
                                     buffer,
                                     p_error));
    HIP_CHECK(hipFree(buffer));

    // Numeric analysis phase
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle,
                                                 descr,
                                                 matA,
                                                 matB,
                                                 matC,
                                                 rocsparse_spgeam_stage_numeric_analysis,
                                                 &buffer_size_in_bytes,
                                                 p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle,
                                     descr,
                                     matA,
                                     matB,
                                     matC,
                                     rocsparse_spgeam_stage_numeric_analysis,
                                     buffer_size_in_bytes,
                                     buffer,
                                     p_error));
    HIP_CHECK(hipFree(buffer));

    // First Numeric compute phase
    ROCSPARSE_CHECK(rocsparse_spgeam_buffer_size(handle,
                                                 descr,
                                                 matA,
                                                 matB,
                                                 matC,
                                                 rocsparse_spgeam_stage_numeric_compute,
                                                 &buffer_size_in_bytes,
                                                 p_error));

    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle,
                                     descr,
                                     matA,
                                     matB,
                                     matC,
                                     rocsparse_spgeam_stage_numeric_compute,
                                     buffer_size_in_bytes,
                                     buffer,
                                     p_error));
    HIP_CHECK(hipFree(buffer));

    // Second numeric compute phase
    hcsr_val_B[0] += 0.125;
    hcsr_val_B[1] += 0.5;
    HIP_CHECK(
        hipMemcpy(dcsr_val_B, hcsr_val_B.data(), nnz_B * sizeof(float), hipMemcpyHostToDevice));
    HIP_CHECK(hipMalloc(&buffer, buffer_size_in_bytes));
    ROCSPARSE_CHECK(rocsparse_spgeam(handle,
                                     descr,
                                     matA,
                                     matB,
                                     matC,
                                     rocsparse_spgeam_stage_numeric_compute,
                                     buffer_size_in_bytes,
                                     buffer,
                                     p_error));
    HIP_CHECK(hipFree(buffer));

    // Copy C matrix result back to host
    std::vector<int>   hcsr_row_ptr_C(m + 1);
    std::vector<int>   hcsr_col_ind_C(nnz_C);
    std::vector<float> hcsr_val_C(nnz_C);

    HIP_CHECK(hipMemcpy(
        hcsr_row_ptr_C.data(), dcsr_row_ptr_C, sizeof(int) * (m + 1), hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(
        hcsr_col_ind_C.data(), dcsr_col_ind_C, sizeof(int) * nnz_C, hipMemcpyDeviceToHost));
    HIP_CHECK(
        hipMemcpy(hcsr_val_C.data(), dcsr_val_C, sizeof(float) * nnz_C, hipMemcpyDeviceToHost));

    // Destroy the SpGEAM descriptor, matrix descriptors, handle, and error object
    ROCSPARSE_CHECK(rocsparse_destroy_spgeam_descr(descr));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));
    ROCSPARSE_CHECK(rocsparse_destroy_error(p_error[0]));

    // Free device arrays
    HIP_CHECK(hipFree(dcsr_row_ptr_A));
    HIP_CHECK(hipFree(dcsr_col_ind_A));
    HIP_CHECK(hipFree(dcsr_val_A));

    HIP_CHECK(hipFree(dcsr_row_ptr_B));
    HIP_CHECK(hipFree(dcsr_col_ind_B));
    HIP_CHECK(hipFree(dcsr_val_B));

    HIP_CHECK(hipFree(dcsr_row_ptr_C));
    HIP_CHECK(hipFree(dcsr_col_ind_C));
    HIP_CHECK(hipFree(dcsr_val_C));

    return 0;
}

Note

The stages rocsparse_spgeam_stage_analysis and rocsparse_spgeam_stage_compute cannot be mixed with the stages rocsparse_spgeam_stage_symbolic_analysis, rocsparse_spgeam_stage_symbolic_compute, rocsparse_spgeam_stage_numeric_analysis, and rocsparse_spgeam_stage_numeric_compute.

Note

The stage rocsparse_spgeam_stage_analysis must precede the stage rocsparse_spgeam_stage_compute.

Note

The stage rocsparse_spgeam_stage_symbolic_analysis must precede the stage rocsparse_spgeam_stage_symbolic_compute.

Note

The stage rocsparse_spgeam_stage_numeric_analysis must precede the stage rocsparse_spgeam_stage_numeric_compute.

Note

The symbolic stages are not required to perform the numeric stages.

Note

The stage rocsparse_spgeam_stage_numeric_analysis must be re-applied if the numeric values of the input matrices mat_A and mat_B have changed between subsequent calls of the stage rocsparse_spgeam_stage_numeric_compute.

Note

Currently only CSR format is supported.

Note

Currently, only trans_A == rocsparse_operation_none is supported.

Note

Currently, only trans_B == rocsparse_operation_none is supported.

Note

This routine does not support execution in a hipGraph context.
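The split between symbolic and numeric stages can be pictured with a small host-side sketch. It assumes SpGEAM computes \(C := \alpha A + \beta B\) on CSR matrices with zero-based, column-sorted rows and no transposes. The names HostCsr, spgeam_symbolic, and spgeam_numeric are hypothetical and are not part of rocSPARSE, and the code is not meant to reflect how the library implements these stages internally.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side CSR container (zero-based indices, sorted columns per row).
struct HostCsr
{
    int                m;
    int                n;
    std::vector<int>   row_ptr;
    std::vector<int>   col_ind;
    std::vector<float> val;
};

// Symbolic pass: the pattern of C is the union of the patterns of A and B.
HostCsr spgeam_symbolic(const HostCsr& A, const HostCsr& B)
{
    assert(A.m == B.m && A.n == B.n);

    HostCsr C{};
    C.m = A.m;
    C.n = A.n;
    C.row_ptr.assign(A.m + 1, 0);

    for(int i = 0; i < A.m; ++i)
    {
        int a     = A.row_ptr[i];
        int a_end = A.row_ptr[i + 1];
        int b     = B.row_ptr[i];
        int b_end = B.row_ptr[i + 1];

        // Merge the two sorted column lists of row i
        while(a < a_end || b < b_end)
        {
            int col;
            if(b == b_end || (a < a_end && A.col_ind[a] < B.col_ind[b]))
            {
                col = A.col_ind[a++];
            }
            else if(a == a_end || B.col_ind[b] < A.col_ind[a])
            {
                col = B.col_ind[b++];
            }
            else
            {
                // Column present in both A and B
                col = A.col_ind[a];
                ++a;
                ++b;
            }
            C.col_ind.push_back(col);
        }
        C.row_ptr[i + 1] = static_cast<int>(C.col_ind.size());
    }

    C.val.assign(C.col_ind.size(), 0.0f);
    return C;
}

// Numeric pass: fill the values of C for an already known pattern.
void spgeam_numeric(float alpha, const HostCsr& A, float beta, const HostCsr& B, HostCsr& C)
{
    for(int i = 0; i < C.m; ++i)
    {
        int a = A.row_ptr[i];
        int b = B.row_ptr[i];
        for(int c = C.row_ptr[i]; c < C.row_ptr[i + 1]; ++c)
        {
            float sum = 0.0f;
            if(a < A.row_ptr[i + 1] && A.col_ind[a] == C.col_ind[c])
            {
                sum += alpha * A.val[a++];
            }
            if(b < B.row_ptr[i + 1] && B.col_ind[b] == C.col_ind[c])
            {
                sum += beta * B.val[b++];
            }
            C.val[c] = sum;
        }
    }
}
```

In this sketch the symbolic pass depends only on the sparsity patterns, so the numeric pass can be repeated after the values of A or B change without recomputing the pattern of C.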

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] SpGEAM descriptor
mat_A – [in] sparse matrix \(A\) descriptor.
mat_B – [in] sparse matrix \(B\) descriptor.
mat_C – [out] sparse matrix \(C\) descriptor.
stage – [in] SpGEAM stage for the SpGEAM computation.
buffer_size – [in] number of bytes of the temporary storage buffer, as determined by calling rocsparse_spgeam_buffer_size.
temp_buffer – [in] temporary storage buffer allocated by the user.
error – [out] error descriptor created if the returned status is not rocsparse_status_success. A null pointer can be passed if the user is not interested in obtaining an error descriptor.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – mat_A, mat_B, mat_C, descr or buffer_size pointer is invalid.

rocsparse_sddmm_buffer_size()#

rocsparse_status rocsparse_sddmm_buffer_size(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, rocsparse_spmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, size_t *buffer_size)#

rocsparse_sddmm_buffer_size returns the size of the required buffer to execute the SDDMM operation from a given configuration. This routine is used in conjunction with rocsparse_sddmm_preprocess() and rocsparse_sddmm().

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] dense matrix \(A\) descriptor.
mat_B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
mat_C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
buffer_size – [out] number of bytes of the temporary storage buffer.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of opA or opB is incorrect.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, beta, mat_A, mat_B, mat_C or buffer_size pointer is invalid.
rocsparse_status_not_implemented – opA == rocsparse_operation_conjugate_transpose or opB == rocsparse_operation_conjugate_transpose.

rocsparse_sddmm_preprocess()#

rocsparse_status rocsparse_sddmm_preprocess(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, rocsparse_spmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, void *temp_buffer)#

rocsparse_sddmm_preprocess executes the part of the algorithm that only needs to be calculated once when rocsparse_sddmm is called multiple times with the same sparsity pattern.

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] dense matrix \(A\) descriptor.
mat_B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
mat_C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
temp_buffer – [in] temporary storage buffer allocated by the user. The size must be greater than or equal to the size obtained with rocsparse_sddmm_buffer_size.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of opA or opB is incorrect.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, beta, mat_A, mat_B, mat_C or temp_buffer pointer is invalid.
rocsparse_status_not_implemented – opA == rocsparse_operation_conjugate_transpose or opB == rocsparse_operation_conjugate_transpose.

rocsparse_sddmm()#

rocsparse_status rocsparse_sddmm(rocsparse_handle handle, rocsparse_operation opA, rocsparse_operation opB, const void *alpha, rocsparse_const_dnmat_descr mat_A, rocsparse_const_dnmat_descr mat_B, const void *beta, rocsparse_spmat_descr mat_C, rocsparse_datatype compute_type, rocsparse_sddmm_alg alg, void *temp_buffer)#

Sampled Dense-Dense Matrix Multiplication.

rocsparse_sddmm multiplies the scalar \(\alpha\) with the product of the dense \(m \times k\) matrix \(op(A)\) and the dense \(k \times n\) matrix \(op(B)\), filters the result by the sparsity pattern of the \(m \times n\) sparse matrix \(C\), and adds \(C\) scaled by \(\beta\). The final result is stored in the sparse \(m \times n\) matrix \(C\), such that

\[ C := \alpha ( op(A) \cdot op(B) ) \circ spy(C) + \beta C, \]

with

\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if op(A) == rocsparse_operation_none} \\ A^T, & \text{if op(A) == rocsparse_operation_transpose} \\ \end{array} \right. \end{split}\]

,

\[\begin{split} op(B) = \left\{ \begin{array}{ll} B, & \text{if op(B) == rocsparse_operation_none} \\ B^T, & \text{if op(B) == rocsparse_operation_transpose} \\ \end{array} \right. \end{split}\]

and

\[\begin{split} spy(C)_{ij} = \left\{ \begin{array}{ll} 1, & \text{if } C_{ij} \neq 0 \\ 0, & \text{otherwise} \\ \end{array} \right. \end{split}\]
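As a rough host-side illustration of the formula above, the following sketch evaluates one dot product per stored entry of \(C\). It assumes row-major dense matrices, no transposes, a zero-based CSR matrix \(C\), and that the stored entries of \(C\) stand in for \(spy(C)\). The function name sddmm_csr_reference is hypothetical and is not a rocSPARSE routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for C := alpha * (A * B) o spy(C) + beta * C.
// A is m x k and B is k x n, both row-major; C is CSR and is updated in place.
void sddmm_csr_reference(int                       m,
                         int                       k,
                         int                       n,
                         float                     alpha,
                         const std::vector<float>& A,
                         const std::vector<float>& B,
                         float                     beta,
                         const std::vector<int>&   csr_row_ptr,
                         const std::vector<int>&   csr_col_ind,
                         std::vector<float>&       csr_val)
{
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    assert(csr_row_ptr.size() == static_cast<std::size_t>(m) + 1);

    for(int i = 0; i < m; ++i)
    {
        // Only the stored entries of row i are visited
        for(int p = csr_row_ptr[i]; p < csr_row_ptr[i + 1]; ++p)
        {
            int j = csr_col_ind[p];

            // Dot product of row i of A with column j of B
            float dot = 0.0f;
            for(int l = 0; l < k; ++l)
            {
                dot += A[i * k + l] * B[l * n + j];
            }

            csr_val[p] = alpha * dot + beta * csr_val[p];
        }
    }
}
```

Because only stored entries are visited, the work in this sketch scales with the number of non-zeros of \(C\) times \(k\) rather than with \(m \cdot n \cdot k\).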

Computing the above sampled dense-dense multiplication requires three steps to complete. First, the user calls rocsparse_sddmm_buffer_size to determine the size of the required temporary storage buffer. Next, the user allocates this buffer and calls rocsparse_sddmm_preprocess which performs any analysis of the input matrices that may be required. Finally, the user calls rocsparse_sddmm to complete the computation. Once all calls to rocsparse_sddmm are complete, the temporary buffer can be deallocated.

rocsparse_sddmm supports different algorithms which can provide better performance for different matrices.

CSR/CSC Algorithms	Deterministic	Preprocessing	Notes
rocsparse_sddmm_alg_default	Yes	No	Uses the sparsity pattern of matrix C to perform a limited set of dot products
rocsparse_sddmm_alg_dense	Yes	No	Explicitly converts the matrix C into a dense matrix to perform a dense matrix multiply and add

Currently, rocsparse_sddmm only supports the uniform and mixed precisions indicated in the tables below. For the sparse matrix \(C\), rocsparse_sddmm supports the index types rocsparse_indextype_i32 and rocsparse_indextype_i64.

Uniform Precisions:

A / B / C / compute_type
rocsparse_datatype_f16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Mixed precisions:

A / B	C	compute_type
rocsparse_datatype_f16_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r
rocsparse_datatype_f16_r	rocsparse_datatype_f16_r	rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r	rocsparse_datatype_f32_r	rocsparse_datatype_f32_r
rocsparse_datatype_bf16_r	rocsparse_datatype_bf16_r	rocsparse_datatype_f32_r

Example

This example performs a sampled dense-dense matrix product, \(C := \alpha ( A \cdot B ) \circ spy(C) + \beta C\), where \(\circ\) is the Hadamard product.

int main()
{
    // rocSPARSE handle
    rocsparse_handle handle;
    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    float halpha = 1.0f;
    float hbeta  = -1.0f;

    // A, B, and C are mxk, kxn, and mxn
    int m    = 4;
    int k    = 3;
    int n    = 2;
    int nnzC = 5;

    //     2  3  -1
    // A = 0  2   1
    //     0  0   5
    //     0 -2 0.5

    //      0  4
    // B =  1  0
    //     -2  0.5

    //      1 0            1 0
    // C =  2 3   spy(C) = 1 1
    //      0 0            0 0
    //      4 5            1 1

    std::vector<float> hA
        = {2.0f, 3.0f, -1.0f, 0.0f, 2.0f, 1.0f, 0.0f, 0.0f, 5.0f, 0.0f, -2.0f, 0.5f};
    std::vector<float> hB = {0.0f, 4.0f, 1.0f, 0.0f, -2.0f, 0.5f};

    std::vector<int>   hcsr_row_ptrC = {0, 1, 3, 3, 5};
    std::vector<int>   hcsr_col_indC = {0, 0, 1, 0, 1};
    std::vector<float> hcsr_valC     = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};

    float* dA;
    float* dB;
    HIP_CHECK(hipMalloc(&dA, sizeof(float) * m * k));
    HIP_CHECK(hipMalloc(&dB, sizeof(float) * k * n));

    int*   dcsr_row_ptrC;
    int*   dcsr_col_indC;
    float* dcsr_valC;
    HIP_CHECK(hipMalloc(&dcsr_row_ptrC, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_indC, sizeof(int) * nnzC));
    HIP_CHECK(hipMalloc(&dcsr_valC, sizeof(float) * nnzC));

    HIP_CHECK(hipMemcpy(dA, hA.data(), sizeof(float) * m * k, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dB, hB.data(), sizeof(float) * k * n, hipMemcpyHostToDevice));

    HIP_CHECK(hipMemcpy(
        dcsr_row_ptrC, hcsr_row_ptrC.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_indC, hcsr_col_indC.data(), sizeof(int) * nnzC, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_valC, hcsr_valC.data(), sizeof(float) * nnzC, hipMemcpyHostToDevice));

    rocsparse_dnmat_descr matA;
    ROCSPARSE_CHECK(rocsparse_create_dnmat_descr(
        &matA, m, k, k, dA, rocsparse_datatype_f32_r, rocsparse_order_row));

    rocsparse_dnmat_descr matB;
    ROCSPARSE_CHECK(rocsparse_create_dnmat_descr(
        &matB, k, n, n, dB, rocsparse_datatype_f32_r, rocsparse_order_row));

    rocsparse_spmat_descr matC;
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matC,
                                               m,
                                               n,
                                               nnzC,
                                               dcsr_row_ptrC,
                                               dcsr_col_indC,
                                               dcsr_valC,
                                               rocsparse_indextype_i32,
                                               rocsparse_indextype_i32,
                                               rocsparse_index_base_zero,
                                               rocsparse_datatype_f32_r));

    size_t buffer_size = 0;
    ROCSPARSE_CHECK(rocsparse_sddmm_buffer_size(handle,
                                                rocsparse_operation_none,
                                                rocsparse_operation_none,
                                                &halpha,
                                                matA,
                                                matB,
                                                &hbeta,
                                                matC,
                                                rocsparse_datatype_f32_r,
                                                rocsparse_sddmm_alg_default,
                                                &buffer_size));

    void* dbuffer;
    HIP_CHECK(hipMalloc(&dbuffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_sddmm_preprocess(handle,
                                               rocsparse_operation_none,
                                               rocsparse_operation_none,
                                               &halpha,
                                               matA,
                                               matB,
                                               &hbeta,
                                               matC,
                                               rocsparse_datatype_f32_r,
                                               rocsparse_sddmm_alg_default,
                                               dbuffer));

    ROCSPARSE_CHECK(rocsparse_sddmm(handle,
                                    rocsparse_operation_none,
                                    rocsparse_operation_none,
                                    &halpha,
                                    matA,
                                    matB,
                                    &hbeta,
                                    matC,
                                    rocsparse_datatype_f32_r,
                                    rocsparse_sddmm_alg_default,
                                    dbuffer));

    HIP_CHECK(hipMemcpy(
        hcsr_row_ptrC.data(), dcsr_row_ptrC, sizeof(int) * (m + 1), hipMemcpyDeviceToHost));
    HIP_CHECK(
        hipMemcpy(hcsr_col_indC.data(), dcsr_col_indC, sizeof(int) * nnzC, hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hcsr_valC.data(), dcsr_valC, sizeof(float) * nnzC, hipMemcpyDeviceToHost));

    std::cout << "hcsr_row_ptrC" << std::endl;
    for(size_t i = 0; i < hcsr_row_ptrC.size(); i++)
    {
        std::cout << hcsr_row_ptrC[i] << " ";
    }
    std::cout << "" << std::endl;

    std::cout << "hcsr_col_indC" << std::endl;
    for(size_t i = 0; i < hcsr_col_indC.size(); i++)
    {
        std::cout << hcsr_col_indC[i] << " ";
    }
    std::cout << "" << std::endl;

    std::cout << "hcsr_valC" << std::endl;
    for(size_t i = 0; i < hcsr_valC.size(); i++)
    {
        std::cout << hcsr_valC[i] << " ";
    }
    std::cout << "" << std::endl;

    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matC));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    HIP_CHECK(hipFree(dA));
    HIP_CHECK(hipFree(dB));
    HIP_CHECK(hipFree(dcsr_row_ptrC));
    HIP_CHECK(hipFree(dcsr_col_indC));
    HIP_CHECK(hipFree(dcsr_valC));
    HIP_CHECK(hipFree(dbuffer));

    return 0;
}

Note

The sparse matrix formats currently supported are: rocsparse_format_csr.

Note

opA == rocsparse_operation_conjugate_transpose is not supported.

Note

opB == rocsparse_operation_conjugate_transpose is not supported.

Note

This routine supports execution in a hipGraph context only when alg == rocsparse_sddmm_alg_default.

Parameters:

handle – [in] handle to the rocsparse library context queue.
opA – [in] dense matrix \(A\) operation type.
opB – [in] dense matrix \(B\) operation type.
alpha – [in] scalar \(\alpha\).
mat_A – [in] dense matrix \(A\) descriptor.
mat_B – [in] dense matrix \(B\) descriptor.
beta – [in] scalar \(\beta\).
mat_C – [inout] sparse matrix \(C\) descriptor.
compute_type – [in] floating point precision for the SDDMM computation.
alg – [in] specification of the algorithm to use.
temp_buffer – [in] temporary storage buffer allocated by the user. The size must be greater than or equal to the size obtained with rocsparse_sddmm_buffer_size.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_value – the value of opA, opB, compute_type or alg is incorrect.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, beta, mat_A, mat_B, mat_C or temp_buffer pointer is invalid.
rocsparse_status_not_implemented – opA == rocsparse_operation_conjugate_transpose or opB == rocsparse_operation_conjugate_transpose.

rocsparse_dense_to_sparse()#

rocsparse_status rocsparse_dense_to_sparse(rocsparse_handle handle, rocsparse_const_dnmat_descr mat_A, rocsparse_spmat_descr mat_B, rocsparse_dense_to_sparse_alg alg, size_t *buffer_size, void *temp_buffer)#

Dense matrix to sparse matrix conversion.

rocsparse_dense_to_sparse performs the conversion of a dense matrix to a sparse matrix in CSR, CSC, or COO format.

rocsparse_dense_to_sparse requires multiple steps to complete. First, the user calls rocsparse_dense_to_sparse with nullptr passed into temp_buffer:

// Call dense_to_sparse to get required buffer size
size_t buffer_size = 0;
rocsparse_dense_to_sparse(handle,
                          matA,
                          matB,
                          rocsparse_dense_to_sparse_alg_default,
                          &buffer_size,
                          nullptr);

After this call, buffer_size holds the size of the required buffer, which must then be allocated by the user. Next, the user calls rocsparse_dense_to_sparse with the newly allocated temp_buffer and nullptr passed into buffer_size:

// Call dense_to_sparse to perform analysis
rocsparse_dense_to_sparse(handle,
                          matA,
                          matB,
                          rocsparse_dense_to_sparse_alg_default,
                          nullptr,
                          temp_buffer);

This determines the number of non-zeros that will exist in the sparse matrix, which can be queried using the rocsparse_spmat_get_size routine. With this, the user can allocate the sparse matrix device arrays and set them on the sparse matrix descriptor using rocsparse_csr_set_pointers (for CSR format), rocsparse_csc_set_pointers (for CSC format), or rocsparse_coo_set_pointers (for COO format). Finally, the conversion is completed by calling rocsparse_dense_to_sparse with both the buffer_size and temp_buffer:

// Call dense_to_sparse to complete conversion
rocsparse_dense_to_sparse(handle,
                          matA,
                          matB,
                          rocsparse_dense_to_sparse_alg_default,
                          &buffer_size,
                          temp_buffer);
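The analysis-then-convert structure of these calls can be mirrored on the host in two passes. The sketch below assumes a column-major dense matrix with leading dimension ld and a zero-based CSR result. The names dense_to_csr_analysis and dense_to_csr_fill are hypothetical, and the code is only an illustration of the idea, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Analysis pass: count the non-zeros of each row and convert the counts into
// CSR row offsets. Returns the total number of non-zeros.
int dense_to_csr_analysis(
    int m, int n, int ld, const std::vector<float>& dense, std::vector<int>& csr_row_ptr)
{
    assert(ld >= m && dense.size() >= static_cast<std::size_t>(ld) * n);

    csr_row_ptr.assign(m + 1, 0);
    for(int j = 0; j < n; ++j)
    {
        for(int i = 0; i < m; ++i)
        {
            if(dense[static_cast<std::size_t>(j) * ld + i] != 0.0f)
            {
                ++csr_row_ptr[i + 1];
            }
        }
    }

    // Prefix sum turns per-row counts into offsets
    for(int i = 0; i < m; ++i)
    {
        csr_row_ptr[i + 1] += csr_row_ptr[i];
    }
    return csr_row_ptr[m];
}

// Conversion pass: with the non-zero count known, size the arrays and fill them.
void dense_to_csr_fill(int                       m,
                       int                       n,
                       int                       ld,
                       const std::vector<float>& dense,
                       const std::vector<int>&   csr_row_ptr,
                       std::vector<int>&         csr_col_ind,
                       std::vector<float>&       csr_val)
{
    csr_col_ind.resize(csr_row_ptr[m]);
    csr_val.resize(csr_row_ptr[m]);

    for(int i = 0; i < m; ++i)
    {
        int p = csr_row_ptr[i];
        for(int j = 0; j < n; ++j)
        {
            float v = dense[static_cast<std::size_t>(j) * ld + i];
            if(v != 0.0f)
            {
                csr_col_ind[p] = j;
                csr_val[p]     = v;
                ++p;
            }
        }
    }
}
```

The first pass plays the role of the analysis call, since the column index and value arrays cannot be sized until the non-zero count is known. The second pass corresponds to the final conversion call.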

Currently, rocsparse_dense_to_sparse only supports the algorithm rocsparse_dense_to_sparse_alg_default. See full example below.

rocsparse_dense_to_sparse supports rocsparse_datatype_f16_r, rocsparse_datatype_f32_r, rocsparse_datatype_f64_r, rocsparse_datatype_f32_c, and rocsparse_datatype_f64_c for the values arrays in the sparse matrix (stored in CSR, CSC, or COO format) and the dense matrix. For the row/column offset and row/column index arrays of the sparse matrix, rocsparse_dense_to_sparse supports the index types rocsparse_indextype_i32 and rocsparse_indextype_i64.

Uniform Precisions:

A / B
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    int m = 4;
    int n = 6;

    std::vector<float> hdense
        = {1, 0, 5, 0, 4, 2, 0, 0, 0, 3, 0, 9, 0, 0, 7, 0, 0, 0, 8, 6, 0, 0, 0, 0};

    // Offload data to device
    int*   dcsr_row_ptr;
    float* ddense;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&ddense, sizeof(float) * m * n));

    HIP_CHECK(hipMemcpy(ddense, hdense.data(), sizeof(float) * m * n, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_dnmat_descr matA;
    rocsparse_spmat_descr matB;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create dense matrix A
    ROCSPARSE_CHECK(
        rocsparse_create_dnmat_descr(&matA, m, n, m, ddense, data_type, rocsparse_order_column));

    // Create sparse matrix B
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matB,
                                               m,
                                               n,
                                               0,
                                               dcsr_row_ptr,
                                               nullptr,
                                               nullptr,
                                               row_idx_type,
                                               col_idx_type,
                                               idx_base,
                                               data_type));

    // Call dense_to_sparse to get required buffer size
    size_t buffer_size = 0;
    ROCSPARSE_CHECK(rocsparse_dense_to_sparse(
        handle, matA, matB, rocsparse_dense_to_sparse_alg_default, &buffer_size, nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    // Call dense_to_sparse to perform analysis
    ROCSPARSE_CHECK(rocsparse_dense_to_sparse(
        handle, matA, matB, rocsparse_dense_to_sparse_alg_default, nullptr, temp_buffer));

    int64_t num_rows_tmp, num_cols_tmp, nnz;
    ROCSPARSE_CHECK(rocsparse_spmat_get_size(matB, &num_rows_tmp, &num_cols_tmp, &nnz));

    int*   dcsr_col_ind;
    float* dcsr_val;
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));

    ROCSPARSE_CHECK(rocsparse_csr_set_pointers(matB, dcsr_row_ptr, dcsr_col_ind, dcsr_val));

    // Call dense_to_sparse to complete conversion
    ROCSPARSE_CHECK(rocsparse_dense_to_sparse(
        handle, matA, matB, rocsparse_dense_to_sparse_alg_default, &buffer_size, temp_buffer));

    std::vector<int>   hcsr_row_ptr(m + 1, 0);
    std::vector<int>   hcsr_col_ind(nnz, 0);
    std::vector<float> hcsr_val(nnz, 0);

    // Copy result back to host
    HIP_CHECK(
        hipMemcpy(hcsr_row_ptr.data(), dcsr_row_ptr, sizeof(int) * (m + 1), hipMemcpyDeviceToHost));
    HIP_CHECK(
        hipMemcpy(hcsr_col_ind.data(), dcsr_col_ind, sizeof(int) * nnz, hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(hcsr_val.data(), dcsr_val, sizeof(float) * nnz, hipMemcpyDeviceToHost));

    std::cout << "hcsr_row_ptr" << std::endl;
    for(size_t i = 0; i < hcsr_row_ptr.size(); i++)
    {
        std::cout << hcsr_row_ptr[i] << " ";
    }
    std::cout << "" << std::endl;

    std::cout << "hcsr_col_ind" << std::endl;
    for(size_t i = 0; i < hcsr_col_ind.size(); i++)
    {
        std::cout << hcsr_col_ind[i] << " ";
    }
    std::cout << "" << std::endl;

    std::cout << "hcsr_val" << std::endl;
    for(size_t i = 0; i < hcsr_val.size(); i++)
    {
        std::cout << hcsr_val[i] << " ";
    }
    std::cout << "" << std::endl;

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(ddense));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
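
To make the result of the example above easier to verify, the following host-only sketch performs the same conversion on the CPU. It is an illustration under stated assumptions, not the rocSPARSE implementation: the names HostCsr and dense_to_csr_reference are hypothetical, the dense input is assumed to be column-major with leading dimension ld (as in the example), entries equal to zero are dropped, and column indices within a row are emitted in ascending order. The two passes loosely mirror the analysis and conversion calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side CSR container used only for this illustration (not a rocSPARSE type).
struct HostCsr
{
    std::vector<int>   row_ptr;
    std::vector<int>   col_ind;
    std::vector<float> val;
};

// Convert an m x n column-major dense matrix (leading dimension ld) to zero-based CSR.
HostCsr dense_to_csr_reference(int m, int n, int ld, const std::vector<float>& dense)
{
    assert(m >= 0 && n >= 0 && ld >= m);
    assert(dense.size() >= static_cast<std::size_t>(ld) * n);

    HostCsr csr;
    csr.row_ptr.assign(m + 1, 0);

    // Pass 1: count the non-zeros of each row. After this pass nnz is known,
    // so the column index and value arrays can be sized.
    for(int j = 0; j < n; ++j)
    {
        for(int i = 0; i < m; ++i)
        {
            if(dense[static_cast<std::size_t>(j) * ld + i] != 0.0f)
            {
                ++csr.row_ptr[i + 1];
            }
        }
    }

    // A prefix sum turns the per-row counts into row offsets.
    for(int i = 0; i < m; ++i)
    {
        csr.row_ptr[i + 1] += csr.row_ptr[i];
    }

    csr.col_ind.resize(csr.row_ptr[m]);
    csr.val.resize(csr.row_ptr[m]);

    // Pass 2: fill column indices and values row by row.
    for(int i = 0; i < m; ++i)
    {
        int p = csr.row_ptr[i];
        for(int j = 0; j < n; ++j)
        {
            const float v = dense[static_cast<std::size_t>(j) * ld + i];
            if(v != 0.0f)
            {
                csr.col_ind[p] = j;
                csr.val[p]     = v;
                ++p;
            }
        }
    }

    return csr;
}
```

Calling dense_to_csr_reference(4, 6, 4, hdense) with the hdense array from the example yields the row offsets 0 2 4 7 9, which can be compared against the hcsr_row_ptr values printed by the device version.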

Note

When a nullptr is passed for temp_buffer, this function writes the required allocation size (in bytes) to buffer_size and returns without performing the dense to sparse conversion.

Note

This function is blocking with respect to the host.

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
mat_A – [in] dense matrix descriptor.
mat_B – [in] sparse matrix descriptor.
alg – [in] algorithm for the dense to sparse computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the dense to sparse operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – mat_A, mat_B, or buffer_size pointer is invalid.

rocsparse_sparse_to_dense()#

rocsparse_status rocsparse_sparse_to_dense(rocsparse_handle handle, rocsparse_const_spmat_descr mat_A, rocsparse_dnmat_descr mat_B, rocsparse_sparse_to_dense_alg alg, size_t *buffer_size, void *temp_buffer)#

Sparse matrix to dense matrix conversion.

rocsparse_sparse_to_dense converts a sparse matrix in CSR, CSC, or COO format to a dense matrix.

rocsparse_sparse_to_dense requires multiple steps to complete. First, the user calls rocsparse_sparse_to_dense with nullptr passed into temp_buffer:

// Call sparse_to_dense to get required buffer size
size_t buffer_size = 0;
rocsparse_sparse_to_dense(handle,
                          matA,
                          matB,
                          rocsparse_sparse_to_dense_alg_default,
                          &buffer_size,
                          nullptr);

After this call, buffer_size contains the required buffer size in bytes, and the user must allocate a buffer of that size. The conversion is then completed by calling rocsparse_sparse_to_dense again with both buffer_size and temp_buffer:

// Call sparse_to_dense to complete conversion
rocsparse_sparse_to_dense(handle,
                          matA,
                          matB,
                          rocsparse_sparse_to_dense_alg_default,
                          &buffer_size,
                          temp_buffer);

Currently, rocsparse_sparse_to_dense only supports the algorithm rocsparse_sparse_to_dense_alg_default. See full example below.

rocsparse_sparse_to_dense supports rocsparse_datatype_f16_r, rocsparse_datatype_f32_r, rocsparse_datatype_f64_r, rocsparse_datatype_f32_c, and rocsparse_datatype_f64_c for values arrays in the sparse matrix (stored in CSR, CSC, or COO format) and the dense matrix. For the row/column offset and row/column index arrays of the sparse matrix, rocsparse_sparse_to_dense supports the precisions rocsparse_indextype_i32 and rocsparse_indextype_i64.

Uniform Precisions:

A / B
rocsparse_datatype_f16_r
rocsparse_datatype_bf16_r
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

int main()
{
    //     1 4 0 0 0 0
    // A = 0 2 3 0 0 0
    //     5 0 0 7 8 0
    //     0 0 9 0 6 0
    int m = 4;
    int n = 6;

    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 7, 9};
    std::vector<int>   hcsr_col_ind = {0, 1, 1, 2, 0, 3, 4, 2, 4};
    std::vector<float> hcsr_val     = {1, 4, 2, 3, 5, 7, 8, 9, 6};
    std::vector<float> hdense(m * n, 0.0f);

    int nnz = hcsr_row_ptr[m] - hcsr_row_ptr[0];

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* ddense;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&ddense, sizeof(float) * m * n));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(ddense, hdense.data(), sizeof(float) * m * n, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnmat_descr matB;

    rocsparse_indextype  row_idx_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               n,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               row_idx_type,
                                               col_idx_type,
                                               idx_base,
                                               data_type));

    // Create dense matrix B
    ROCSPARSE_CHECK(
        rocsparse_create_dnmat_descr(&matB, m, n, m, ddense, data_type, rocsparse_order_column));

    // Call sparse_to_dense
    size_t buffer_size = 0;
    ROCSPARSE_CHECK(rocsparse_sparse_to_dense(
        handle, matA, matB, rocsparse_sparse_to_dense_alg_default, &buffer_size, nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_sparse_to_dense(
        handle, matA, matB, rocsparse_sparse_to_dense_alg_default, &buffer_size, temp_buffer));

    // Copy result back to host
    HIP_CHECK(hipMemcpy(hdense.data(), ddense, sizeof(float) * m * n, hipMemcpyDeviceToHost));

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnmat_descr(matB));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(ddense));
    HIP_CHECK(hipFree(temp_buffer));

    return 0;
}
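
As a cross-check for the example above, the conversion can also be written as a short host-only reference. This is a sketch under assumptions rather than the library's implementation: csr_to_dense_reference is a hypothetical name, the CSR input is assumed to be zero-based, and the dense output is assumed to be column-major with leading dimension ld, matching rocsparse_order_column in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expand a zero-based CSR matrix into an m x n column-major dense matrix
// with leading dimension ld. Entries not present in the CSR input stay zero.
std::vector<float> csr_to_dense_reference(int                       m,
                                          int                       n,
                                          int                       ld,
                                          const std::vector<int>&   row_ptr,
                                          const std::vector<int>&   col_ind,
                                          const std::vector<float>& val)
{
    assert(m >= 0 && n >= 0 && ld >= m);
    assert(row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(col_ind.size() == val.size());

    std::vector<float> dense(static_cast<std::size_t>(ld) * n, 0.0f);

    for(int i = 0; i < m; ++i)
    {
        for(int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
        {
            const int j = col_ind[p];
            assert(j >= 0 && j < n);

            // Column-major addressing: element (i, j) lives at j * ld + i.
            dense[static_cast<std::size_t>(j) * ld + i] = val[p];
        }
    }

    return dense;
}
```

With the hcsr_row_ptr, hcsr_col_ind, and hcsr_val arrays from the example and ld = m = 4, the first column of the result is 1 0 5 0, the second is 4 2 0 0, and so on.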

Note

When a nullptr is passed for temp_buffer, this function writes the required allocation size (in bytes) to buffer_size and returns without performing the sparse to dense conversion.

Note

This function is blocking with respect to the host.

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
mat_A – [in] sparse matrix descriptor.
mat_B – [in] dense matrix descriptor.
alg – [in] algorithm for the sparse to dense computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the sparse to dense operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – mat_A, mat_B, or buffer_size pointer is invalid.

rocsparse_sparse_to_sparse_buffer_size()#

rocsparse_status rocsparse_sparse_to_sparse_buffer_size(rocsparse_handle handle, rocsparse_sparse_to_sparse_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_sparse_to_sparse_stage stage, size_t *buffer_size_in_bytes)#

rocsparse_sparse_to_sparse_buffer_size calculates the required buffer size in bytes for the stage specified by stage.

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the sparse_to_sparse algorithm.
source – [in] source sparse matrix descriptor.
target – [in] target sparse matrix descriptor.
stage – [in] stage of the sparse_to_sparse computation.
buffer_size_in_bytes – [out] size in bytes of the buffer.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – if any required enumeration is invalid.
rocsparse_status_invalid_pointer – descr, source, target, or buffer_size_in_bytes pointer is invalid.

rocsparse_sparse_to_sparse()#

rocsparse_status rocsparse_sparse_to_sparse(rocsparse_handle handle, rocsparse_sparse_to_sparse_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_sparse_to_sparse_stage stage, size_t buffer_size_in_bytes, void *buffer)#

Sparse matrix to sparse matrix conversion.

rocsparse_sparse_to_sparse converts a sparse matrix from one sparse storage format to another.

Example

This example converts a CSR matrix into an ELL matrix.

int main()
{
    // 4 2 0 1 0
    // 2 4 2 0 1
    // 0 2 4 2 0
    // 1 0 2 4 2
    // 0 1 0 2 4
    int m   = 5;
    int n   = 5;
    int nnz = 17;

    std::vector<int>    hcsr_row_ptr = {0, 3, 7, 10, 14, 17};
    std::vector<int>    hcsr_col_ind = {0, 1, 3, 0, 1, 2, 4, 1, 2, 3, 0, 2, 3, 4, 1, 3, 4};
    std::vector<double> hcsr_val
        = {4.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 4.0};

    // rocSPARSE handle
    rocsparse_handle handle;
    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    int*    dcsr_row_ptr = nullptr;
    int*    dcsr_col_ind = nullptr;
    double* dcsr_val     = nullptr;
    HIP_CHECK(hipMalloc((void**)&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc((void**)&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc((void**)&dcsr_val, sizeof(double) * nnz));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(double) * nnz, hipMemcpyHostToDevice));

    // The CSR arrays (ptr, ind, val) were allocated and filled above.
    // Build Source
    rocsparse_spmat_descr source;
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&source,
                                               m,
                                               n,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               rocsparse_indextype_i32,
                                               rocsparse_indextype_i32,
                                               rocsparse_index_base_zero,
                                               rocsparse_datatype_f64_r));

    // Build target
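    void*                 dell_ind  = nullptr; // allocated after the analysis phase
    void*                 dell_val  = nullptr; // allocated after the analysis phase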
    int64_t               ell_width = 0;
    rocsparse_spmat_descr target;
    ROCSPARSE_CHECK(rocsparse_create_ell_descr(&target,
                                               m,
                                               n,
                                               dell_ind,
                                               dell_val,
                                               ell_width,
                                               rocsparse_indextype_i32,
                                               rocsparse_index_base_zero,
                                               rocsparse_datatype_f64_r));

    // Create descriptor
    rocsparse_sparse_to_sparse_descr descr;
    ROCSPARSE_CHECK(rocsparse_create_sparse_to_sparse_descr(
        &descr, source, target, rocsparse_sparse_to_sparse_alg_default));

    // Analysis phase
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_sparse_to_sparse_buffer_size(
        handle, descr, source, target, rocsparse_sparse_to_sparse_stage_analysis, &buffer_size));

    void* buffer = nullptr;
    HIP_CHECK(hipMalloc(&buffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_sparse_to_sparse(handle,
                                               descr,
                                               source,
                                               target,
                                               rocsparse_sparse_to_sparse_stage_analysis,
                                               buffer_size,
                                               buffer));
    HIP_CHECK(hipFree(buffer));

    // The user is responsible for allocating the target arrays after the analysis phase.
    int64_t              rows, cols;
    void *               ind, *val;
    rocsparse_indextype  idx_type;
    rocsparse_index_base idx_base;
    rocsparse_datatype   data_type;

    // Get ell_width
    ROCSPARSE_CHECK(rocsparse_ell_get(
        target, &rows, &cols, &ind, &val, &ell_width, &idx_type, &idx_base, &data_type));

    std::cout << "rows: " << rows << " cols: " << cols << " ell_width: " << ell_width << std::endl;

    // Allocate device arrays for ELL format
    HIP_CHECK(hipMalloc(&dell_ind, sizeof(int) * ell_width * m));
    HIP_CHECK(hipMalloc(&dell_val, sizeof(double) * ell_width * m));

    ROCSPARSE_CHECK(rocsparse_ell_set_pointers(target, dell_ind, dell_val));

    // Calculation phase
    ROCSPARSE_CHECK(rocsparse_sparse_to_sparse_buffer_size(
        handle, descr, source, target, rocsparse_sparse_to_sparse_stage_compute, &buffer_size));
    HIP_CHECK(hipMalloc(&buffer, buffer_size));
    ROCSPARSE_CHECK(rocsparse_sparse_to_sparse(handle,
                                               descr,
                                               source,
                                               target,
                                               rocsparse_sparse_to_sparse_stage_compute,
                                               buffer_size,
                                               buffer));
    HIP_CHECK(hipFree(buffer));

    std::vector<int>    hell_ind(ell_width * m);
    std::vector<double> hell_val(ell_width * m);

    HIP_CHECK(
        hipMemcpy(hell_ind.data(), dell_ind, sizeof(int) * ell_width * m, hipMemcpyDeviceToHost));
    HIP_CHECK(hipMemcpy(
        hell_val.data(), dell_val, sizeof(double) * ell_width * m, hipMemcpyDeviceToHost));

    std::cout << "hell_ind" << std::endl;
    for(size_t i = 0; i < hell_ind.size(); i++)
    {
        std::cout << hell_ind[i] << " ";
    }
    std::cout << "" << std::endl;

    std::cout << "hell_val" << std::endl;
    for(size_t i = 0; i < hell_val.size(); i++)
    {
        std::cout << hell_val[i] << " ";
    }
    std::cout << "" << std::endl;

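    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_sparse_to_sparse_descr(descr));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(source));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(target));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));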
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));

    HIP_CHECK(hipFree(dell_ind));
    HIP_CHECK(hipFree(dell_val));

    return 0;
}
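
The CSR to ELL conversion in the example above can be sketched on the host to show why the ELL width is only known after the analysis stage: it is the largest number of non-zeros in any row. The sketch below is illustrative only. HostEll and csr_to_ell_reference are hypothetical names, and two layout details are assumptions of this sketch rather than statements from this section: the ELL arrays are stored column-major (entry k of row i at index k * m + i), and unused slots are padded with column index -1 and value 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side ELL container used only for this illustration (not a rocSPARSE type).
struct HostEll
{
    int                 width = 0;
    std::vector<int>    col_ind;
    std::vector<double> val;
};

// Convert a zero-based m x n CSR matrix to ELL.
HostEll csr_to_ell_reference(int                        m,
                             int                        n,
                             const std::vector<int>&    row_ptr,
                             const std::vector<int>&    col_ind,
                             const std::vector<double>& val)
{
    assert(m >= 0 && n >= 0);
    assert(row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(col_ind.size() == val.size());

    HostEll ell;

    // Analysis: the ELL width is the maximum row length.
    for(int i = 0; i < m; ++i)
    {
        ell.width = std::max(ell.width, row_ptr[i + 1] - row_ptr[i]);
    }

    // Allocate the padded arrays (assumed padding: index -1, value 0).
    const std::size_t size = static_cast<std::size_t>(ell.width) * m;
    ell.col_ind.assign(size, -1);
    ell.val.assign(size, 0.0);

    // Compute: copy entry k of row i to position k * m + i (assumed layout).
    for(int i = 0; i < m; ++i)
    {
        for(int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
        {
            assert(col_ind[p] >= 0 && col_ind[p] < n);

            const std::size_t idx = static_cast<std::size_t>(p - row_ptr[i]) * m + i;
            ell.col_ind[idx]      = col_ind[p];
            ell.val[idx]          = val[p];
        }
    }

    return ell;
}
```

For the 5 x 5 matrix in the example, the rows hold 3, 4, 3, 4, and 3 non-zeros, so this reference computes a width of 4 and arrays of 20 elements.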

Note

The buffer size (in bytes) passed as buffer_size_in_bytes must be obtained from rocsparse_sparse_to_sparse_buffer_size separately for each stage, because the required buffer size can differ between stages.

Note

The format rocsparse_format_bell is not supported.

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the sparse_to_sparse algorithm.
source – [in] sparse matrix descriptor.
target – [in] sparse matrix descriptor.
stage – [in] stage of the sparse_to_sparse computation.
buffer_size_in_bytes – [in] size in bytes of the buffer.
buffer – [in] temporary storage buffer allocated by the user.

Return values:

rocsparse_status_success – the operation completed successfully.

rocsparse_extract_buffer_size()#

rocsparse_status rocsparse_extract_buffer_size(rocsparse_handle handle, rocsparse_extract_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_extract_stage stage, size_t *buffer_size_in_bytes)#

rocsparse_extract_buffer_size calculates the required buffer size in bytes for a given stage stage. This routine is used in conjunction with rocsparse_extract_nnz and rocsparse_extract to extract a lower or upper triangular sparse matrix from an input sparse matrix. See rocsparse_extract for more details.

Note

This routine is asynchronous with respect to the host. This routine does support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the extract algorithm.
source – [in] source sparse matrix descriptor.
target – [in] target sparse matrix descriptor.
stage – [in] stage of the extract computation.
buffer_size_in_bytes – [out] size in bytes of the buffer.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – if stage is invalid.
rocsparse_status_invalid_pointer – descr, source, target, or buffer_size_in_bytes pointer is invalid.

rocsparse_extract_nnz()#

rocsparse_status rocsparse_extract_nnz(rocsparse_handle handle, rocsparse_extract_descr descr, int64_t *nnz)#

rocsparse_extract_nnz returns the number of non-zeros in the extracted matrix. The value is available after the analysis phase rocsparse_extract_stage_analysis has been executed. This routine is used in conjunction with rocsparse_extract_buffer_size and rocsparse_extract to extract a lower or upper triangular sparse matrix from an input sparse matrix. See rocsparse_extract for more details.

Note

This routine is asynchronous with respect to the host. This routine does support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the extract algorithm.
nnz – [out] the number of non-zeros.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – descr or nnz pointer is invalid.

rocsparse_extract()#

rocsparse_status rocsparse_extract(rocsparse_handle handle, rocsparse_extract_descr descr, rocsparse_const_spmat_descr source, rocsparse_spmat_descr target, rocsparse_extract_stage stage, size_t buffer_size_in_bytes, void *buffer)#

Sparse matrix extraction.

rocsparse_extract performs the extraction of the lower or upper part of a sparse matrix into a new matrix.

rocsparse_extract requires multiple steps to complete. First, the user creates the source and target sparse matrix descriptors. For example, in the case of CSR matrix format this might look like:

// Build Source
rocsparse_spmat_descr source;
rocsparse_create_csr_descr(&source,
                           M,
                           N,
                           nnz,
                           dsource_row_ptr,
                           dsource_col_ind,
                           dsource_val,
                           rocsparse_indextype_i32,
                           rocsparse_indextype_i32,
                           rocsparse_index_base_zero,
                           rocsparse_datatype_f32_r);

// Build target
void * dtarget_row_ptr;
hipMalloc(&dtarget_row_ptr, sizeof(int32_t) * (M + 1));
rocsparse_spmat_descr target;
rocsparse_create_csr_descr(&target,
                           M,
                           N,
                           0,
                           dtarget_row_ptr,
                           nullptr,
                           nullptr,
                           rocsparse_indextype_i32,
                           rocsparse_indextype_i32,
                           rocsparse_index_base_zero,
                           rocsparse_datatype_f32_r);

Next, the user creates the extraction descriptor and calls rocsparse_extract_buffer_size with the stage rocsparse_extract_stage_analysis to determine the amount of temporary storage required. The user allocates this temporary storage buffer and passes it to rocsparse_extract with the stage rocsparse_extract_stage_analysis:

// Create descriptor
rocsparse_extract_descr descr;
rocsparse_create_extract_descr(&descr,
                               source,
                               target,
                               rocsparse_extract_alg_default);

// Analysis phase
size_t buffer_size;
rocsparse_extract_buffer_size(handle,
                              descr,
                              source,
                              target,
                              rocsparse_extract_stage_analysis,
                              &buffer_size);
void* dbuffer = nullptr;
hipMalloc(&dbuffer, buffer_size);
rocsparse_extract(handle,
                  descr,
                  source,
                  target,
                  rocsparse_extract_stage_analysis,
                  buffer_size,
                  dbuffer);
hipFree(dbuffer);

The user then calls rocsparse_extract_nnz in order to determine the number of non-zeros that will exist in the target matrix. Once determined, the user can allocate the column indices and values arrays of the target sparse matrix:

int64_t target_nnz;
rocsparse_extract_nnz(handle, descr, &target_nnz);

void* dtarget_col_ind;
void* dtarget_val;
hipMalloc(&dtarget_col_ind, sizeof(int32_t) * target_nnz);
hipMalloc(&dtarget_val, sizeof(float) * target_nnz);
rocsparse_csr_set_pointers(target, dtarget_row_ptr, dtarget_col_ind, dtarget_val);

Finally, the user calls rocsparse_extract_buffer_size with the stage rocsparse_extract_stage_compute in order to determine the size of the temporary user allocated storage needed for the computation of the column indices and values in the sparse target. The user allocates this buffer and completes the conversion by calling rocsparse_extract using the rocsparse_extract_stage_compute stage:

// Calculation phase
rocsparse_extract_buffer_size(handle,
                              descr,
                              source,
                              target,
                              rocsparse_extract_stage_compute,
                              &buffer_size);
hipMalloc(&dbuffer, buffer_size);
rocsparse_extract(handle,
                  descr,
                  source,
                  target,
                  rocsparse_extract_stage_compute,
                  buffer_size,
                  dbuffer);
hipFree(dbuffer);

The target row pointer, column indices, and values arrays will now be filled with the upper or lower part of the source matrix.

The source and the target matrices must have the same format (see rocsparse_format) and the same storage mode (see rocsparse_storage_mode). The attributes of the target matrix, the fill mode rocsparse_fill_mode and the diagonal type rocsparse_diag_type are used to parametrise the algorithm. These can be set on the target matrix using rocsparse_spmat_set_attribute. See full example below.

Example

This example extracts the lower part of a CSR matrix into a CSR matrix.

int main()
{
    // 1 2 3 0 0 0 4 5
    // 0 1 3 5 7 0 0 0
    // 0 0 0 1 0 3 0 9
    // 1 2 3 0 0 0 0 4
    // 0 0 0 0 0 0 0 0
    // 1 2 1 0 0 5 8 0
    // 0 1 2 3 0 0 0 4
    // 0 0 0 1 2 0 1 2
    int32_t M   = 8;
    int32_t N   = 8;
    int32_t nnz = 29;

    std::vector<int32_t> hsource_row_ptr = {0, 5, 9, 12, 16, 16, 21, 25, 29};
    std::vector<int32_t> hsource_col_ind
        = {0, 1, 2, 6, 7, 1, 2, 3, 4, 3, 5, 7, 0, 1, 2, 7, 0, 1, 2, 5, 6, 1, 2, 3, 7, 3, 4, 6, 7};
    std::vector<float> hsource_val
        = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 1.0f, 3.0f, 5.0f, 7.0f, 1.0f, 3.0f, 9.0f, 1.0f, 2.0f, 3.0f,
           4.0f, 1.0f, 2.0f, 1.0f, 5.0f, 8.0f, 1.0f, 2.0f, 3.0f, 4.0f, 1.0f, 2.0f, 1.0f, 2.0f};

    int32_t* dsource_row_ptr;
    int32_t* dsource_col_ind;
    float*   dsource_val;
    HIP_CHECK(hipMalloc(&dsource_row_ptr, sizeof(int32_t) * (M + 1)));
    HIP_CHECK(hipMalloc(&dsource_col_ind, sizeof(int32_t) * nnz));
    HIP_CHECK(hipMalloc(&dsource_val, sizeof(float) * nnz));

    HIP_CHECK(hipMemcpy(
        dsource_row_ptr, hsource_row_ptr.data(), sizeof(int32_t) * (M + 1), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(
        dsource_col_ind, hsource_col_ind.data(), sizeof(int32_t) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dsource_val, hsource_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));

    rocsparse_handle handle;
    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Build Source
    rocsparse_spmat_descr source;
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&source,
                                               M,
                                               N,
                                               nnz,
                                               dsource_row_ptr,
                                               dsource_col_ind,
                                               dsource_val,
                                               rocsparse_indextype_i32,
                                               rocsparse_indextype_i32,
                                               rocsparse_index_base_zero,
                                               rocsparse_datatype_f32_r));

    // Build target
    void* dtarget_row_ptr;
    HIP_CHECK(hipMalloc(&dtarget_row_ptr, sizeof(int32_t) * (M + 1)));
    rocsparse_spmat_descr target;
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&target,
                                               M,
                                               N,
                                               0,
                                               dtarget_row_ptr,
                                               nullptr,
                                               nullptr,
                                               rocsparse_indextype_i32,
                                               rocsparse_indextype_i32,
                                               rocsparse_index_base_zero,
                                               rocsparse_datatype_f32_r));

    const rocsparse_fill_mode fill_mode = rocsparse_fill_mode_lower;
    const rocsparse_diag_type diag_type = rocsparse_diag_type_non_unit;

    ROCSPARSE_CHECK(rocsparse_spmat_set_attribute(
        target, rocsparse_spmat_fill_mode, &fill_mode, sizeof(fill_mode)));
    ROCSPARSE_CHECK(rocsparse_spmat_set_attribute(
        target, rocsparse_spmat_diag_type, &diag_type, sizeof(diag_type)));

    // Create descriptor
    rocsparse_extract_descr descr;
    ROCSPARSE_CHECK(
        rocsparse_create_extract_descr(&descr, source, target, rocsparse_extract_alg_default));

    // Analysis phase
    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_extract_buffer_size(
        handle, descr, source, target, rocsparse_extract_stage_analysis, &buffer_size));
    void* dbuffer;
    HIP_CHECK(hipMalloc(&dbuffer, buffer_size));
    ROCSPARSE_CHECK(rocsparse_extract(
        handle, descr, source, target, rocsparse_extract_stage_analysis, buffer_size, dbuffer));
    HIP_CHECK(hipFree(dbuffer));

    // The user is responsible for allocating the target arrays after the analysis phase.
    int64_t target_nnz;
    ROCSPARSE_CHECK(rocsparse_extract_nnz(handle, descr, &target_nnz));

    std::cout << "target_nnz: " << target_nnz << std::endl;

    void* dtarget_col_ind;
    void* dtarget_val;
    HIP_CHECK(hipMalloc(&dtarget_col_ind, sizeof(int32_t) * target_nnz));
    HIP_CHECK(hipMalloc(&dtarget_val, sizeof(float) * target_nnz));
    ROCSPARSE_CHECK(
        rocsparse_csr_set_pointers(target, dtarget_row_ptr, dtarget_col_ind, dtarget_val));

    // Calculation phase
    ROCSPARSE_CHECK(rocsparse_extract_buffer_size(
        handle, descr, source, target, rocsparse_extract_stage_compute, &buffer_size));
    HIP_CHECK(hipMalloc(&dbuffer, buffer_size));
    ROCSPARSE_CHECK(rocsparse_extract(
        handle, descr, source, target, rocsparse_extract_stage_compute, buffer_size, dbuffer));
    HIP_CHECK(hipFree(dbuffer));

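    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_extract_descr(descr));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(source));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(target));
    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));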

    HIP_CHECK(hipFree(dsource_row_ptr));
    HIP_CHECK(hipFree(dsource_col_ind));
    HIP_CHECK(hipFree(dsource_val));

    HIP_CHECK(hipFree(dtarget_row_ptr));
    HIP_CHECK(hipFree(dtarget_col_ind));
    HIP_CHECK(hipFree(dtarget_val));

    return 0;
}
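
The analysis and compute stages of the example above can be mirrored by a small host-only reference, which is useful for checking the reported target_nnz. This is a sketch under assumptions, not the library's algorithm: HostCsr and extract_lower_reference are hypothetical names, indices are zero-based, and the keep_diagonal flag is this sketch's stand-in for the diagonal type. Treating rocsparse_diag_type_non_unit as "keep the diagonal" is an assumption made here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side CSR container used only for this illustration (not a rocSPARSE type).
struct HostCsr
{
    std::vector<int>   row_ptr;
    std::vector<int>   col_ind;
    std::vector<float> val;
};

// Extract the lower triangular part of a zero-based CSR matrix with m rows.
HostCsr extract_lower_reference(int                       m,
                                const std::vector<int>&   row_ptr,
                                const std::vector<int>&   col_ind,
                                const std::vector<float>& val,
                                bool                      keep_diagonal)
{
    assert(m >= 0);
    assert(row_ptr.size() == static_cast<std::size_t>(m) + 1);
    assert(col_ind.size() == val.size());

    // An entry (row, col) belongs to the lower part if col <= row,
    // or col < row when the diagonal is excluded.
    auto in_lower = [keep_diagonal](int row, int col) {
        return keep_diagonal ? (col <= row) : (col < row);
    };

    HostCsr target;
    target.row_ptr.assign(m + 1, 0);

    // Analysis: count the kept entries per row, then prefix-sum into offsets.
    for(int i = 0; i < m; ++i)
    {
        for(int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
        {
            if(in_lower(i, col_ind[p]))
            {
                ++target.row_ptr[i + 1];
            }
        }
    }
    for(int i = 0; i < m; ++i)
    {
        target.row_ptr[i + 1] += target.row_ptr[i];
    }

    // The number of non-zeros is now known, so the target arrays can be sized.
    target.col_ind.resize(target.row_ptr[m]);
    target.val.resize(target.row_ptr[m]);

    // Compute: copy the kept entries in their original order.
    for(int i = 0; i < m; ++i)
    {
        int q = target.row_ptr[i];
        for(int p = row_ptr[i]; p < row_ptr[i + 1]; ++p)
        {
            if(in_lower(i, col_ind[p]))
            {
                target.col_ind[q] = col_ind[p];
                target.val[q]     = val[p];
                ++q;
            }
        }
    }

    return target;
}
```

For the 8 x 8 source matrix in the example, this reference keeps 16 entries when the diagonal is included and 12 when it is excluded.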

Note

This routine is asynchronous with respect to the host. This routine does support execution in a hipGraph context.

Note

Supported formats are rocsparse_format_csr and rocsparse_format_csc.

Parameters:

handle – [in] handle to the rocsparse library context queue.
descr – [in] descriptor of the extract algorithm.
source – [in] sparse matrix descriptor.
target – [in] sparse matrix descriptor.
stage – [in] stage of the extract computation.
buffer_size_in_bytes – [in] size in bytes of the buffer.
buffer – [in] temporary storage buffer allocated by the user.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_value – if stage is invalid.
rocsparse_status_invalid_pointer – descr, source, target, or buffer pointer is invalid.

rocsparse_check_spmat#

rocsparse_status rocsparse_check_spmat(rocsparse_handle handle, rocsparse_const_spmat_descr mat, rocsparse_data_status *data_status, rocsparse_check_spmat_stage stage, size_t *buffer_size, void *temp_buffer)#

Check matrix to see if it is valid.

rocsparse_check_spmat checks if the input matrix is valid.

rocsparse_check_spmat requires two steps to complete. First, the user calls rocsparse_check_spmat with the stage parameter set to rocsparse_check_spmat_stage_buffer_size, which determines the size of the temporary buffer needed in the second step. The user then allocates this buffer and calls rocsparse_check_spmat with the stage parameter set to rocsparse_check_spmat_stage_compute, which checks the input matrix for errors. Any errors detected in the input matrix are reported in data_status (passed to the function as a host pointer).

Uniform Precisions:

A
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

In this example we want to check whether a matrix is upper triangular. The matrix passed to rocsparse_check_spmat is invalid because it contains an entry in the lower triangular part of the matrix.

int main()
{
    // 1 2 0 0
    // 3 0 4 0  // <-------contains a "3" in the lower part of matrix
    // 0 0 1 1
    // 0 0 0 2
    std::vector<int>   hcsr_row_ptr = {0, 2, 4, 6, 7};
    std::vector<int>   hcsr_col_ind = {0, 1, 0, 2, 2, 3, 3};
    std::vector<float> hcsr_val     = {1, 2, 3, 4, 1, 1, 2};

    int M   = 4;
    int N   = 4;
    int nnz = 7;

    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (M + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (M + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));

    rocsparse_handle handle;
    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    rocsparse_spmat_descr matA;
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               M,
                                               N,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               rocsparse_indextype_i32,
                                               rocsparse_indextype_i32,
                                               rocsparse_index_base_zero,
                                               rocsparse_datatype_f32_r));

    const rocsparse_fill_mode   fill_mode   = rocsparse_fill_mode_upper;
    const rocsparse_matrix_type matrix_type = rocsparse_matrix_type_triangular;

    ROCSPARSE_CHECK(rocsparse_spmat_set_attribute(
        matA, rocsparse_spmat_fill_mode, &fill_mode, sizeof(fill_mode)));
    ROCSPARSE_CHECK(rocsparse_spmat_set_attribute(
        matA, rocsparse_spmat_matrix_type, &matrix_type, sizeof(matrix_type)));

    rocsparse_data_status data_status;

    size_t buffer_size;
    ROCSPARSE_CHECK(rocsparse_check_spmat(handle,
                                          matA,
                                          &data_status,
                                          rocsparse_check_spmat_stage_buffer_size,
                                          &buffer_size,
                                          nullptr));

    void* dbuffer;
    HIP_CHECK(hipMalloc(&dbuffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_check_spmat(
        handle, matA, &data_status, rocsparse_check_spmat_stage_compute, &buffer_size, dbuffer));

    std::cout << "data_status: " << data_status << std::endl;

    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));

    HIP_CHECK(hipFree(dbuffer));
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));

    return 0;
}
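For intuition about the condition the example above is built around, the following host-only sketch counts the stored CSR entries that lie strictly below the diagonal. The helper name count_entries_below_diagonal is hypothetical and is not part of rocSPARSE; the library routine performs its own, broader validation on the device, and this sketch only mirrors the single "entry in the lower triangular part" condition under the assumption of zero-based indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not a rocSPARSE routine): counts the stored
// entries of a zero-based CSR matrix that lie strictly below the diagonal.
// A matrix declared upper triangular is expected to have none.
int count_entries_below_diagonal(const std::vector<int>& row_ptr,
                                 const std::vector<int>& col_ind)
{
    assert(!row_ptr.empty());
    assert(static_cast<std::size_t>(row_ptr.back()) == col_ind.size());

    const int m     = static_cast<int>(row_ptr.size()) - 1;
    int       count = 0;
    for(int row = 0; row < m; ++row)
    {
        // Entries of this row occupy [row_ptr[row], row_ptr[row + 1]).
        for(int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
        {
            if(col_ind[j] < row)
            {
                ++count;
            }
        }
    }
    return count;
}
```

Called with the hcsr_row_ptr and hcsr_col_ind arrays from the example, the helper reports one offending entry: the "3" stored at row 1, column 0.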

Note

When stage is equal to rocsparse_check_spmat_stage_buffer_size, this function writes the required allocation size (in bytes) to buffer_size and returns without performing the checking operation.

Note

The sparse matrix formats currently supported are: rocsparse_format_coo, rocsparse_format_csr, rocsparse_format_csc and rocsparse_format_ell.

Note

check_spmat requires two stages to complete. The first stage rocsparse_check_spmat_stage_buffer_size will return the size of the temporary storage buffer that is required for subsequent calls to rocsparse_check_spmat. In the final stage rocsparse_check_spmat_stage_compute, the actual computation is performed.

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
mat – [in] matrix descriptor.
data_status – [out] set to indicate the status of the data.
stage – [in] stage of the check_spmat computation.
buffer_size – [out] number of bytes of the temporary storage buffer. buffer_size is set when temp_buffer is nullptr.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the checking operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – mat, buffer_size, temp_buffer or data_status pointer is invalid.
rocsparse_status_invalid_value – the value of stage is incorrect.

rocsparse_spitsv#

rocsparse_status rocsparse_spitsv(rocsparse_handle handle, rocsparse_int *host_nmaxiter, const void *host_tol, void *host_history, rocsparse_operation trans, const void *alpha, const rocsparse_spmat_descr mat, const rocsparse_dnvec_descr x, const rocsparse_dnvec_descr y, rocsparse_datatype compute_type, rocsparse_spitsv_alg alg, rocsparse_spitsv_stage stage, size_t *buffer_size, void *temp_buffer)#

Sparse iterative triangular solve.

rocsparse_spitsv uses the Jacobi iterative method to solve a sparse triangular linear system with a sparse \(m \times m\) matrix \(A\) stored in CSR format, a dense solution vector \(y\), and a dense right-hand side \(x\) that is multiplied by \(\alpha\), such that

\[ op(A) y = \alpha x, \]

with

\[\begin{split} op(A) = \left\{ \begin{array}{ll} A, & \text{if trans == rocsparse_operation_none} \\ A^T, & \text{if trans == rocsparse_operation_transpose} \\ A^H, & \text{if trans == rocsparse_operation_conjugate_transpose} \end{array} \right. \end{split}\]

The Jacobi method applied to the sparse triangular linear system above gives

\[ y_{k+1} = y_{k} + D^{-1} ( \alpha x - (D + T) y_{k} ) \]

with \(A = D + T\), \(D\) the diagonal of \(A\) and \(T\) the strict triangular part of \(A\).

The above equation can also be written as

\[ y_{k+1} = y_{k} + D^{-1} r_k \]

where

\[ r_k = \alpha x - (D + T) y_k. \]

Starting with \(y_0 = \) y, the method iterates while \(k \lt\) host_nmaxiter and stops as soon as

\[ \Vert r_k \Vert_{\infty} \le \epsilon, \]

with \(\epsilon\) = host_tol.
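The iteration above can be sketched on the host as follows. This is a minimal reference under stated assumptions, not the library's implementation: the function name jacobi_triangular_solve is hypothetical, only real values and \(op(A) = A\) are handled, indices are zero-based, and every row is assumed to store a non-zero diagonal entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Host-only reference sketch (hypothetical helper, not a rocSPARSE routine).
// Applies y_{k+1} = y_k + D^{-1} r_k with r_k = alpha * x - (D + T) * y_k
// to a real triangular matrix in zero-based CSR, for op(A) = A only.
// Returns the number of updates performed; a value below nmaxiter means the
// test ||r_k||_inf <= tol was satisfied.
int jacobi_triangular_solve(const std::vector<int>&    row_ptr,
                            const std::vector<int>&    col_ind,
                            const std::vector<double>& val,
                            double                     alpha,
                            const std::vector<double>& x,
                            std::vector<double>&       y,
                            int                        nmaxiter,
                            double                     tol)
{
    assert(!row_ptr.empty());
    const std::size_t m = row_ptr.size() - 1;
    assert(x.size() == m && y.size() == m);

    // Extract the diagonal D once (assumed present and non-zero in each row).
    std::vector<double> diag(m, 0.0);
    for(std::size_t i = 0; i < m; ++i)
    {
        for(int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
        {
            if(static_cast<std::size_t>(col_ind[j]) == i)
            {
                diag[i] = val[j];
            }
        }
        assert(diag[i] != 0.0);
    }

    std::vector<double> r(m);
    int                 k = 0;
    for(; k < nmaxiter; ++k)
    {
        // Residual r_k = alpha * x - (D + T) * y_k and its infinity norm.
        double norm_inf = 0.0;
        for(std::size_t i = 0; i < m; ++i)
        {
            double ay = 0.0;
            for(int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            {
                ay += val[j] * y[col_ind[j]];
            }
            r[i]     = alpha * x[i] - ay;
            norm_inf = std::max(norm_inf, std::fabs(r[i]));
        }
        if(norm_inf <= tol)
        {
            break; // residual small enough: stop before updating
        }
        // Update y_{k+1} = y_k + D^{-1} r_k.
        for(std::size_t i = 0; i < m; ++i)
        {
            y[i] += r[i] / diag[i];
        }
    }
    return k;
}
```

For the \(4 \times 4\) lower triangular matrix used in this section's example, with \(\alpha = 1\) and x and the initial y both set to all ones, the sketch arrives at \(y = (1, 0.5, -4/3, 3.25)\) well before 200 updates. In exact arithmetic the Jacobi iteration on a triangular system terminates after at most \(m\) updates, because \(D^{-1} T\) is strictly triangular and therefore nilpotent.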

rocsparse_spitsv requires three stages to complete. First, the user passes the rocsparse_spitsv_stage_buffer_size stage to determine the size of the required temporary storage buffer. Next, the user allocates this buffer and calls rocsparse_spitsv again with the rocsparse_spitsv_stage_preprocess stage which will preprocess data and store it in the temporary buffer. Finally, the user calls rocsparse_spitsv with the rocsparse_spitsv_stage_compute stage to perform the actual computation. Once all calls to rocsparse_spitsv are complete, the temporary buffer can be deallocated.

rocsparse_spitsv supports rocsparse_indextype_i32 and rocsparse_indextype_i64 index precisions for storing the row pointer and column indices arrays of the sparse matrix. rocsparse_spitsv supports the following data types for \(op(A)\), \(x\), \(y\) and compute types for \(\alpha\):

Uniform Precisions:

A / X / Y / compute_type
rocsparse_datatype_f32_r
rocsparse_datatype_f64_r
rocsparse_datatype_f32_c
rocsparse_datatype_f64_c

Example

int main()
{
    //     1 0 0 0
    // A = 0 2 0 0
    //     5 0 3 0
    //     0 0 9 4
    int   m      = 4;
    int   n      = 4;
    int   nnz    = 6;
    float halpha = 1.0f;

    std::vector<int>   hcsr_row_ptr = {0, 1, 2, 4, 6};
    std::vector<int>   hcsr_col_ind = {0, 1, 0, 2, 2, 3};
    std::vector<float> hcsr_val     = {1.0f, 2.0f, 5.0f, 3.0f, 9.0f, 4.0f};
    std::vector<float> hx(m, 1.0f);
    std::vector<float> hy(m, 1.0f);

    // Offload data to device
    int*   dcsr_row_ptr;
    int*   dcsr_col_ind;
    float* dcsr_val;
    float* dx;
    float* dy;
    HIP_CHECK(hipMalloc(&dcsr_row_ptr, sizeof(int) * (m + 1)));
    HIP_CHECK(hipMalloc(&dcsr_col_ind, sizeof(int) * nnz));
    HIP_CHECK(hipMalloc(&dcsr_val, sizeof(float) * nnz));
    HIP_CHECK(hipMalloc(&dx, sizeof(float) * m));
    HIP_CHECK(hipMalloc(&dy, sizeof(float) * m));

    HIP_CHECK(
        hipMemcpy(dcsr_row_ptr, hcsr_row_ptr.data(), sizeof(int) * (m + 1), hipMemcpyHostToDevice));
    HIP_CHECK(
        hipMemcpy(dcsr_col_ind, hcsr_col_ind.data(), sizeof(int) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dcsr_val, hcsr_val.data(), sizeof(float) * nnz, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dx, hx.data(), sizeof(float) * m, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dy, hy.data(), sizeof(float) * m, hipMemcpyHostToDevice));

    rocsparse_handle      handle;
    rocsparse_spmat_descr matA;
    rocsparse_dnvec_descr vecX;
    rocsparse_dnvec_descr vecY;

    rocsparse_indextype  row_ptr_type = rocsparse_indextype_i32;
    rocsparse_indextype  col_idx_type = rocsparse_indextype_i32;
    rocsparse_datatype   data_type    = rocsparse_datatype_f32_r;
    rocsparse_datatype   compute_type = rocsparse_datatype_f32_r;
    rocsparse_index_base idx_base     = rocsparse_index_base_zero;

    ROCSPARSE_CHECK(rocsparse_create_handle(&handle));

    // Create sparse matrix A
    ROCSPARSE_CHECK(rocsparse_create_csr_descr(&matA,
                                               m,
                                               m,
                                               nnz,
                                               dcsr_row_ptr,
                                               dcsr_col_ind,
                                               dcsr_val,
                                               row_ptr_type,
                                               col_idx_type,
                                               idx_base,
                                               data_type));
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecX, m, dx, data_type));
    ROCSPARSE_CHECK(rocsparse_create_dnvec_descr(&vecY, m, dy, data_type));

    rocsparse_int host_nmaxiter[1] = {200};
    float         host_tol[1]      = {1.0e-6f};
    float         host_history[200];

    size_t buffer_size = 0;
    ROCSPARSE_CHECK(rocsparse_spitsv(handle,
                                     &host_nmaxiter[0],
                                     &host_tol[0],
                                     &host_history[0],
                                     rocsparse_operation_none,
                                     &halpha,
                                     matA,
                                     vecX,
                                     vecY,
                                     compute_type,
                                     rocsparse_spitsv_alg_default,
                                     rocsparse_spitsv_stage_buffer_size,
                                     &buffer_size,
                                     nullptr));

    void* temp_buffer;
    HIP_CHECK(hipMalloc(&temp_buffer, buffer_size));

    ROCSPARSE_CHECK(rocsparse_spitsv(handle,
                                     &host_nmaxiter[0],
                                     &host_tol[0],
                                     &host_history[0],
                                     rocsparse_operation_none,
                                     &halpha,
                                     matA,
                                     vecX,
                                     vecY,
                                     compute_type,
                                     rocsparse_spitsv_alg_default,
                                     rocsparse_spitsv_stage_preprocess,
                                     nullptr,
                                     temp_buffer));

    ROCSPARSE_CHECK(rocsparse_spitsv(handle,
                                     &host_nmaxiter[0],
                                     &host_tol[0],
                                     &host_history[0],
                                     rocsparse_operation_none,
                                     &halpha,
                                     matA,
                                     vecX,
                                     vecY,
                                     compute_type,
                                     rocsparse_spitsv_alg_default,
                                     rocsparse_spitsv_stage_compute,
                                     &buffer_size,
                                     temp_buffer));

    HIP_CHECK(hipMemcpy(hy.data(), dy, sizeof(float) * m, hipMemcpyDeviceToHost));

    // Clear rocSPARSE
    ROCSPARSE_CHECK(rocsparse_destroy_spmat_descr(matA));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecX));
    ROCSPARSE_CHECK(rocsparse_destroy_dnvec_descr(vecY));

    ROCSPARSE_CHECK(rocsparse_destroy_handle(handle));

    // Clear device memory
    HIP_CHECK(hipFree(dcsr_row_ptr));
    HIP_CHECK(hipFree(dcsr_col_ind));
    HIP_CHECK(hipFree(dcsr_val));
    HIP_CHECK(hipFree(dx));
    HIP_CHECK(hipFree(dy));

    return 0;
}

Note

This routine does not support execution in a hipGraph context.

Parameters:

handle – [in] handle to the rocsparse library context queue.
host_nmaxiter – [inout] maximum number of iterations on input and number of iterations performed on output. If the output number of iterations is strictly less than the input maximum number of iterations, then the algorithm converged.
host_tol – [in] convergence tolerance \(\epsilon\). If the pointer is null, the loop executes host_nmaxiter[0] iterations. The precision is float for f32 based calculation (including the complex case) and double for f64 based calculation (including the complex case).
host_history – [out] optional array that records the norm of the residual before each iteration. The precision is float for f32 based calculation (including the complex case) and double for f64 based calculation (including the complex case).
trans – [in] matrix operation type.
alpha – [in] scalar \(\alpha\).
mat – [in] matrix descriptor.
x – [in] vector descriptor.
y – [inout] vector descriptor.
compute_type – [in] floating point precision for the SpITSV computation.
alg – [in] SpITSV algorithm for the SpITSV computation.
stage – [in] SpITSV stage for the SpITSV computation.
buffer_size – [out] number of bytes of the temporary storage buffer.
temp_buffer – [in] temporary storage buffer allocated by the user. When a nullptr is passed, the required allocation size (in bytes) is written to buffer_size and the function returns without performing the SpITSV operation.

Return values:

rocsparse_status_success – the operation completed successfully.
rocsparse_status_invalid_handle – the library context was not initialized.
rocsparse_status_invalid_pointer – alpha, mat, x, y or buffer_size pointer is invalid.
rocsparse_status_not_implemented – trans, compute_type, stage or alg is currently not supported.
