Design Documentation#

Design and Philosophy#

The rocSPARSE library is developed using the Hourglass API approach. This is especially helpful to offer a thin C89 API to the user and still get all the convenience of C++. As a side effect, ABI related binary compatibility issues can be avoided. Furthermore, this approach allows rocSPARSE routines to be used by other programming languages.

In public API header files, rocSPARSE only relies on functions, pointers, forward declared structs, enumerations and type defs. Additionally, rocSPARSE introduces multiple library and object handles by using opaque types to hide layout and implementation details from the user.

Temporary Device Memory#

Many routines exposed by the rocSPARSE API require a temporary storage buffer on the device. rocSPARSE notion is that the user is responsible for such buffer allocation and deallocation. Hence, allocated buffers can be re-used and do not need to be regularly (de)allocated on every single API call. For this purpose, routines that require a temporary storage buffer offer a special API function to query for the storage buffer size, e.g. rocsparse_scsrsv_buffer_size().

Contributing#

Contribution License Agreement#

  1. The code I am contributing is mine, and I have the right to license it.

  2. By submitting a pull request for this project I am granting you a license to distribute said code under the MIT License for the project.

How to contribute#

Our code contriubtion guidelines closely follows the model of GitHub pull-requests. This repository follows the git flow workflow, which dictates a /master branch where releases are cut, and a /develop branch which serves as an integration branch for new code.

A git extention has been developed to ease the use of the ‘git flow’ methodology, but requires manual installation by the user. Please refer to the projects wiki.

Pull-request guidelines#

  • Target the develop branch for integration.

  • Ensure code builds successfully.

  • Do not break existing test cases

  • New functionality will only be merged with new unit tests.

    • New unit tests should integrate within the existing googletest framework.

    • Tests must have good code coverage.

    • Code must also have benchmark tests, and performance must approach the compute bound limit or memory bound limit.

StyleGuide#

Interface

  • All public APIs are C89 compatible; all other library code should use C++14.

  • Our minimum supported compiler is clang 3.6.

  • Avoid CamelCase.

  • This rule applies specifically to publicly visible APIs, but is also encouraged (not mandated) for internal code.

Format

C and C++ code is formatted using clang-format. To format a file, use

/opt/rocm/llvm/bin/clang-format -style=file -i <file>

To format all files, run the following script in rocSPARSE directory:

#!/bin/bash

find . -iname '*.h' \
-o -iname '*.hpp' \
-o -iname '*.cpp' \
-o -iname '*.h.in' \
-o -iname '*.hpp.in' \
-o -iname '*.cpp.in' \
-o -iname '*.cl' \
| grep -v 'build' \
| xargs -n 1 -P 8 -I{} /opt/rocm/llvm/bin/clang-format -style=file -i {}

Library Source Organization#

The library/include directory#

This directory contains all files that are exposed to the user. The rocSPARSE API, is declared here.

File

Description

rocsparse.h

Includes all other API related rocSPARSE header files.

rocsparse-auxiliary.h

Declares all rocSPARSE auxiliary functions, such as handle and descriptor management.

rocsparse-complex-types.h

Defines the rocSPARSE complex data types rocsparse_float_complex and rocsparse_double_complex.

rocsparse-functions.h

Declares all rocSPARSE Sparse Linear Algebra Subroutines of Level1, 2, 3, Extra, Preconditioner and Format Conversion.

rocsparse-types.h

Defines all data types used by rocSPARSE.

rocsparse-version.h.in

Provides the configured version and settings that is initially set by CMake during compilation.

The library/src/ directory#

This directory contains all rocSPARSE library source files. The root of the library/src/ directory hosts the implementation of the library handle and auxiliary functions. Furthermore, each sub-directory is responsible for the specific class of sparse linear algebra subroutines. Finally, the library/src/include directory defines Commonly Shared Device-Code, Status-Flag Macros, The rocsparse_mat_info Structure and Logging.

File

Description

handle.cpp

Implementation of opaque handle structures.

rocsparse_auxiliary.cpp

Implementation of auxiliary functions, e.g. create and destroy handles.

status.cpp

Implementation of hipError_t to rocsparse_status conversion function.

include/common.h

Commonly used functions among several rocSPARSE routines, see Commonly Shared Device-Code.

include/definitions.h

Status-flag macros are defined here, see Status-Flag Macros.

include/handle.h

Declaration of opaque handle structures.

include/logging.h

Implementation of different rocSPARSE logging helper functions.

include/status.h

Declaration of hipError_t to rocsparse_status conversion function.

include/utility.h

Implementation of different rocSPARSE logging functionality.

The clients/ directory#

This directory contains all clients, e.g. samples, unit tests and benchmarks. Further details are given in Clients.

Sparse Linear Algebra Subroutines#

Each sparse linear algebra subroutine is implemented in a set of source files in the corresponding directory: rocsparse_subroutine.cpp, rocsparse_subroutine.hpp and subroutine_device.h.

rocsparse_subroutine.cpp implements the C wrapper and the API functionality for each precision supported. Furthermore, rocsparse_subroutine.hpp implements the API functionality, using the precision as template parameter. Finally, subroutine_device.h implements the device code, required for the computation of the subroutine.

Note

Each API exposed subroutine is expected to return a rocsparse_status.

Note

Additionally, each device function is expected to use the user given stream which is accessible through the libraries handle.

Below is a sample for rocsparse_subroutine.cpp, rocsparse_subroutine.hpp and subroutine_device.h.

Listing 1 rocsparse_subroutine.cpp#
#include "rocsparse.h"
#include "rocsparse_subroutine.hpp"

/*
 * ===========================
 *    C wrapper
 * ===========================
 */

extern "C" rocsparse_status rocsparse_ssubroutine(rocsparse_handle handle,
                                                  rocsparse_int    m,
                                                  const float*     alpha,
                                                  float*           val)
{
    return rocsparse_subroutine_template(handle, m, alpha, val);
}

extern "C" rocsparse_status rocsparse_dsubroutine(rocsparse_handle handle,
                                                  rocsparse_int    m,
                                                  const double*    alpha,
                                                  double*          val)
{
    return rocsparse_subroutine_template(handle, m, alpha, val);
}

extern "C" rocsparse_status rocsparse_csubroutine(rocsparse_handle               handle,
                                                  rocsparse_int                  m,
                                                  const rocsparse_float_complex* alpha,
                                                  rocsparse_float_complex*       val)
{
    return rocsparse_subroutine_template(handle, m, alpha, val);
}

extern "C" rocsparse_status rocsparse_zsubroutine(rocsparse_handle                handle,
                                                  rocsparse_int                   m,
                                                  const rocsparse_double_complex* alpha,
                                                  rocsparse_double_complex*       val)
{
    return rocsparse_subroutine_template(handle, m, alpha, val);
}
Listing 2 rocsparse_subroutine.hpp#
#pragma once
#ifndef ROCSPARSE_SUBROUTINE_HPP
#define ROCSPARSE_SUBROUTINE_HPP

#include "definitions.h"
#include "handle.h"
#include "rocsparse.h"
#include "subroutine_device.h"
#include "utility.h"

#include <hip/hip_runtime.h>

template <typename T>
__global__ void subroutine_kernel_host_pointer(rocsparse_int m, T alpha, T* val)
{
    subroutine_device(m, alpha, val);
}

template <typename T>
__global__ void subroutine_kernel_device_pointer(rocsparse_int m, const T* alpha, T* val)
{
    subroutine_device(m, *alpha, val);
}

template <typename T>
rocsparse_status rocsparse_subroutine_template(rocsparse_handle handle,
                                               rocsparse_int    m,
                                               const T*         alpha,
                                               T*               val)
{
    // Check for valid handle
    if(handle == nullptr)
    {
        return rocsparse_status_invalid_handle;
    }

    // Logging
    if(handle->pointer_mode == rocsparse_pointer_mode_host)
    {
        log_trace(handle,
                  replaceX<T>("rocsparse_Xsubroutine"),
                  m,
                  *alpha,
                  (const void*&)val);

        log_bench(handle,
                  "./rocsparse-bench -f subroutine -r",
                  replaceX<T>("X"),
                  "-m",
                  m,
                  "--alpha",
                  *alpha);
    }
    else
    {
        log_trace(handle,
                  replaceX<T>("rocsparse_Xsubroutine"),
                  m,
                  (const void*&)alpha,
                  (const void*&)val);
    }

    // Check size
    if(m < 0)
    {
        return rocsparse_status_invalid_size;
    }

    // Quick return if possible
    if(m == 0)
    {
        return rocsparse_status_success;
    }

    // Check pointer arguments
    if(alpha == nullptr || val == nullptr)
    {
        return rocsparse_status_invalid_pointer;
    }

    // Differentiate between the pointer modes
    if(handle->pointer_mode == rocsparse_pointer_mode_device)
    {
        // Launch kernel
        hipLaunchKernelGGL((subroutine_kernel_device_pointer<T>),
                           dim3(...),
                           dim3(...),
                           0,
                           handle->stream,
                           m,
                           alpha,
                           val);
    }
    else
    {
        // Launch kernel
        hipLaunchKernelGGL((subroutine_kernel_host_pointer<T>),
                           dim3(...),
                           dim3(...),
                           0,
                           handle->stream,
                           m,
                           *alpha,
                           val);
    }

    return rocsparse_status_success;
}

#endif // ROCSPARSE_SUBROUTINE_HPP
Listing 3 subroutine_device.h#
#pragma once
#ifndef SUBROUTINE_DEVICE_H
#define SUBROUTINE_DEVICE_H

#include <hip/hip_runtime.h>

template <typename T>
__device__ void subroutine_device(rocsparse_int m, T alpha, T* val)
{
    ...
}

#endif // SUBROUTINE_DEVICE_H

Important Functions and Data Structures#

This section describes important rocSPARSE functions and data structures.

Commonly Shared Device-Code#

The following table lists multiple device functions that are shared among several rocSPARSE functions.

Device function

Description

rocsparse_clz()

Computes the leftmost significant bit position for int and int64 types.

rocsparse_one()

Returns a pointer to 1 for the specified precision.

rocsparse_ldg()

Wrapper to __ldg() for int, int64, single, double real and complex types.

rocsparse_nontemporal_load()

Non-temporal memory load access for int, int64, single, double real and complex types.

rocsparse_nontemporal_store()

Non-temporal memory store access for int, int64, single, double real and complex types.

rocsparse_mul24()

Multiply 24-bit integer values.

rocsparse_mad24()

Multiply 24-bit integers and add a 32-bit value.

rocsparse_blockreduce_sum()

Block-wide reduction sum for int, int64, single, double real and complex types.

rocsparse_blockreduce_max()

Block-wide reduction max for int, int64, single, double real and complex types.

rocsparse_blockreduce_min()

Block-wide reduction min for int, int64, single, double real and complex types.

rocsparse_wfreduce_max()

DPP based wavefront reduction max for int type.

rocsparse_wfreduce_min()

DPP based wavefront reduction min for int and int64 types.

rocsparse_wfreduce_sum()

DPP based wavefront reduction sum for int, int64, single, double real and complex types.

Status-Flag Macros#

The following table lists the status-flag macros available in rocSPARSE and their purpose.

Macro

Description

RETURN_IF_HIP_ERROR(stat)

Returns, if stat is not equal to hipSuccess

THROW_IF_HIP_ERROR(stat)

Throws an exception, if stat is not equal to hipSuccess

PRINT_IF_HIP_ERROR(stat)

Prints an error message, if stat is not equal to hipSuccess

RETURN_IF_ROCSPARSE_ERROR(stat)

Returns, if stat is not equal to rocsparse_status_success

The rocsparse_mat_info Structure#

The rocSPARSE rocsparse_mat_info is a structure holding all matrix meta information that is gathered during analysis routines.

The following table lists all currently available internal meta data structures:

Meta data structure

Description

rocsparse_csrmv_info

Structure to hold analysis meta data for sparse matrix vector multiplication in CSR format.

rocsparse_csrtr_info

Structure to hold analysis meta data for operations on sparse triangular matrices, e.g. dependency graph.

rocsparse_csrgemm_info

Structure to hold analysis meta data for sparse matrix sparse matrix multiplication in CSR format.

Cross-Routine Data Sharing#

Already collected meta data, such as the dependency graph of a sparse matrix, can be shared among multiple routines. For example, if the incomplete LU factorization of a sparse matrix is computed, the gathered analysis data can be shared for subsequent lower triangular solves of the same matrix. This behavior can be specified by the rocsparse_analysis_policy parameter.

The following table lists subroutines that can in some cases share meta data:

Note

It is important to note, that on rocSPARSE extensions, this functionality can be further expanded to improve meta data collection performance significantly.

Clients#

rocSPARSE clients host a variety of different examples as well as a unit test and benchmarking package. For detailed instructions on how to build rocSPARSE with clients, see Building and Installing.

Examples#

The examples collection offers sample implementations of the rocSPARSE API. In the following table, available examples with description, are listed.

Example

Description

example_coomv

Perform sparse matrix vector multiplication in COO format

example_csrmv

Perform sparse matrix vector multiplication in CSR format

example_ellmv

Perform sparse matrix vector multiplication in ELL format

example_handle

Show rocSPARSE handle initialization and finalization

example_hybmv

Perform sparse matrix vector multiplication in HYB format

Unit Tests#

Multiple unit tests are available to test for bad arguments, invalid parameters and sparse routine functionality. The unit tests are based on googletest. The tests cover all routines that are exposed by the API, including all available floating-point precision.

Benchmarks#

rocSPARSE offers a benchmarking tool that can be compiled with the clients package. The benchmark tool can perform any API exposed routine combined with time measurement. To set up a benchmark run, multiple options are available.

Command-line option

Description

help, h

Prints the help message

sizem, m

Specify the m parameter, e.g. the number of rows of a sparse matrix

sizen, n

Specify the n parameter, e.g. the number of columns of a sparse matrix or the length of a dense vector

sizek, k

Specify the k parameter, e.g. the number of rows of a dense matrix

sizennz, z

Specify the nnz parameter, e.g. the number of non-zero entries of a sparse vector

blockdim

Specify the blockdim parameter, e.g. the block dimension in BSR matrices

row-blockdimA

Specify the row-blockdimA parameter, e.g. the row block dimension in GEBSR matrices

col-blockdimA

Specify the col-blockdimA parameter, e.g. the column block dimension in GEBSR matrices

row-blockdimB

Specify the row-blockdimB parameter, e.g. the row block dimension in GEBSR matrices

col-blockdimB

Specify the col-blockdimB parameter, e.g. the column block dimension in GEBSR matrices

mtx

Read from MatrixMarket (.mtx) format. This will override parameters m, n and z

rocalution

Read from rocALUTION format. This will override parameters m, n, z, mtx and laplacian-dim

laplacian-dim

Assemble a 2D/3D Laplacian matrix with dimensions dimx, dimy and dimz. dimz is optional. This will override parameters m, n, z and mtx

alpha

Specify the scalar \(\alpha\)

beta

Specify the scalar \(\beta\)

transposeA

Specify whether matrix A is (conjugate) transposed or not, see rocsparse_operation

transposeB

Specify whether matrix B is (conjugate) transposed or not, see rocsparse_operation

indexbaseA

Specify the index base of matrix A, see rocsparse_index_base

indexbaseB

Specify the index base of matrix B, see rocsparse_index_base

indexbaseC

Specify the index base of matrix C, see rocsparse_index_base

indexbaseD

Specify the index base of matrix D, see rocsparse_index_base

action

Specify whether the operation is performed symbolically or numerically, see rocsparse_action

hybpart

Specify the HYB partitioning type, see rocsparse_hyb_partition

diag

Specify the diagonal type of a sparse matrix, see rocsparse_diag_type

uplo

Specify the fill mode of a sparse matrix, see rocsparse_fill_mode

storage

Specify the storage mode of a sparse matrix, see rocsparse_storage_mode

apolicy

Specify the analysis policy, see rocsparse_analysis_policy

function, f

Specify the API exposed subroutine to benchmark

indextype

Index precision: integer 32 bit, integer 64 bit

precision, r

Floating-point precision: single real, double real, single complex, double complex

verify, v

Specify whether the results should be validated with the host reference implementation

iters, i

Iterations to run inside the timing loop

device, d

Set the device to be used for subsequent benchmark runs

direction

Specify whether BSR blocks should be laid out in row-major storage or by column-major storage

order

Specify whether a dense matrix is laid out in column-major or row-major storage

format

Specify whether a sparse matrix is laid out in coo, coo_aos, csr, csc, or ell format

denseld

Specify the leading dimension of a dense matrix

batch_count

Specify the batch count for batched routines

batch_count_A

Specify the batch count for batched routines

batch_count_B

Specify the batch count for batched routines

batch_count_C

Specify the batch count for batched routines

batch_stride

Specify the batch stride for batched routines

memstat-report

Specify the output filename for memory report

spmv_alg

Specify the algorithm to use when running SpMV

spmm_alg

Specify the algorithm to use when running SpMM

gtsv_interleaved_alg

Specify the algorithm to use when running gtsv interleaved batch routine