Functionality extension guidelines

Functionality extension guidelines#

This document provides information about the different ways to implement user-specific routines, solvers, or preconditioners to the rocALUTION library package. Additional features can be added in multiple ways. Additional solver and preconditioner functionality that uses the existing backend functionality performs well on accelerator devices without the need for expert GPU programming knowledge. Also, those not interested in using accelerators are not required to perform HIP and GPU-related programming tasks to add additional functionality.

In the following sections, different levels of functionality enhancements are illustrated. These examples can be used as guidelines to extend rocALUTION step by step with your own routines. Please note, that user-added routines can also be added to the main GitHub repository using pull requests.

`LocalMatrix` functionality extension#

This section demonstrates how to extend the LocalMatrix class with an additional routine. The routine supports both Host and Accelerator backend. Furthermore, the routine requires the matrix to be in CSR format. Here are the steps to extend the LocalMatrix functionality:

1. API enhancement#

To make the new routine available through the API, modify the LocalMatrix class. The corresponding header file local_matrix.hpp is located in src/base/. The new routines can be added as public member functions as shown below:

...
void ConvertTo(unsigned int matrix_format, int blockdim);

void MyNewFunctionality(void);

virtual void Apply(const LocalVector<ValueType>& in, LocalVector<ValueType>* out) const;
virtual void ApplyAdd(const LocalVector<ValueType>& in,
...

For the implementation of the new API function, it is important to know the location of the availability of this functionality. To add support for any backend and matrix format, format conversions are required if MyNewFunctionality() is only supported for CSR metrices. This is subject to the API function implementation:

template <typename ValueType>
void LocalMatrix<ValueType>::MyNewFunctionality(void)
{
    // Debug logging
    log_debug(this, "LocalMatrix::MyNewFunctionality()");

#ifdef DEBUG_MODE
    // If we are in debug mode, perform an additional matrix sanity check
    this->Check();
#endif

    // If no non-zero entries, do nothing
    if(this->GetNnz() > 0)
    {
        // As we want to implement our function only for CSR, we first need to convert
        // the matrix to CSR format
        unsigned int format = this->GetFormat();
        int blockdim = this->GetBlockDimension();
        this->ConvertToCSR();

        // Call the corresponding base matrix implementation
        bool err = this->matrix_->MyNewFunctionality();

        // Check its return type
        if((err == false) && (this->is_host_() == true))
        {
            // If our matrix is on the host, the function call failed.
            LOG_INFO("Computation of LocalMatrix::MyNewFunctionality() failed");
            this->Info();
            FATAL_ERROR(__FILE__, __LINE__);
        }

        // Run backup algorithm on host, in case the accelerator version failed
        if(err == false)
        {
            // Move matrix to host
            bool is_accel = this->is_accel_();
            this->MoveToHost();

            // Try again
            if(this->matrix_->MyNewFunctionality() == false)
            {
                LOG_INFO("Computation of LocalMatrix::MyNewFunctionality() failed");
                this->Info();
                FATAL_ERROR(__FILE__, __LINE__);
            }

            // On a successful host call, move the data back to the accelerator
            // if initial data was on the accelerator
            if(is_accel == true)
            {
                // Print a warning, that the algorithm was performed on the host
                // even though the initial data was on the device
                LOG_VERBOSE_INFO(2, "*** warning: LocalMatrix::MyNewFunctionality() was performed on the host");

                this->MoveToAccelerator();
            }
        }

        // Convert the matrix back to CSR format
        if(format != CSR)
        {
            // Print a warning, that the algorithm was performed in CSR format
            // even though the initial matrix format was different
            LOG_VERBOSE_INFO(2, "*** warning: LocalMatrix::MyNewFunctionality() was performed in CSR format");

            this->ConvertTo(format, blockdim);
        }
    }

#ifdef DEBUG_MODE
    // Perform additional sanity check in debug mode, because this is a non-const function
    this->Check();
#endif
}

Similarly, you can implement host-only functions. In this case, initial data explicitly needs to be moved to the host backend using the API implementation.

The next step is to implement the actual functionality in the BaseMatrix class.

2. Enhancement of the `BaseMatrix` class#

To make the new routine available in the base class, first modify the BaseMatrix class. The corresponding header file base_matrix.hpp is located in src/base/. The new routines can be added as public member functions, e.g.

...
virtual bool ILU0Factorize(void);

/// Perform MyNewFunctionality algorithm
virtual bool MyNewFunctionality(void);

/// Perform LU factorization
...

We don’t implement the purely virtual MyNewFunctionality() as we don’t supply an implementation for all base classes. We decided to implement it only for CSR format and hence need to return an error flag, so that the LocalMatrix class is aware of the failure and can convert it to CSR.

template <typename ValueType>
bool MyNewFunctionality(void)
{
    return false;
}

3. Platform-specific host implementation#

To satisfy the rocALUTION host backup philosophy, there must be a host implementation available. Hence, for the new function to succeed, there must be backend implementation available. Place the host implementation in src/base/host/host_matrix_csr.cpp as we decided to make it available for CSR format.

...
virtual bool ILUTFactorize(double t, int maxrow);

virtual bool MyNewFunctionality(void);

virtual void LUAnalyse(void);
...

template <typename ValueType>
bool HostMatrixCSR<ValueType>::MyNewFunctionality(void)
{
    // Place some asserts to verify sanity of input data

    // Our algorithm works only for squared metrices
    assert(this->nrow_ == this->ncol_);
    assert(this->nnz_ > 0);

    // place the actual host based algorithm here:
    // for illustration, we scale the matrix by its inverse diagonal
    for(int i = 0; i < this->nrow_; ++i)
    {
        int row_begin = this->mat_.row_offset[i];
        int row_end   = this->mat_.row_offset[i + 1];

        bool diag_found = false;
        ValueType inv_diag;

        // Find the diagonal entry
        for(int j = row_begin; j < row_end; ++j)
        {
            if(this->mat_.col[j] == i)
            {
                diag_found = true;
                inv_diag = static_cast<ValueType>(1) / this->mat_.val[j];
            }
        }

        // Our algorithm works only with full rank
        assert(diag_found == true);

        // Scale the row
        for(int j = row_begin; j < row_end; ++j)
        {
            this->mat_.val[j] *= inv_diag;
        }
    }

    return true;
}

4. Platform-specific HIP implementation#

You can now add an additional implementation for the HIP backend using HIP programming framework. This is required to make your algorithm available on accelerators so that rocALUTION doesn’t need to switch to the host backend on function calls anymore. Add the HIP implementation src/base/hip/hip_matrix_csr.cpp in this case.

...
virtual bool ILU0Factorize(void);

virtual bool MyNewFunctionality(void);

virtual bool ICFactorize(BaseVector<ValueType>* inv_diag = NULL);
...

template <typename ValueType>
bool HIPAcceleratorMatrixCSR<ValueType>::MyNewFunctionality(void)
{
    // Place some asserts to verify sanity of input data

    // Our algorithm works only for squared metrices
    assert(this->nrow_ == this->ncol_);
    assert(this->nnz_ > 0);

    // Enqueue the HIP kernel
    hipLaunchKernelGGL((kernel_csr_mynewfunctionality),
                       dim3((this->nrow_ - 1) / this->local_backend_.HIP_block_size + 1),
                       dim3(this->local_backend_.HIP_block_size),
                       0,
                       0,
                       this->mat_.row_offset,
                       this->mat_.col,
                       this->mat_.val);

    // Check for HIP execution error before successfully returning
    CHECK_HIP_ERROR(__FILE__, __LINE__);

    return true;
}

Place the corresponding HIP kernel in src/base/hip/hip_kernels_csr.hpp.

Adding a solver#

This section demonstrates how to add a new solver to rocALUTION. Here are the steps:

Define the API for the new solver

As an example, we add a new IterativeLinearSolver. To achieve this, we use CG as a template. Thus, we first copy src/solvers/krylov/cg.hpp to src/solvers/krylov/mysolver.hpp and src/solvers/krylov.cg.cpp to src/solvers/krylov/mysolver.cpp (assuming we add a krylov subspace solvers).

Modify the cg.hpp and cg.cpp as per your requirement (e.g. change the solver name from CG to MySolver)

Implement each of the following virtual functions present in the class. Follow the implementation details given below:

MySolver(): The constructor of the new solver class.
~MySolver(): The destructor of the new solver class. It calls the Clear() function.
void Print(void) const: Prints some informations about the solver.
void Build(void): Creates all required structures of the solver, e.g. allocates memory and sets the backend of temporary objects.
void BuildMoveToAcceleratorAsync(void): Moves all solver-related objects asynchronously to the accelerator device.
void Sync(void): Synchronizes all solver related objects.
void ReBuildNumeric(void): Rebuilds the solver only numerically.
void Clear(void): Cleans up all solver-relevant structures that have been created using Build().
void SolveNonPrecond_(const VectorType& rhs, VectorType* x): Performs the solving phase Ax=y without the use of a preconditioner.
void SolvePrecond_(const VectorType& rhs, VectorType* x): Performs the solving phase Ax=y with the use of a preconditioner.
void PrintStart_(void) const: Protected function. Called when the solver starts.
void PrintEnd_(void) const: Protected function. Called when the solver ends.
void MoveToHostLocalData_(void): Protected function. Moves all local solver objects to the host.
void MoveToAcceleratorLocalData_(void): Protected function. Moves all local solver objects to the accelerator.

You can also introduce any additional solver-specific member functions.

Make the new solver visible

To make the new solver visible, add it to the src/rocalution.hpp header:

...
#include "solvers/krylov/cg.hpp"
#include "solvers/krylov/mysolver.hpp"
#include "solvers/krylov/cr.hpp"
...

Add the new solver to the CMake compilation list

The CMake compilation list is found in src/solvers/CMakeLists.txt:

...
set(SOLVERS_SOURCES
  solvers/krylov/cg.cpp
  solvers/krylov/mysolver.cpp
  solvers/krylov/fcg.cpp
...