Library source code organization#
The rocALUTION library is split into three major directories:
src/base/
: Contains all source code that is built on top of theBaseRocalution
object as well as the backend structure.src/solvers/
: Contains all solvers, preconditioners, and its control classes.src/utils/
: Contains memory (de)allocation, logging, communication, timing, and math helper functions.
src/base/
directory#
The source files in the src/base/
directory are listed below.
Backend Manager#
The support of accelerator devices is embedded in the structure of rocALUTION. The primary goal is to use this technology whenever possible to decrease the computational time.
Each technology has its own backend implementation, dealing with platform-specific functionalities such as initialization, synchronization, reservation, etc. The backends are currently available for CPU (naive, OpenMP, MPI) and GPU (HIP).
Note
Not all functions are ported and present on the accelerator backend. This limited functionality is natural, since all operations can’t be performed efficiently on the accelerators (e.g. sequential algorithms, I/O from the file system, etc.).
The Operator and Vector classes#
The Operator
and Vector
classes and their derived local and global classes are the classes available through the rocALUTION API.
While granting access to all relevant functionalities, all hardware-relevant implementation details are hidden.
Those linear operators and vectors are the main objects in rocALUTION.
They can be moved to an accelerator at run-time.
The linear operators are defined as local or global metrices (i.e. on a single node or distributed/multi-node) and local stencils (i.e. matrix-free linear operations). The only template parameter of the operators and vectors is the data type (ValueType). The figure below provides an overview of supported operators and vectors.
Each object contains a local copy of the hardware descriptor created by the init_rocalution
function.
Additionally, each local object that is derived from an operator or vector, contains a pointer to a Base-class, a Host-class and an Accelerator-class of same type (e.g. a LocalMatrix
contains pointers to BaseMatrix
, HostMatrix
and AcceleratorMatrix
).
The Base
class pointer always points either towards the Host
class or the Accelerator
class pointer depending on the runtime decision of the local object.
Base
classes and their derivatives are further explained in The BaseMatrix and BaseVector classes.
Furthermore, each global object derived from an operator or vector embeds two Local
classes of the same type to store the interior and ghost part of the global object (e.g. a GlobalVector
contains two LocalVector
).
For more details on distributed data structures, see the API reference section.
The BaseMatrix and BaseVector classes#
The data
is an object pointing to the BaseMatrix
class from either a HostMatrix
or an AcceleratorMatrix
.
The AcceleratorMatrix
is created by an object with an implementation in the backend and a matrix format.
Switching between host and accelerator metrices is performed in the LocalMatrix
class.
The LocalVector
is organized in the same way.
Each matrix format has its own class for the host and the accelerator backend.
All matrix classes are derived from the BaseMatrix
, which provides the base interface for computation as well as for accessing the data.
Each local object contains a pointer to a Base
class object.
While the Base
classes are purely virtual, their derivatives implement all platform-specific functionalities.
Each of them is coupled to a rocALUTION backend descriptor.
While the HostMatrix
, HostStencil
and HostVector
classes implement all host functionalities, AcceleratorMatrix
, AcceleratorStencil
and AcceleratorVector
contain accelerator-related device code.
Each backend specialization is located in a different directory, e.g. src/base/host
for host-related classes and src/base/hip
for accelerator/HIP-related classes.
ParallelManager#
The parallel manager class handles the communication and the mapping of the global operators.
Each global operator and vector needs to be initialized with a valid parallel manager to perform any operation.
For many distributed simulations, the underlying operator is already distributed.
This information must be passed to the parallel manager.
All communication-related functionalities for the implementation of global algorithms is available in the rocALUTION communicator in src/utils/communicator.hpp
.
For more details on distributed data structures, see the API Reference section.
src/solvers/
directory#
The Solver
and its derived classes can be found in src/solvers
.
The directory structure is further split into the sub-classes DirectLinearSolver
in src/solvers/direct
, IterativeLinearSolver
in src/solvers/krylov
, BaseMultiGrid
in src/solvers/multigrid
and Preconditioner
in src/solvers/preconditioners
.
Each solver uses an Operator
, Vector
and data type as template parameters to solve a linear system of equations.
The actual solver algorithm is implemented by the Operator
and Vector
functionality.
Most of the solvers can be performed on linear operators, e.g. LocalMatrix
, LocalStencil
and GlobalMatrix
- i.e. the solvers can be performed locally (on a shared memory system) or in a distributed manner (on a cluster) via MPI.
All solvers and preconditioners need three template parameters - Operators, Vectors and Scalar type.
The Solver class is purely virtual and provides an interface for:
SetOperator
to set the operator, which allows you to pass the matrix here.Build
to build the solver (including preconditioners, sub-solvers, etc.). You must specify the operator before building the solver.Solve
to solve the sparse linear system. You need to pass a right-hand side and a solution / initial guess vector.Print
to show solver information.ReBuildNumeric
to only rebuild the solver numerically (if possible).MoveToHost
andMoveToAccelerator
to offload the solver (including preconditioners and sub-solvers) to the host / accelerator.
src/utils/
directory#
In the src/utils
directory, all commonly used host (de)allocation, timing, math, communication, and logging functionalities are gathered.
Furthermore, the rocALUTION GlobalType
, which is the indexing type for global and distributed structures, can be adjusted in src/utils/types.hpp
.
By default, rocALUTION uses 64-bit wide global indexing.
Note
It is not recommended to switch to 32-bit global indexing.
In src/utils/def.hpp
:
Verbosity level
VERBOSE_LEVEL
can be adjusted, see Verbose output,Debug mode
DEBUG_MODE
can be enabled, see Debug output,MPI logging
LOG_MPI_RANK
can be modified, see Verbose output and MPI,Object tracking
OBJ_TRACKING_OFF
can be enabled, see Automatic object tracking.