Using hipSOLVER#
hipSOLVER is an open-source marshalling library for LAPACK routines on the GPU. It sits between a backend library and the user application, marshalling inputs to and outputs from the backend library so that the user application doesn’t have to change when using different backends. hipSOLVER supports two backend libraries: the NVIDIA CUDA cuSOLVER library and the open-source AMD ROCm rocSOLVER library.
The regular hipSOLVER API is a thin wrapper layer around the different backends that does not typically introduce significant overhead. However, its main purpose is portability, so when performance is critical, it’s recommended to use the library backend directly.
After it is installed, hipSOLVER can be used like any other library with a C API. The header file has to be included in the user code, which means the shared library becomes a link-time and run-time dependency for the user application. The user application code can be ported with no changes to any system with hipSOLVER installed, regardless of the backend library.
For more details on how to use the API methods, see the client code samples on the hipSOLVER GitHub or the documentation for the corresponding backend libraries.
Porting cuSOLVER applications to hipSOLVER#
The hipSOLVER design facilitates the translation of cuSOLVER applications to the AMD open-source ROCm platform and ecosystem, which makes it easy to port existing cuSOLVER applications to hipSOLVER. hipSOLVER provides two separate but interchangeable API patterns to support a two-stage transition. Start with the hipSOLVER compatibility APIs, which use the hipsolverDn, hipsolverSp, and hipsolverRf prefixes and have method signatures that are fully consistent with cuSOLVER functions.
However, the compatibility APIs might introduce some performance drawbacks, especially when using the rocSOLVER backend. So, as a second stage, it’s best to switch to using the hipSOLVER regular API when possible. The regular API uses the hipsolver prefix. It makes minor adjustments to the API to get the best performance out of the rocSOLVER backend. In most cases, switching to the regular API is as simple as removing Dn from the hipsolverDn prefix.
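The prefix change described above can be applied mechanically as a first pass. The following is a minimal sketch, not a hipSOLVER tool: strip_dn_prefix is a hypothetical helper introduced only for illustration. It renames identifiers and nothing else, so calls whose regular-API signature differs from the compatibility one still need manual edits.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for illustration; it is not part of hipSOLVER.
// Rewrites identifiers such as hipsolverDnDgetrf to hipsolverDgetrf.
std::string strip_dn_prefix(const std::string& source)
{
    // \b anchors the match at the start of an identifier, so names that only
    // contain "hipsolverDn" in the middle (my_hipsolverDnWrapper) are kept.
    // The lookahead requires the prefix to be followed by more identifier text.
    static const std::regex dn_prefix(R"(\bhipsolverDn(?=[A-Za-z_]))");
    return std::regex_replace(source, dn_prefix, "hipsolver");
}
```

Identifiers with the hipsolverSp and hipsolverRf prefixes do not match the pattern and are left unchanged.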
Note
Methods with the hipsolverSp and hipsolverRf prefixes are not currently supported by the regular API.
No matter which API is used, a hipSOLVER application can be executed without modifications to the code on systems with cuSOLVER or rocSOLVER installed. However, using the regular API ensures the best performance out of both backends.
Using the hipsolverDn API#
The hipsolverDn API is intended as a 1:1 translation of the cusolverDn API, but not all functionality is equally supported in rocSOLVER. The following considerations apply when using this compatibility API.
Arguments not referenced by rocSOLVER#
Unlike cuSOLVER, rocSOLVER functions do not provide information on invalid arguments in the info parameter, although they do provide information on singularities and algorithm convergence. Therefore, when using the rocSOLVER backend, info always returns a value >= 0. In cases where a rocSOLVER function does not accept info as an argument, hipSOLVER sets it to zero.
The niters argument for hipsolverDnXXgels and hipsolverDnXXgesv is not referenced by the rocSOLVER backend. rocSOLVER does not implement iterative refinement.
The hRnrmF argument for hipsolverDnXgesvdaStridedBatched is not referenced by the rocSOLVER backend.
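Portable code can therefore treat the value copied back from info as a three-way result. The sketch below assumes the LAPACK-style convention in which a negative value flags an invalid argument. The InfoKind enum and classify_info function are hypothetical names used only for illustration.

```cpp
// Hypothetical names for illustration; they are not part of hipSOLVER.
enum class InfoKind { Success, InvalidArgument, NumericalIssue };

// Classifies an info value after it has been copied from device to host.
InfoKind classify_info(int info)
{
    if (info < 0)
        return InfoKind::InvalidArgument; // never reported by the rocSOLVER backend
    if (info > 0)
        return InfoKind::NumericalIssue;  // singularity or lack of convergence
    return InfoKind::Success;
}
```

Because the rocSOLVER backend never reports the InvalidArgument case through info, argument errors on that backend have to be detected from the status code returned by the function call itself.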
Performance implications of the hipsolverDn API#
To calculate the workspace required by the gesvd function in rocSOLVER, the values of jobu and jobv are needed. However, the function hipsolverDnXgesvd_bufferSize does not accept these arguments. When using the rocSOLVER backend, hipsolverDnXgesvd_bufferSize has to internally calculate the workspace for all possible values of jobu and jobv and return the maximum.
Note
hipsolverDnXgesvd_bufferSize is slower than hipsolverXgesvd_bufferSize, and the workspace size it returns might be slightly larger than what is actually required.
To properly use a user-provided workspace, rocSOLVER requires both the allocated pointer and its size. However, the function hipsolverDnXgetrf does not accept lwork as an argument. Consequently, when using the rocSOLVER backend, hipsolverDnXgetrf has to internally call hipsolverDnXgetrf_bufferSize to obtain the workspace size.
Note
In practice, hipsolverDnXgetrf_bufferSize is called twice, once by the user before allocating the workspace and once by hipSOLVER internally when executing the hipsolverDnXgetrf function. hipsolverDnXgetrf can be slightly slower than hipsolverXgetrf because of the extra call to the bufferSize helper.
The functions hipsolverDnXgetrs, hipsolverDnXpotrs, hipsolverDnXpotrsBatched, and hipsolverDnXpotrfBatched do not accept work and lwork as arguments. However, this functionality does require a non-zero workspace in rocSOLVER. As a result, these functions switch to the automatic workspace management model when using the rocSOLVER backend. For more information, see the memory model information.
Note
Even though the compatibility API does not provide bufferSize helpers for these functions, the functions still require a workspace to use rocSOLVER. This workspace is automatically managed, but it might result in device memory reallocations with a corresponding overhead.
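The cost of the gesvd size query described above can be pictured with a small sketch. It is not the hipSOLVER implementation. SizeQuery is a hypothetical stand-in for a backend query that needs both job options, and the option letters are the standard LAPACK ones ('A', 'S', 'O', 'N').

```cpp
#include <algorithm>
#include <functional>

// Hypothetical stand-in for a backend size query that needs jobu and jobv.
using SizeQuery = std::function<int(char jobu, char jobv)>;

// Returns a workspace size that is sufficient for every job combination.
int max_workspace_over_jobs(const SizeQuery& query)
{
    const char jobs[] = {'A', 'S', 'O', 'N'};
    int lwork = 0;
    for (char jobu : jobs)
    {
        for (char jobv : jobs)
        {
            // LAPACK gesvd does not allow both options to be 'O'.
            if (jobu == 'O' && jobv == 'O')
                continue;
            lwork = std::max(lwork, query(jobu, jobv));
        }
    }
    return lwork;
}
```

A caller that already knows jobu and jobv needs one query instead of fifteen and gets the exact size, which is the reason the regular API takes both options in its bufferSize helper.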
Using the hipsolverSp API#
The hipsolverSp API is intended as a 1:1 translation of the cusolverSp API, but not all functionality is equally supported in rocSOLVER. The following considerations apply when using this compatibility API.
Unsupported methods#
RCM reordering is not supported by rocSOLVER, rocSPARSE, or SuiteSparse. The following methods use AMD (approximate minimum degree) reordering instead when RCM is requested.
hipsolverSpXcsrlsvcholHost with reorder = 1
hipsolverSpXcsrlsvchol with reorder = 1
The function hipsolverSpScsrlsvqr is implemented by converting the sparse input matrix to a dense matrix. It therefore does not support any reordering method. The host path is also unsupported.
Arguments not referenced by rocSOLVER#
The reorder and tolerance arguments of hipsolverSpScsrlsvqr are not referenced by the rocSOLVER backend.
Performance implications of the hipsolverSp API#
The third-party SuiteSparse library is used to provide host-side functionality for hipsolverSpXcsrlsvchol when using the rocSOLVER backend. SuiteSparse does not support single-precision arrays, so hipSOLVER must allocate temporary double-precision arrays and copy the values one-by-one to and from the user-provided arguments.
Note
Single-precision hipsolverSpScsrlsvchol is expected to be slower and use more memory than the double-precision version.
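The extra cost of the single-precision path comes from a round trip like the one sketched below. This is an illustration under stated assumptions, not hipSOLVER code: widen and narrow_into are hypothetical helpers, and std::vector stands in for the temporary host arrays.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies single-precision input into a temporary
// double-precision array before calling a double-only solver.
std::vector<double> widen(const std::vector<float>& values)
{
    return std::vector<double>(values.begin(), values.end());
}

// Hypothetical helper: copies a double-precision result back, element by
// element, into the caller's single-precision array.
void narrow_into(const std::vector<double>& solution, std::vector<float>& out)
{
    out.resize(solution.size());
    for (std::size_t i = 0; i < solution.size(); ++i)
        out[i] = static_cast<float>(solution[i]);
}
```

Each array passed in single precision needs a temporary copy in double precision for the duration of the call, plus one pass over the data in each direction.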
A fully-featured, GPU-accelerated Cholesky factorization for sparse matrices is not implemented in either rocSOLVER or rocSPARSE. These components rely on SuiteSparse to provide this functionality. The hipsolverSpXcsrlsvchol functions allocate space for sparse matrices on the host, copy the data to the host, use SuiteSparse to perform the symbolic factorization, and then copy the resulting data back to the device.
Note
hipsolverSpXcsrlsvchol might show slower performance and use more memory than hipsolverSpXcsrlsvcholHost.
The function hipsolverSpScsrlsvqr converts the sparse input matrix to a dense matrix, then runs the dense factorization and linear solver on the result. This might result in slower-than-expected performance and significant memory usage for large matrices.
Note
hipsolverSpXcsrlsvqr must allocate enough memory to hold a dense matrix. It performs similarly to hipsolverXXgels.
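To see where the memory goes, the sketch below expands a square CSR matrix into a dense column-major array on the host. It is an illustration only: csr_to_dense is a hypothetical helper, zero-based indices are assumed, and the actual conversion in hipSOLVER may be done differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: expands an n x n CSR matrix (zero-based indices)
// into a dense column-major array of n * n values.
std::vector<double> csr_to_dense(int                        n,
                                 const std::vector<int>&    row_ptr,
                                 const std::vector<int>&    col_ind,
                                 const std::vector<double>& val)
{
    assert(row_ptr.size() == static_cast<std::size_t>(n) + 1);
    std::vector<double> dense(static_cast<std::size_t>(n) * n, 0.0);
    for (int i = 0; i < n; ++i)
    {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            // Element (i, col_ind[k]) is stored at i + col * n in column-major order.
            dense[i + static_cast<std::size_t>(col_ind[k]) * n] = val[k];
        }
    }
    return dense;
}
```

The dense array always holds n * n values regardless of how many entries are nonzero, so a matrix with 100,000 rows needs 80 GB in double precision even if it is almost entirely zeros.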
Using the hipsolverRf API#
The hipsolverRf API is intended as a 1:1 translation of the cusolverRf API, but not all functionality is equally supported in rocSOLVER. The following considerations apply when using this compatibility API.
Unsupported methods#
Batched refactorization methods are currently unsupported with the rocSOLVER backend. They return a HIPSOLVER_STATUS_NOT_SUPPORTED status code.
Parameter-setting methods are currently unsupported with the rocSOLVER backend. They return a HIPSOLVER_STATUS_NOT_SUPPORTED status code.
Using the regular hipSOLVER API#
hipSOLVER’s regular API is similar to cuSOLVER. However, due to differences in the implementation and design between cuSOLVER and rocSOLVER, some minor adjustments were introduced to ensure the best performance out of both backends.
Different signatures and additional API methods#
The methods to obtain the size of the workspace needed by the gels and gesv functions in cuSOLVER require dwork as an argument. However, this argument is never used and can be null. On the rocSOLVER side, dwork is not needed to calculate the workspace size. As a consequence, hipsolverXXgels_bufferSize and hipsolverXXgesv_bufferSize do not require dwork as an argument.
Note
These wrappers pass dwork = nullptr when calling cuSOLVER.
To calculate the workspace required by the function gesvd in rocSOLVER, the values of jobu and jobv are needed. As a result, hipsolverXgesvd_bufferSize requires jobu and jobv as arguments.
Note
These arguments are ignored when the wrapper calls cuSOLVER because they are not needed.
To properly use a user-provided workspace, rocSOLVER requires both the allocated pointer and its size. Consequently, hipsolverXgetrf requires lwork as an argument.
Note
lwork is ignored when the wrapper calls cuSOLVER because it is not needed.
All rocSOLVER functions called by hipSOLVER require a workspace. To allow the user to specify one, hipsolverXgetrs, hipsolverXpotrfBatched, hipsolverXpotrs, and hipsolverXpotrsBatched require work and lwork as arguments.
Note
These arguments are ignored when these wrappers call cuSOLVER because they are not needed.
To support these changes, the regular API adds bufferSize helper functions for hipsolverXgetrs, hipsolverXpotrfBatched, hipsolverXpotrs, and hipsolverXpotrsBatched.
Note
These methods return lwork = 0 when using the cuSOLVER backend, because the corresponding functions in cuSOLVER do not need a workspace.
Arguments not referenced by rocSOLVER#
Unlike cuSOLVER, rocSOLVER functions do not provide information on invalid arguments in the info parameter, although they provide info on singularities and algorithm convergence. Therefore, when using the rocSOLVER backend, info always returns a value >= 0. In cases where a rocSOLVER function does not accept info as an argument, hipSOLVER sets it to zero.
The niters argument for hipsolverXXgels and hipsolverXXgesv is not referenced by the rocSOLVER backend. rocSOLVER does not implement any type of iterative refinement.
Using the rocSOLVER memory model#
Most hipSOLVER functions take a workspace pointer and size as arguments, allowing the user to manage the device memory used internally by the backends. rocSOLVER, however, can maintain the device workspace automatically by default (see the rocSOLVER memory model for more details). To take advantage of this feature, users can pass a null pointer for the work argument or a zero size for the lwork argument of any function when using the rocSOLVER backend. The workspace is then managed automatically behind the scenes. However, it’s best to use a consistent workspace management strategy, because performance issues might arise if the backend is forced to switch frequently between user-provided and automatically allocated workspaces.
Warning
This feature should not be used with the cuSOLVER backend. hipSOLVER does not guarantee a defined behavior when passing a null workspace to cuSOLVER functions that require one.
Using the rocSOLVER in-place functions#
In cuSOLVER, the solvers gesv and gels are out-of-place in the sense that the solution vectors X do not overwrite the input matrix B.
In rocSOLVER, this is not the case. When hipsolverXXgels or hipsolverXXgesv calls rocSOLVER, some data movements must be done internally to restore B and copy the results back to X. These copies might introduce noticeable overhead, depending on the size of the matrices. To avoid this potential problem, pass X = B to hipsolverXXgels or hipsolverXXgesv when using the rocSOLVER backend. In this case, no data movements are required, and the solution vectors can be retrieved using either B or X.
Warning
This feature should not be used with the cuSOLVER backend. hipSOLVER does not guarantee a defined behavior when passing X = B to these functions in cuSOLVER.