rocSOLVER client applications
rocSOLVER provides infrastructure for testing and benchmarking similar to the rocBLAS utilities, as well as sample code illustrating basic use of the library.
Client binaries are not built by default. They require specific flags to be passed to the install script or CMake system. If the -c flag is passed to install.sh, the client binaries can be found in the <rocsolverDIR>/build/release/clients/staging directory. If both the -c and -g flags are passed to install.sh, the client binaries can be found in the <rocsolverDIR>/build/debug/clients/staging directory.
If you pass any combination of the -DBUILD_CLIENTS_TESTS=ON, -DBUILD_CLIENTS_BENCHMARKS=ON, or -DBUILD_CLIENTS_SAMPLES=ON flags to the CMake system, the relevant client binaries can normally be found in the <rocsolverDIR>/build/clients/staging directory. See the Installation guide for more information on building the library and its clients.
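The three staging locations above can be summarized in a small shell helper. This is only a sketch: staging_dir is a hypothetical function, not part of rocSOLVER, and the mode names release, debug, and cmake are labels chosen here for the three build routes described above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with rocSOLVER): print the directory where
# the client binaries are expected for a given build route.
#   $1: path to the rocSOLVER source tree (<rocsolverDIR>)
#   $2: release (install.sh -c), debug (install.sh -c -g), or cmake (direct CMake build)
staging_dir() {
    case "$2" in
        release) echo "$1/build/release/clients/staging" ;;
        debug)   echo "$1/build/debug/clients/staging" ;;
        cmake)   echo "$1/build/clients/staging" ;;
        *)       echo "unknown build mode: $2" >&2; return 1 ;;
    esac
}

# Example: where to look after running install.sh with the -c flag.
staging_dir "$HOME/rocSOLVER" release
```

The CMake case reflects the usual location only. A custom build directory would change the path.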
Testing rocSOLVER
The rocsolver-test client executes a GoogleTest (gtest) suite to verify the correct functioning of the library. The results computed by rocSOLVER on random input data are normally compared with the results computed by NETLib LAPACK on the CPU or tested implicitly in the context of the solved problem. This client is built if the -c flag is passed to install.sh or if the -DBUILD_CLIENTS_TESTS=ON flag is passed to the CMake system.
Call the rocSOLVER test client with the --help flag to get information on the different flags that control the test behavior.
./rocsolver-test --help
One of the most useful flags is the --gtest_filter flag, which lets you choose which tests to run from the suite. For example, the following command only runs the tests for geqrf:
./rocsolver-test --gtest_filter=*GEQRF*
The rocSOLVER tests are divided into two separate groups: checkin_lapack and daily_lapack. Tests in the checkin_lapack group are small and quick to execute. They verify basic correctness and error handling. Tests in the daily_lapack group are large and slow to execute. They verify the correctness of large problem sizes. You can run one test group or the other using --gtest_filter, for example:
./rocsolver-test --gtest_filter=*checkin_lapack*
./rocsolver-test --gtest_filter=*daily_lapack*
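A group filter and a function filter can be combined into a single wildcard pattern. The sketch below builds such an argument and prints the resulting command instead of running it. gtest_filter_arg is a hypothetical helper, and the pattern assumes that the group name appears before the function name in the full test name. Listing the names with the standard GoogleTest flag --gtest_list_tests is a way to check that assumption. Quoting the argument also keeps the shell from expanding the asterisks against file names in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: build a --gtest_filter argument from a test group and
# an optional function name.
# Assumption: the group name precedes the function name in the full test name.
gtest_filter_arg() {
    group="$1"
    func="${2:-}"
    if [ -n "$func" ]; then
        printf '%s\n' "--gtest_filter=*${group}*${func}*"
    else
        printf '%s\n' "--gtest_filter=*${group}*"
    fi
}

# Print the command rather than executing it. The single quotes in the
# printed command protect the asterisks from shell globbing when it is run.
echo "./rocsolver-test '$(gtest_filter_arg checkin_lapack GEQRF)'"
```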
Benchmarking rocSOLVER
The rocsolver-bench client runs any rocSOLVER function with random data of the specified dimensions. It compares basic performance information, such as execution times, between NETLib LAPACK on the CPU and rocSOLVER on the GPU. The client is built if the -c flag is passed to install.sh or if the -DBUILD_CLIENTS_BENCHMARKS=ON flag is passed to the CMake system.
Call the rocSOLVER bench client with the --help flag to obtain information on the different parameters and flags that control the behavior of the benchmark client.
./rocsolver-bench --help
Two of the most important flags for rocsolver-bench are the -f and -r flags. The -f (or --function) flag lets you select which function to benchmark. The -r (or --precision) flag lets you select the data precision for the function. It can be one of s (single precision), d (double precision), c (single precision complex), or z (double precision complex).
The non-pointer arguments for a function can be passed to rocsolver-bench by using the argument name as a flag. See the Reference sections for more information on the function arguments and names. For example, the function rocsolver_dgeqrf_strided_batched has the following signature:
rocblas_status
rocsolver_dgeqrf_strided_batched(rocblas_handle handle,
                                 const rocblas_int m,
                                 const rocblas_int n,
                                 double* A,
                                 const rocblas_int lda,
                                 const rocblas_stride strideA,
                                 double* ipiv,
                                 const rocblas_stride strideP,
                                 const rocblas_int batch_count);
A call to rocsolver-bench to run this function on a batch of one hundred 30x30 matrices might look like this:
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 -n 30 --lda 30 --strideA 900 --strideP 30 --batch_count 100
rocsolver-bench generally attempts to provide or calculate a suitable default value for these arguments, although you must always specify at least one size argument. Functions that take m and n as arguments typically require that m be provided and, when n is omitted, assume a square matrix (n = m). For example, the previous command is equivalent to:
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100
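The explicit values in the longer command follow from the matrix dimensions, as the sketch below illustrates. It assumes column-major storage with lda = m, a matrix stride of lda * n elements, and an ipiv stride of min(m, n) elements. These choices reproduce the numbers in the example (900 and 30) but are not a statement of how rocsolver-bench computes its defaults in general. geqrf_sb_cmd is a hypothetical helper that only prints the command line.

```shell
#!/bin/sh
# Hypothetical helper: print a rocsolver-bench command line for
# geqrf_strided_batched with lda and both strides spelled out.
# Assumptions: lda = m, strideA = lda * n, strideP = min(m, n).
#   $1: m (rows), $2: n (columns), $3: batch count
geqrf_sb_cmd() {
    m="$1"; n="$2"; batch="$3"
    lda="$m"
    stride_a=$((lda * n))
    if [ "$m" -lt "$n" ]; then stride_p="$m"; else stride_p="$n"; fi
    echo "./rocsolver-bench -f geqrf_strided_batched -r d -m $m -n $n --lda $lda --strideA $stride_a --strideP $stride_p --batch_count $batch"
}

# Reproduces the explicit command shown earlier for one hundred 30x30 matrices.
geqrf_sb_cmd 30 30 100
```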
Other useful benchmarking options include:
--perf: Disables the LAPACK computation and only times and prints the rocSOLVER performance result.
-i (or --iters): Specifies the number of times to run the GPU timing loop. The performance result is the average of all runs.
--profile: Enables profile logging. The value given indicates the maximum depth of the nested output.
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --perf 1
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --iters 20
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --profile 5
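These options lend themselves to scripted sweeps over precisions and problem sizes. The following is a minimal sketch of such a sweep, not a recommended methodology. The run and bench_sweep functions and the DRY_RUN variable are hypothetical names introduced here, and combining --perf and --iters in a single invocation is an assumption. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a benchmark sweep over the four precisions and a list of sizes.
# DRY_RUN is a hypothetical variable: when it is 1 (the default) commands are
# printed; set DRY_RUN=0 to execute them from the staging directory.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Arguments: one or more values for m (square matrices are assumed).
bench_sweep() {
    for prec in s d c z; do
        for size in "$@"; do
            run ./rocsolver-bench -f geqrf_strided_batched -r "$prec" -m "$size" \
                --batch_count 100 --perf 1 --iters 20
        done
    done
}

bench_sweep 30 60
```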
In addition to the benchmarking functionality, the rocSOLVER bench client can also provide the norm of the error in the computations when the -v (or --verify) flag is used. If the --mem_query flag is passed, it returns the amount of device memory required for the workspace by the specified function.
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --verify 1
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --mem_query 1
rocSOLVER sample code
The rocSOLVER sample programs provide examples of how to work with the rocSOLVER library. They are built if the -c flag is passed to install.sh or if the -DBUILD_CLIENTS_SAMPLES=ON flag is passed to the CMake system.
Currently, sample code is available to demonstrate the following:
Basic use of rocSOLVER in C and C++ using rocsolver_geqrf as an example
Use of batched and strided_batched functions using rocsolver_geqrf_batched and rocsolver_geqrf_strided_batched as examples
Use of rocSOLVER with the Heterogeneous Memory Management (HMM) model
Use of the rocSOLVER multi-level logging functionality