rocSOLVER Client Applications

rocSOLVER provides testing and benchmarking infrastructure similar to that of rocBLAS, as well as sample code that illustrates basic use of the library.

Client binaries are not built by default; building them requires passing specific flags to the install script or to CMake. If the -c flag is passed to install.sh, the client binaries are placed in <rocsolverDIR>/build/release/clients/staging. If both the -c and -g flags are passed to install.sh, they are placed in <rocsolverDIR>/build/debug/clients/staging. If any of the flags -DBUILD_CLIENTS_TESTS=ON, -DBUILD_CLIENTS_BENCHMARKS=ON, or -DBUILD_CLIENTS_SAMPLES=ON is passed to CMake, the corresponding client binaries are normally placed in <rocsolverDIR>/build/clients/staging. See the Installation section for more information on building the library and its clients.

Testing rocSOLVER

The rocsolver-test client executes a suite of Google tests (gtest) that verifies the correct functioning of the library. Given random input data, the results computed by rocSOLVER are normally compared with those computed by Netlib LAPACK on the CPU, or are tested implicitly in the context of the solved problem. The rocsolver-test client is built if the -c flag is passed to install.sh or if the -DBUILD_CLIENTS_TESTS=ON flag is passed to CMake.

Calling the rocSOLVER gtest client with the --help flag

./rocsolver-test --help

returns information on the different flags that control the behavior of the gtests.

One of the most useful flags is --gtest_filter, which lets the user choose which tests from the suite to run. For example, the following command runs only the geqrf tests:

./rocsolver-test --gtest_filter=*GEQRF*

Note that rocSOLVER’s tests are divided into two groups: checkin_lapack and daily_lapack. Tests in the checkin_lapack group are small and quick to execute, and verify basic correctness and error handling. Tests in the daily_lapack group are large and slower to execute, and verify correctness for large problem sizes. Users can run one group or the other by using --gtest_filter, for example:

./rocsolver-test --gtest_filter=*checkin_lapack*
./rocsolver-test --gtest_filter=*daily_lapack*
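The patterns passed to --gtest_filter are glob-style: * matches any run of characters and ? matches any single character. The minimal sketch below illustrates that matching for a single pattern. It is not Google Test's own implementation: real filters also accept several patterns separated by : and negative patterns after a -, which the sketch omits, and the function name wildcard_match is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical glob-style matcher in the spirit of --gtest_filter patterns:
   '*' matches any sequence of characters (including none),
   '?' matches exactly one character, anything else matches itself.
   Matching is case-sensitive. */
bool wildcard_match(const char *pattern, const char *name)
{
    if (*pattern == '\0')
        return *name == '\0';          /* pattern used up: name must be too */
    if (*pattern == '*')
        /* Let '*' match nothing, or consume one character of name and retry. */
        return wildcard_match(pattern + 1, name) ||
               (*name != '\0' && wildcard_match(pattern, name + 1));
    if (*name == '\0')
        return false;                  /* name used up but pattern is not */
    if (*pattern == '?' || *pattern == *name)
        return wildcard_match(pattern + 1, name + 1);
    return false;
}
```

For instance, for a hypothetical test name such as checkin_lapack/GEQRF.example/0, both *GEQRF* and *checkin_lapack* match, whereas *daily_lapack* does not.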

Benchmarking rocSOLVER

The rocsolver-bench client runs any rocSOLVER function on random data of the specified dimensions. It compares basic performance information (i.e., execution times) between Netlib LAPACK on the CPU and rocSOLVER on the GPU. The rocsolver-bench client is built if the -c flag is passed to install.sh or if the -DBUILD_CLIENTS_BENCHMARKS=ON flag is passed to CMake.

Calling the rocSOLVER bench client with the --help flag

./rocsolver-bench --help

returns information on the different parameters and flags that control the behavior of the benchmark client.

Two of the most important flags for rocsolver-bench are -f and -r. The -f (or --function) flag selects the function to benchmark. The -r (or --precision) flag selects the data precision for the function, and can be one of s (single precision), d (double precision), c (single-precision complex), or z (double-precision complex).
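The -f and -r values together identify a rocSOLVER C function: in this section's example, -f geqrf_strided_batched with -r d refers to rocsolver_dgeqrf_strided_batched. The sketch below assembles a name following that rocsolver_<precision><function> pattern. It is a hypothetical helper, not part of rocsolver-bench, and it does not check that the resulting function actually exists, since some routines are provided only for real or only for complex precisions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the rocSOLVER C function name that a
   (-r precision, -f function) pair refers to, following the naming
   pattern rocsolver_<precision><function>.
   Returns 0 on success, or -1 if the precision letter is not one of
   s, d, c, z or the output buffer is too small. */
int bench_function_name(char precision, const char *function,
                        char *out, size_t out_size)
{
    /* strchr also matches the terminating '\0', so reject it explicitly. */
    if (precision == '\0' || strchr("sdcz", precision) == NULL)
        return -1;
    int written = snprintf(out, out_size, "rocsolver_%c%s", precision, function);
    if (written < 0 || (size_t)written >= out_size)
        return -1; /* encoding error or truncated output */
    return 0;
}
```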

The non-pointer arguments of a function can be passed to rocsolver-bench by using the argument name as a flag (see the Reference sections for more information on the function arguments and their names). For example, the function rocsolver_dgeqrf_strided_batched has the following signature:

rocblas_status
rocsolver_dgeqrf_strided_batched(rocblas_handle handle,
                                 const rocblas_int m,
                                 const rocblas_int n,
                                 double* A,
                                 const rocblas_int lda,
                                 const rocblas_stride strideA,
                                 double* ipiv,
                                 const rocblas_stride strideP,
                                 const rocblas_int batch_count);

A call to rocsolver-bench that runs this function on a batch of one hundred 30x30 matrices could look like this:

./rocsolver-bench -f geqrf_strided_batched -r d -m 30 -n 30 --lda 30 --strideA 900 --strideP 30 --batch_count 100

Generally, rocsolver-bench attempts to provide or calculate a suitable default value for these arguments, though at least one size argument must always be specified by the user. Functions that take m and n as arguments typically require m to be provided; if n is omitted, a square matrix is assumed. For example, the previous command is equivalent to:

./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100
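The size arguments in these two commands describe how the batch is laid out in memory. Assuming the column-major storage used by LAPACK and rocSOLVER, element (i, j) of matrix k lives at offset k * strideA + i + j * lda, and the ipiv entries of matrix k start at offset k * strideP. The sketch below computes that offset and fills in one set of defaults consistent with the two equivalent commands (n = m, lda = m, strideA = lda * n, strideP = min(m, n)). Those defaults are inferred from the example rather than taken from rocsolver-bench itself, and all names in the sketch are hypothetical; a value of 0 marks an argument that was not given.

```c
#include <stdint.h>

/* Hypothetical stand-in for the size arguments of geqrf_strided_batched.
   int32_t and int64_t roughly stand in for rocblas_int and rocblas_stride.
   A value of 0 means "not given on the command line". */
struct geqrf_sb_args {
    int32_t m, n, lda;
    int64_t strideA, strideP;
};

/* Offset of element (i, j) of matrix k in a column-major strided batch. */
int64_t strided_batch_offset(int64_t i, int64_t j, int64_t k,
                             int64_t lda, int64_t strideA)
{
    return k * strideA + i + j * lda;
}

/* Fill in assumed defaults for omitted arguments; m must be provided. */
struct geqrf_sb_args fill_defaults(struct geqrf_sb_args a)
{
    if (a.n == 0)
        a.n = a.m;                            /* square matrix assumed */
    if (a.lda == 0)
        a.lda = a.m;                          /* no padding between columns */
    if (a.strideA == 0)
        a.strideA = (int64_t)a.lda * a.n;     /* matrices stored back to back */
    if (a.strideP == 0)
        a.strideP = a.m < a.n ? a.m : a.n;    /* min(m, n) ipiv entries per matrix */
    return a;
}
```

With -m 30, these rules give n = 30, lda = 30, strideA = 900, and strideP = 30, matching the explicit command; element (2, 1) of the first matrix is then at offset 2 + 1 * 30 = 32, and the second matrix starts at offset 900.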

Other useful benchmarking options include the --perf flag, which disables the LAPACK computation and only times and prints the rocSOLVER performance result; the -i (or --iters) flag, which sets the number of times the GPU timing loop is run (the reported performance result is the average of all the runs); and the --profile flag, which enables profile logging and sets the maximum depth of nested output in the log.

./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --perf 1
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --iters 20
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --profile 5
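The averaging behind --iters can be pictured with the CPU-only sketch below, which runs a workload a given number of times and returns the mean time per run. It is a hypothetical illustration, not rocsolver-bench's code: GPU work is launched asynchronously, so a real GPU timing loop must also wait for the device to finish each run before reading the clock, which this sketch leaves out. The names average_time_us and count_calls are hypothetical, and the clock is read with C11 timespec_get.

```c
#include <time.h>

/* Stand-in workload for illustration only: counts how many times it ran. */
void count_calls(void *arg)
{
    int *calls = arg;
    ++*calls;
}

/* Hypothetical --iters style timing loop: runs work(arg) `iters` times and
   returns the average wall-clock time per run in microseconds, or -1.0 if
   iters < 1 or the clock cannot be read. */
double average_time_us(void (*work)(void *), void *arg, int iters)
{
    if (iters < 1)
        return -1.0;
    double total_us = 0.0;
    for (int it = 0; it < iters; ++it) {
        struct timespec start, stop;
        if (timespec_get(&start, TIME_UTC) != TIME_UTC)
            return -1.0;
        work(arg); /* a GPU version would wait for the device to finish here */
        if (timespec_get(&stop, TIME_UTC) != TIME_UTC)
            return -1.0;
        total_us += (double)(stop.tv_sec - start.tv_sec) * 1e6
                  + (double)(stop.tv_nsec - start.tv_nsec) / 1e3;
    }
    return total_us / iters; /* mean over all runs */
}
```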

In addition to benchmarking, the rocSOLVER bench client can report the norm of the error in the computations when the -v (or --verify) flag is used, and the amount of device memory the given function requires as workspace when the --mem_query flag is passed.

./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --verify 1
./rocsolver-bench -f geqrf_strided_batched -r d -m 30 --batch_count 100 --mem_query 1
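The exact error measure reported with --verify depends on the function and may differ from the one shown here. As an assumed illustration of the general idea, the sketch below compares a computed result x against a reference ref and returns the relative max-norm error max|ref_i - x_i| / max|ref_i|. The function name relative_max_error is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: relative max-norm error between a reference result
   (e.g., from CPU LAPACK) and a computed result (e.g., from the GPU).
   Falls back to the absolute max error if the reference is all zeros. */
double relative_max_error(const double *ref, const double *x, size_t n)
{
    double max_diff = 0.0, max_ref = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = fabs(ref[i] - x[i]);
        double r = fabs(ref[i]);
        if (d > max_diff)
            max_diff = d;
        if (r > max_ref)
            max_ref = r;
    }
    return max_ref > 0.0 ? max_diff / max_ref : max_diff;
}
```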

rocSOLVER sample code

rocSOLVER’s sample programs provide illustrative examples of how to work with the rocSOLVER library. They are built if the -c flag is passed to install.sh or if the -DBUILD_CLIENTS_SAMPLES=ON flag is passed to CMake.

Currently, sample code exists to demonstrate the following: