Building and installing RCCL from source code#

To build RCCL directly from the source code, follow these steps. This guide also includes instructions explaining how to test the build. For information on using the quick start install script to build RCCL, see Installing RCCL using the install script.

Requirements#

The following prerequisites are required to build RCCL:

  1. ROCm-supported GPUs

  2. Having the ROCm stack installed on the system, including the HIP runtime and the HIP-Clang compiler.

Building the library using CMake:#

To build the library from source, follow these steps:

git clone --recursive https://github.com/ROCm/rccl.git
cd rccl
mkdir build
cd build
cmake ..
make -j 16      # Or some other suitable number of parallel jobs

If you have already cloned the repository, you can checkout the external submodules manually.

git submodule update --init --recursive --depth=1

You can substitute a different installation path by providing the path as a parameter to CMAKE_INSTALL_PREFIX, for example:

cmake -DCMAKE_INSTALL_PREFIX=$PWD/rccl-install -DCMAKE_BUILD_TYPE=Release ..

Note

Ensure ROCm CMake is installed using the command apt install rocm-cmake. By default, CMake builds the component in debug mode unless DCMAKE_BUILD_TYPE is specified.

Building the RCCL package and install package:#

After you have cloned the repository and built the library as described in the previous section, use this command to build the package:

cd rccl/build
make package
sudo dpkg -i *.deb

Note

The RCCL package install process requires sudo or root access because it creates a directory named rccl in /opt/rocm/. This is an optional step. RCCL can be used directly by including the path containing librccl.so.

Testing RCCL#

The RCCL unit tests are implemented using the Googletest framework in RCCL. These unit tests require Googletest 1.10 or higher to build and run (this dependency can be installed using the -d option for install.sh). To run the RCCL unit tests, go to the build folder and the test subfolder, then run the appropriate RCCL unit test executables.

The RCCL unit test names follow this format:

CollectiveCall.[Type of test]

Filtering of the RCCL unit tests can be done using environment variables and by passing the --gtest_filter command line flag:

UT_DATATYPES=ncclBfloat16 UT_REDOPS=prod ./rccl-UnitTests --gtest_filter="AllReduce.C*"

This command runs only the AllReduce correctness tests with the float16 datatype. A list of the available environment variables for filtering appears at the top of every run. See the Googletest documentation for more information on how to form advanced filters.

There are also other performance and error-checking tests for RCCL. They are maintained separately at ROCm/rccl-tests.

Note

For more information on how to build and run rccl-tests, see the rccl-tests README file .