Running RCCL using Docker
To use Docker to run RCCL, Docker must already be installed on the system. To build the Docker image and run the container, follow these steps.
Build the Docker image
By default, the Dockerfile uses docker.io/rocm/dev-ubuntu-22.04:latest as the base Docker image. It then installs RCCL and rccl-tests (in both cases, it uses the version from the RCCL develop branch).
Use this command to build the Docker image:
docker build -t rccl-tests -f Dockerfile.ubuntu --pull .
The base Docker image, rccl repository, and rccl-tests repository can be changed by passing --build-arg options to the docker build command above. For example, to use a different base Docker image, use this command:
docker build -t rccl-tests -f Dockerfile.ubuntu --build-arg="ROCM_IMAGE_NAME=rocm/dev-ubuntu-20.04" --build-arg="ROCM_IMAGE_TAG=6.2" --pull .
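When several images are built from different base images, it can help to assemble the build command from two values instead of editing it by hand. The following is a minimal sketch and not part of the RCCL repository: the function name build_cmd is hypothetical, and it relies only on the ROCM_IMAGE_NAME and ROCM_IMAGE_TAG build arguments used in the example above. It prints the command rather than running it, so the result can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RCCL): print a docker build command
# for a given base image name and tag. Only the two build arguments
# from the example above are used.
build_cmd() {
    if [ "$#" -ne 2 ]; then
        echo "usage: build_cmd IMAGE_NAME IMAGE_TAG" >&2
        return 2
    fi
    local image_name="$1"
    local image_tag="$2"
    printf 'docker build -t rccl-tests -f Dockerfile.ubuntu --build-arg="ROCM_IMAGE_NAME=%s" --build-arg="ROCM_IMAGE_TAG=%s" --pull .\n' \
        "$image_name" "$image_tag"
}
```

Calling build_cmd rocm/dev-ubuntu-20.04 6.2 prints the same command as the example above.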
Launch an interactive Docker container on a system with AMD GPUs:
docker run -it --rm --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --network=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rccl-tests /bin/bash
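The docker run command above passes the host device nodes /dev/kfd and /dev/dri into the container. If either node is missing on the host, the container typically fails to start, so a quick check beforehand can save time. The following is a minimal sketch under that assumption; the function name check_gpu_devices is hypothetical and not part of RCCL or Docker.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of RCCL): verify that every
# path given as an argument exists on the host before starting the
# container. Returns 0 if all paths exist, 1 otherwise.
check_gpu_devices() {
    local missing=0
    local dev
    for dev in "$@"; do
        if [ ! -e "$dev" ]; then
            echo "missing device node: $dev" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_gpu_devices /dev/kfd /dev/dri can be run on the host before the docker run command; a non-zero exit status suggests that the AMD GPU driver is not loaded or the devices are not visible to the current user.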
For example, to run the all_reduce_perf test from rccl-tests on 8 AMD GPUs from inside the Docker container, use this command:
mpirun --allow-run-as-root -np 8 --mca pml ucx --mca btl ^openib -x NCCL_DEBUG=VERSION /workspace/rccl-tests/build/all_reduce_perf -b 1 -e 16G -f 2 -g 1
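In the command above, -np sets the number of MPI processes and -g 1 assigns one GPU to each process, so the number of GPUs used equals the -np value. The -b and -e options set the smallest and largest message sizes, and -f 2 doubles the size at each step. To run on a different number of GPUs, only the -np value needs to change. The following is a minimal sketch of that idea; the function name all_reduce_cmd is hypothetical, and the function prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rccl-tests): print the mpirun command
# from the example above for a given number of MPI processes, with one
# GPU per process (-g 1). Defaults to 8 processes.
all_reduce_cmd() {
    local nranks="${1:-8}"
    case "$nranks" in
        ''|*[!0-9]*|0)
            echo "process count must be a positive integer" >&2
            return 2
            ;;
    esac
    printf 'mpirun --allow-run-as-root -np %s --mca pml ucx --mca btl ^openib -x NCCL_DEBUG=VERSION /workspace/rccl-tests/build/all_reduce_perf -b 1 -e 16G -f 2 -g 1\n' \
        "$nranks"
}
```

Calling all_reduce_cmd 4 prints the command for a 4-GPU run; the printed line can then be executed inside the container.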
For more information on the rccl-tests options, see the Usage guidelines in the GitHub repository.