Running RCCL using Docker

Running RCCL using Docker#

To use Docker to run RCCL, Docker must already be installed on the system. To build the Docker image and run the container, follow these steps.

  1. Build the Docker image

    By default, the Dockerfile uses docker.io/rocm/dev-ubuntu-22.04:latest as the base Docker image. It then installs RCCL and rccl-tests (in both cases, it uses the version from the RCCL develop branch).

    Use this command to build the Docker image:

    docker build -t rccl-tests -f Dockerfile.ubuntu --pull .
    

    The base Docker image, rccl repository, and rccl-tests repository can be modified by using --build-args in the docker build command above. For example, to use a different base Docker image, use this command:

    docker build -t rccl-tests -f Dockerfile.ubuntu --build-arg="ROCM_IMAGE_NAME=rocm/dev-ubuntu-20.04" --build-arg="ROCM_IMAGE_TAG=6.2" --pull .
    
  2. Launch an interactive Docker container on a system with AMD GPUs:

    docker run -it --rm --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --network=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rccl-tests /bin/bash
    

To run, for example, the all_reduce_perf test from rccl-tests on 8 AMD GPUs from inside the Docker container, use this command:

mpirun --allow-run-as-root -np 8 --mca pml ucx --mca btl ^openib -x NCCL_DEBUG=VERSION /workspace/rccl-tests/build/all_reduce_perf -b 1 -e 16G -f 2 -g 1

For more information on the rccl-tests options, see the Usage guidelines in the GitHub repository.