Running ROCm Docker containers

2024-10-24

Applies to Linux

Using Docker to run your ROCm applications is one of the best ways to get consistent and reproducible environments.

Prerequisites

  • amdgpu-dkms: Docker containers share the kernel with the host OS. Therefore, the ROCm kernel-mode driver (amdgpu-dkms) must be installed on the host. If you’ve already installed ROCm, you probably already have amdgpu-dkms.
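As a quick host-side sanity check, the sketch below looks for the two device nodes that the rest of this page passes into containers. The function name check_rocm_device_nodes and its optional directory argument are hypothetical, introduced here for illustration. Finding the nodes suggests that a kernel driver exposing them is loaded; it does not by itself confirm that the amdgpu-dkms package is installed.

```shell
# Hypothetical helper: report whether the device nodes used later in this
# guide exist. The optional argument is the device directory (default: /dev).
check_rocm_device_nodes() {
    dev_dir="${1:-/dev}"
    rc=0
    # /dev/kfd is the compute interface shared by all GPUs.
    if [ ! -e "$dev_dir/kfd" ]; then
        echo "missing: $dev_dir/kfd" >&2
        rc=1
    fi
    # /dev/dri is a directory holding the per-GPU nodes.
    if [ ! -d "$dev_dir/dri" ]; then
        echo "missing: $dev_dir/dri" >&2
        rc=1
    fi
    return "$rc"
}
```

Calling check_rocm_device_nodes with no argument checks /dev on the host and returns a non-zero status if either node is missing.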

Accessing GPUs in containers

To give a container access to GPUs, run it with the following options:

docker run --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined <image>

The purpose of each option is as follows:

  • --device /dev/kfd

    This is the main compute interface, shared by all GPUs.

  • --device /dev/dri

    This directory contains the Direct Rendering Interface (DRI) for each GPU. To restrict access to specific GPUs, see Restricting GPU access.

  • --security-opt seccomp=unconfined (optional)

    This option enables memory mapping, and is recommended for containers running in HPC environments.

    Application performance can vary depending on how GPUs and CPUs are assigned to a task. Many HPC applications install numactl to provide GPU/CPU mappings. This Docker runtime option supports memory mapping and can improve performance.

Docker compose

You can also use docker compose to launch your containers, even when launching a single container. This can be a convenient way to run complex Docker commands without having to remember all the CLI arguments. The following docker-compose file is equivalent to the preceding docker run command:

version: "3.7"
services:
  my-service:
    image: <image>
    devices:
      - /dev/kfd
      - /dev/dri
    security_opt:
      - seccomp:unconfined

You can then run this using docker compose run my-service.

Restricting GPU access

Passing --device /dev/dri grants access to all GPUs on the system. To limit access to a subset of GPUs, pass each device individually using one or more --device /dev/dri/renderD<node> options, where <node> is the card index, starting from 128.

For example, to expose the first and second GPU:

docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD129 <image>
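When the set of GPUs changes between runs, the flags can be generated rather than typed by hand. The following is a minimal sketch: rocm_device_flags is a hypothetical helper (not part of Docker or ROCm) that assumes render nodes are numbered contiguously from renderD128, as described above, so that index 0 refers to the first GPU.

```shell
# Hypothetical helper: print docker --device flags for the given GPU indices.
# Index 0 maps to renderD128, index 1 to renderD129, and so on.
rocm_device_flags() {
    # /dev/kfd is always needed, regardless of which GPUs are exposed.
    flags="--device /dev/kfd"
    for index in "$@"; do
        flags="$flags --device /dev/dri/renderD$((128 + index))"
    done
    printf '%s\n' "$flags"
}
```

The output is meant to be passed to docker run unquoted so that the shell splits it into separate arguments, for example docker run $(rocm_device_flags 0 1) followed by the image name.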

Verifying the amdgpu driver has been loaded on GPUs

rocminfo is an application for reporting information about the HSA system attributes and agents. rocm-smi is a command-line tool for monitoring and manipulating the amdgpu kernel driver.

Running rocminfo and rocm-smi inside the container enumerates only the GPUs passed into the Docker container. Running rocminfo and rocm-smi on bare metal enumerates all ROCm-capable GPUs on the machine.
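One way to compare what the host and a container can see is to count the GPU agents that rocminfo reports in each place. The sketch below uses a hypothetical helper, count_gpu_agents, and assumes that rocminfo prints a "Device Type:" line ending in GPU or CPU for each agent. If the output format differs in your ROCm version, adjust the pattern.

```shell
# Hypothetical helper: count GPU agents in rocminfo output read from stdin.
# Assumes each agent block contains a "Device Type:" line ending in GPU or CPU.
count_gpu_agents() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at zero in that case.
    grep -c 'Device Type:[[:space:]]*GPU' || true
}
```

Running rocminfo | count_gpu_agents on the host and again inside a container started with a restricted set of --device options should give a smaller number inside the container.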

Docker images in the ROCm ecosystem

The ROCm Docker repository hosts images that are useful for building your own ROCm-based containers. The built images are available on Docker Hub. In particular:

  • rocm/rocm-terminal is a small image with the prerequisites to build HIP applications, but does not include any libraries.

  • ROCm dev images provide a variety of OS + ROCm versions, and are a great starting place for building applications.

Applications

AMD provides pre-built images for various GPU-ready applications through Infinity Hub. There, you’ll also find examples for invoking each application and suggested parameters used for benchmarking.