# Running ROCm Docker containers
2024-10-24
Using Docker to run your ROCm applications is one of the best ways to get consistent, reproducible environments.
## Prerequisites
`amdgpu-dkms`
: Docker containers share the kernel with the host OS. Therefore, the ROCm kernel-mode driver (`amdgpu-dkms`) must be installed on the host. If you've already installed ROCm, you probably already have `amdgpu-dkms`. If you don't have `amdgpu-dkms`, follow the standard install instructions (which include `amdgpu-dkms`) or install `amdgpu-dkms` only.
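Before launching containers, you can quickly check that the driver is present on the host. This is an informal sanity check, not an official verification step:

```bash
# Check whether the amdgpu kernel module is currently loaded
lsmod | grep amdgpu

# If the driver was installed through DKMS, it should be listed here
dkms status
```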
## Accessing GPUs in containers
To grant access to GPUs from within a container, run your container with the following options:
```bash
docker run --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined <image>
```
The purpose of each option is as follows:

`--device /dev/kfd`
: The main compute interface, shared by all GPUs.

`--device /dev/dri`
: This directory contains the Direct Rendering Interface (DRI) device nodes for each GPU. To restrict access to specific GPUs, see Restricting GPU access.

`--security-opt seccomp=unconfined` (optional)
: Enables memory mapping, and is recommended for containers running in HPC environments. The performance of an application can vary depending on the assignment of GPUs and CPUs to the task. Typically, `numactl` is installed as part of many HPC applications to provide GPU/CPU mappings. This Docker runtime option supports memory mapping and can improve performance.
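For example, one way to try this out is with the `rocm/rocm-terminal` image described later on this page, which drops you into an interactive shell with all GPUs visible:

```bash
# Start an interactive shell in a ROCm-capable container,
# exposing all GPUs on the host
docker run -it \
  --device /dev/kfd \
  --device /dev/dri \
  --security-opt seccomp=unconfined \
  rocm/rocm-terminal
```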
## Docker compose
You can also use `docker compose` to launch your containers, even when launching a single container. This can be a convenient way to run complex Docker commands without having to remember all the CLI arguments. Here is a docker-compose file, which is equivalent to the preceding `docker run` command:
```yaml
version: "3.7"
services:
  my-service:
    image: <image>
    devices:
      - /dev/kfd
      - /dev/dri
    security_opt:
      - seccomp:unconfined
```
You can then run this using `docker compose run my-service`.
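You can also override the service's default command for one-off runs. For example, assuming the image includes the ROCm user-space tools, the following lists the visible GPU agents and removes the container when it exits:

```bash
# Run a one-off command in the service; --rm cleans up the container on exit
docker compose run --rm my-service rocminfo
```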
## Restricting GPU access
By passing `--device /dev/dri`, you are granting access to all GPUs on the system. To limit access to a subset of GPUs, pass each device individually using one or more `--device /dev/dri/renderD<node>` options, where `<node>` is the card index, starting from 128.

For example, to expose the first and second GPU:
```bash
docker run --device /dev/kfd --device /dev/dri/renderD128 --device /dev/dri/renderD129 ..
```
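If you're unsure which render nodes exist on your system, listing the DRI directory on the host shows them; the exact numbering depends on your hardware:

```bash
# Each GPU typically appears as a renderD<node> entry, starting at 128
ls /dev/dri
```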
## Verifying the amdgpu driver has been loaded on GPUs
`rocminfo` is an application for reporting information about the HSA system attributes and agents. `rocm-smi` is a command line interface for manipulating and monitoring the amdgpu kernel driver.

Running `rocminfo` and `rocm-smi` inside the container will only enumerate the GPUs passed into the Docker container. Running `rocminfo` and `rocm-smi` on bare metal will enumerate all ROCm-capable GPUs on the machine.
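As an illustration, you can compare the two views. The sketch below assumes a multi-GPU host and the `rocm/rocm-terminal` image:

```bash
# On bare metal: enumerates every ROCm-capable GPU on the machine
rocm-smi

# Inside a container restricted to a single render node:
# only that GPU is enumerated
docker run --device /dev/kfd --device /dev/dri/renderD128 \
  --security-opt seccomp=unconfined rocm/rocm-terminal rocm-smi
```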
## Docker images in the ROCm ecosystem
The ROCm Docker repository hosts images useful for building your own containers, leveraging ROCm. The built images are available on Docker Hub. In particular:

- `rocm/rocm-terminal` is a small image with the prerequisites to build HIP applications, but does not include any libraries.
- ROCm dev images provide a variety of OS + ROCm versions, and are a great starting place for building applications.
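To experiment with these images, you can pull them from Docker Hub. The dev repository name below is only an example; check Docker Hub for the OS and ROCm versions currently published:

```bash
# Small image with the prerequisites for building HIP applications
docker pull rocm/rocm-terminal

# Example dev image combining an OS and ROCm version (tag names vary)
docker pull rocm/dev-ubuntu-22.04
```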
## Applications
AMD provides pre-built images for various GPU-ready applications through Infinity Hub. There, you’ll also find examples for invoking each application and suggested parameters used for benchmarking.