PyTorch on ROCm#
2024-12-17
PyTorch is an open-source tensor library designed for deep learning. PyTorch on ROCm provides mixed-precision and large-scale training using our MIOpen and RCCL libraries.
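To install PyTorch for ROCm, you have the following options:
Using a Docker image with PyTorch pre-installed (recommended)
Using a wheels package
Using the PyTorch ROCm base Docker image
Using the PyTorch upstream Dockerfile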
For hardware, software, and third-party framework compatibility between ROCm and PyTorch, refer to:
Using a Docker image with PyTorch pre-installed#
To install ROCm on bare metal, follow the ROCm installation overview. The recommended option to get a PyTorch environment is through Docker.
Docker provides portability and access to a prebuilt image that has been rigorously tested within AMD. Using the prebuilt image also saves compilation time, should perform as tested, and mitigates potential installation issues. See Docker image support.
Download the latest public PyTorch Docker image.
docker pull rocm/pytorch:latest
You can also download a specific and supported configuration with different user-space ROCm versions, PyTorch versions, and operating systems by filtering through the tags.
Important
As of ROCm 6.2.1, rocm/pytorch:latest points to a Docker image with the latest ROCm tested release version of PyTorch (for example, version 2.3), similar to the rocm/pytorch:latest-release tag. Before ROCm 6.2.1, rocm/pytorch:latest pointed to a development version of PyTorch, which didn’t correspond to a specific PyTorch release.

| Description | 6.3.0 and later | 6.2.1 and later | 6.2.0 and earlier |
|---|---|---|---|
| Latest PyTorch tested release | rocm/pytorch:latest, rocm/pytorch:latest-release | rocm/pytorch:latest, rocm/pytorch:latest-release | rocm/pytorch:latest-release |
| Latest PyTorch preview release [Limited testing] | rocm/pytorch:latest-release-preview | | |
| Latest PyTorch dev version | rocm/pytorch:latest-internal | rocm/pytorch:latest-internal | rocm/pytorch:latest |
Start a Docker container using the image.
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --device=/dev/kfd --device=/dev/dri --group-add video \
    --ipc=host --shm-size 8G rocm/pytorch:latest
Note
This will automatically download the image if it does not exist on the host. You can also pass the -v argument to mount any data directories from the host onto the container.
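The container start command above can also be assembled programmatically, which helps when the set of mounted directories changes between runs. The following is a minimal sketch, not an official tool: the function name `docker_run_args` and the example paths `/data/datasets` and `/datasets` are hypothetical, while the flags are copied from the command shown in this section.

```python
import shlex


def docker_run_args(image="rocm/pytorch:latest", mounts=None):
    """Build the `docker run` argument list used in this section.

    `mounts` maps host directories to container directories; each pair
    becomes one `-v host:container` argument.
    """
    args = [
        "docker", "run", "-it",
        "--cap-add=SYS_PTRACE",
        "--security-opt", "seccomp=unconfined",
        "--device=/dev/kfd", "--device=/dev/dri",
        "--group-add", "video",
        "--ipc=host", "--shm-size", "8G",
    ]
    for host_dir, container_dir in (mounts or {}).items():
        args += ["-v", f"{host_dir}:{container_dir}"]
    args.append(image)  # the image name must come after all options
    return args


if __name__ == "__main__":
    # Hypothetical host and container paths, for illustration only.
    print(shlex.join(docker_run_args(mounts={"/data/datasets": "/datasets"})))
```

The resulting list can be passed to `subprocess.run` or printed and pasted into a shell.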
Docker image support#
AMD validates and publishes ready-made PyTorch images with ROCm backends on Docker Hub. The following Docker image tags and associated inventories are validated for ROCm 6.3.
Note
As of ROCm 6.2.1, rocm/pytorch:latest points to a Docker image with the latest ROCm tested release version of PyTorch (for example, version 2.4), similar to the rocm/pytorch:latest-release tag. See Using a Docker image with PyTorch pre-installed for more information.
Using a wheels package#
PyTorch supports the ROCm platform by providing tested wheels packages. To access this feature, go to pytorch.org/get-started/locally/. For the correct wheels command, you must select Linux, Python, pip, and ROCm in the matrix.
Note
The available ROCm release differs between the Stable and Nightly PyTorch builds. More recent releases are generally available through the Nightly builds.
Choose one of the following three options:
Option 1:
Download a base Docker image with the correct user-space ROCm version.
Base OS
Docker Image
Ubuntu 22.04
Ubuntu 24.04
Pull the selected image.
docker pull rocm/dev-ubuntu-22.04:latest
Start a Docker container using the downloaded image.
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/dev-ubuntu-22.04:latest
Option 2:
Select a base OS Docker image. Check System requirements (Linux).
Pull the selected base OS image (Ubuntu 22.04, for example).
docker pull ubuntu:22.04
Start a Docker container using the downloaded image.
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video ubuntu:22.04
Install ROCm using the directions in the ROCm installation overview section.
Option 3:
Install on bare metal. Check System requirements (Linux) and install ROCm using the directions in the ROCm installation overview section.
Install the required dependencies for the wheels package.
sudo apt update
sudo apt install libjpeg-dev python3-dev python3-pip
pip3 install wheel setuptools
Install torch, torchvision, and torchaudio, as specified in the installation matrix.
Note
The following command uses the ROCm 6.2 PyTorch wheel. If you want a different version of ROCm, modify the command accordingly.
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2/
(Optional) Use MIOpen kdb files with ROCm PyTorch wheels.
PyTorch uses MIOpen for machine learning primitives, which are compiled into kernels at runtime. Runtime compilation causes a small warm-up phase when starting PyTorch, and MIOpen kdb files contain precompiled kernels that can speed up application warm-up phases.
MIOpen kdb files can be used with ROCm PyTorch wheels. However, the kdb files need to be placed in a specific location with respect to the PyTorch installation path. A helper script simplifies this task by taking the ROCm version and GPU architecture as inputs. This works for Ubuntu.
You can download the helper script here: install_kdb_files_for_pytorch_wheels.sh, or use:
wget https://raw.githubusercontent.com/wiki/ROCm/pytorch/files/install_kdb_files_for_pytorch_wheels.sh
After installing ROCm PyTorch wheels, run the following code:
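#Optional: replace 'gfx90a' with your architecture and 6.2 with your preferred ROCm version
export GFX_ARCH=gfx90a
#Optional
export ROCM_VERSION=6.2
#Make the downloaded script executable before running it
chmod +x install_kdb_files_for_pytorch_wheels.sh
./install_kdb_files_for_pytorch_wheels.sh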
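The wheel installation command in this section changes with the ROCm version and with the choice of Stable or Nightly build. As a rough sketch, the command can be generated as shown below. The nightly index URL follows the command given earlier in this section. The stable index URL pattern is an assumption that should be checked against the installation matrix on pytorch.org, and the function name `pip_install_command` is hypothetical.

```python
def pip_install_command(rocm_version="6.2", nightly=True):
    """Return the pip command (as an argument list) for ROCm PyTorch wheels."""
    packages = ["torch", "torchvision", "torchaudio"]
    if nightly:
        # Matches the nightly command shown in this section.
        index_url = f"https://download.pytorch.org/whl/nightly/rocm{rocm_version}/"
        return ["pip3", "install", "--pre", *packages, "--index-url", index_url]
    # Assumption: stable wheels use the same host without the 'nightly' path
    # segment. Confirm the exact URL in the installation matrix.
    index_url = f"https://download.pytorch.org/whl/rocm{rocm_version}"
    return ["pip3", "install", *packages, "--index-url", index_url]


if __name__ == "__main__":
    print(" ".join(pip_install_command("6.2", nightly=True)))
```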
Using the PyTorch ROCm base Docker image#
The pre-built base Docker image has all dependencies installed, including:
ROCm
torchvision
Conda packages
The compiler toolchain
Additionally, a particular environment flag (BUILD_ENVIRONMENT) is set, which is used by the build scripts to determine the configuration of the build environment.
Download the Docker image. This is the base image, which does not contain PyTorch.
docker pull rocm/pytorch:latest-base
Start a Docker container using the downloaded image.
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch:latest-base
You can also pass the -v argument to mount any data directories from the host onto the container.
Inside the docker container, run the following steps:
Clone the PyTorch repository.
cd ~
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive
Set ROCm architecture (optional).
Note
By default in the rocm/pytorch:latest-base image, PyTorch builds simultaneously for the following architectures:
gfx900
gfx906
gfx908
gfx90a
gfx1030
gfx1100
gfx1101
gfx940
gfx941
gfx942
If you want to compile only for your microarchitecture (uarch), run:
export PYTORCH_ROCM_ARCH=<uarch>
Where <uarch> is the architecture reported by the rocminfo command.
To find your uarch, run:
rocminfo | grep gfx
Build PyTorch.
.ci/pytorch/build.sh
This converts PyTorch sources for HIP compatibility and builds the PyTorch framework.
To check if your build is successful, run:
echo $? # should return 0 if success
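For the optional architecture step above, the gfx targets can be extracted from `rocminfo` output with a short script instead of reading them by eye. This is a minimal sketch: the sample text imitates the `rocminfo | grep gfx` output, the helper names are hypothetical, and joining several targets with a semicolon is an assumption about how `PYTORCH_ROCM_ARCH` accepts multiple architectures.

```python
import re

# Sample text shaped like the output of `rocminfo | grep gfx`.
SAMPLE = """\
  Name:                    gfx1030
  Name:                    amdgcn-amd-amdhsa--gfx1030
"""


def gfx_targets(rocminfo_text):
    """Return the unique gfx targets in the text, in first-seen order."""
    seen = []
    for match in re.finditer(r"gfx[0-9a-f]+", rocminfo_text):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


def pytorch_rocm_arch(rocminfo_text):
    """Build a value for PYTORCH_ROCM_ARCH.

    Assumption: multiple architectures are separated by semicolons.
    """
    return ";".join(gfx_targets(rocminfo_text))


if __name__ == "__main__":
    print(f'export PYTORCH_ROCM_ARCH="{pytorch_rocm_arch(SAMPLE)}"')
```

On a real system, the input text could come from `subprocess.run(["rocminfo"], capture_output=True, text=True).stdout`.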
Using the PyTorch upstream Dockerfile#
If you don’t want to use a prebuilt base Docker image, you can build a custom base Docker image using scripts from the PyTorch repository. This uses a standard Docker image from operating system maintainers and installs all the required dependencies, including:
ROCm
torchvision
Conda packages
The compiler toolchain
Clone the PyTorch repository.
cd ~
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git submodule update --init --recursive
Build the PyTorch Docker image.
cd .ci/docker
./build.sh pytorch-linux-<os-version>-rocm<rocm-version>-py<python-version> -t rocm/pytorch:build_from_dockerfile
Where:
<os-version> = ubuntu20.04 (or focal), ubuntu22.04 (or jammy)
<rocm-version> = 6.0, 6.1, 6.2
<python-version> = 3.8-3.11
To verify that your image was successfully created, run:
docker image ls rocm/pytorch:build_from_dockerfile
If successful, the output looks like this:
REPOSITORY     TAG                     IMAGE ID       CREATED         SIZE
rocm/pytorch   build_from_dockerfile   17071499be47   2 minutes ago   32.8GB
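The image name passed to `build.sh` follows the pattern `pytorch-linux-<os-version>-rocm<rocm-version>-py<python-version>`. A small helper, sketched below, can compose the name and reject values outside the ranges listed in this section. The function name `build_image_name` is hypothetical, and reading "3.8-3.11" as the four minor versions 3.8 through 3.11 is an interpretation of the range given above.

```python
# Values listed in this section for each placeholder.
SUPPORTED_OS = {"ubuntu20.04", "focal", "ubuntu22.04", "jammy"}
SUPPORTED_ROCM = {"6.0", "6.1", "6.2"}
SUPPORTED_PYTHON = {"3.8", "3.9", "3.10", "3.11"}


def build_image_name(os_version, rocm_version, python_version):
    """Compose the image name argument for .ci/docker/build.sh."""
    if os_version not in SUPPORTED_OS:
        raise ValueError(f"unsupported OS version: {os_version}")
    if rocm_version not in SUPPORTED_ROCM:
        raise ValueError(f"unsupported ROCm version: {rocm_version}")
    if python_version not in SUPPORTED_PYTHON:
        raise ValueError(f"unsupported Python version: {python_version}")
    return f"pytorch-linux-{os_version}-rocm{rocm_version}-py{python_version}"


if __name__ == "__main__":
    print(build_image_name("jammy", "6.2", "3.10"))
```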
Start a Docker container using the image with the mounted PyTorch folder.
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --user root --device=/dev/kfd --device=/dev/dri \
    --group-add video --ipc=host --shm-size 8G \
    -v ~/pytorch:/pytorch rocm/pytorch:build_from_dockerfile
You can also pass the -v argument to mount any data directories from the host onto the container.
Go to the PyTorch directory.
cd /pytorch
Set ROCm architecture.
To determine your AMD architecture, run:
rocminfo | grep gfx
The result looks like this (for the gfx1030 architecture):
Name: gfx1030
Name: amdgcn-amd-amdhsa--gfx1030
Set the PYTORCH_ROCM_ARCH environment variable to specify the architectures you want to build PyTorch for.
export PYTORCH_ROCM_ARCH=<uarch>
Where <uarch> is the architecture reported by the rocminfo command.
Build PyTorch.
.ci/pytorch/build.sh
This converts PyTorch sources for HIP compatibility and builds the PyTorch framework.
To check if your build is successful, run:
echo $? # should return 0 if success
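The `echo $?` check prints the exit status of the previous command, where 0 means success. When the build is driven from a script, the same check can be done with `subprocess`. The sketch below is illustrative only: `run_and_report` is a hypothetical helper, and the example runs a trivial Python command as a stand-in for `.ci/pytorch/build.sh`.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command, report its exit status, and return that status."""
    completed = subprocess.run(command)
    if completed.returncode == 0:
        print("success")
    else:
        print(f"failed with exit status {completed.returncode}")
    return completed.returncode


if __name__ == "__main__":
    # Stand-in command; in the PyTorch source tree this would be
    # [".ci/pytorch/build.sh"].
    run_and_report([sys.executable, "-c", "print('build step placeholder')"])
```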
Testing the PyTorch installation#
You can use PyTorch unit tests to validate your PyTorch installation. If you used a prebuilt PyTorch Docker image from AMD ROCm Docker Hub or installed an official wheels package, validation tests are not necessary.
If you want to manually run unit tests to validate your PyTorch installation fully, follow these steps:
Import the torch package in Python to test if PyTorch is installed and accessible.
Note
Do not run the following command from the PyTorch home directory.
python3 -c 'import torch' 2> /dev/null && echo 'Success' || echo 'Failure'
Check if the GPU is accessible from PyTorch. In the PyTorch framework, torch.cuda is a generic way to access the GPU. This can only access an AMD GPU if one is available.
python3 -c 'import torch; print(torch.cuda.is_available())'
Run unit tests to validate the PyTorch installation fully.
Note
You must run the following command from the PyTorch home directory.
PYTORCH_TEST_WITH_ROCM=1 python3 test/run_test.py --verbose \
    --include test_nn test_torch test_cuda test_ops \
    test_unary_ufuncs test_binary_ufuncs test_autograd
This command ensures that the required environment variable is set to skip certain unit tests for ROCm. This also applies to wheel installs in a non-controlled environment.
Note
Make sure your PyTorch source code corresponds to the PyTorch wheel or the installation in the Docker image. Incompatible PyTorch source code can give errors when running unit tests.
Some tests may be skipped, as appropriate, based on your system configuration. ROCm doesn’t support all PyTorch features; tests that evaluate unsupported features are skipped. Other tests might be skipped, depending on the host or GPU memory and the number of available GPUs.
If the compilation and installation are correct, all tests will pass.
(Optional) Run individual unit tests.
PYTORCH_TEST_WITH_ROCM=1 python3 test/test_nn.py --verbose
You can replace test_nn.py with any other test set.
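The import and GPU checks from the first two steps above can be combined into one script that reports a little more detail. This is a sketch under a few assumptions: `describe_gpu_support` is a hypothetical name, and `torch.version.hip` is read with `getattr` because it is expected to be set only on ROCm builds. The script returns early instead of failing when PyTorch is not installed.

```python
import importlib.util


def describe_gpu_support():
    """Summarize whether PyTorch is installed and whether it can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # Expected to be a version string on ROCm builds and None otherwise.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
        "devices": [],
    }
    if info["gpu_available"]:
        info["devices"] = [
            torch.cuda.get_device_name(index)
            for index in range(torch.cuda.device_count())
        ]
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_support().items():
        print(f"{key}: {value}")
```

As with the one-line import check, run this from outside the PyTorch home directory.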
Running a basic PyTorch example#
The PyTorch examples repository provides basic examples that exercise the functionality of your framework.
Two of our favorite testing databases are:
MNIST (Modified National Institute of Standards and Technology): A database of handwritten digits that can be used to train a Convolutional Neural Network for handwriting recognition.
ImageNet: A database of images that can be used to train a network for visual object recognition.
MNIST PyTorch example#
Clone the PyTorch examples repository.
git clone https://github.com/pytorch/examples.git
Go to the MNIST example folder.
cd examples/mnist
Follow the instructions in the README.md file in this folder to install the requirements. Then run:
python3 main.py
This generates the following output:
...
Train Epoch: 14 [58240/60000 (97%)] Loss: 0.010128
Train Epoch: 14 [58880/60000 (98%)] Loss: 0.001348
Train Epoch: 14 [59520/60000 (99%)] Loss: 0.005261
Test set: Average loss: 0.0252, Accuracy: 9921/10000 (99%)
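To confirm a run automatically, the training output can be parsed for the loss and accuracy values. The following sketch assumes the log lines have the shape shown above; the regular expressions and the function name `parse_mnist_log` are illustrative and not part of the examples repository.

```python
import re

# Lines copied from the sample output in this section.
SAMPLE_LOG = """\
Train Epoch: 14 [58240/60000 (97%)] Loss: 0.010128
Train Epoch: 14 [58880/60000 (98%)] Loss: 0.001348
Train Epoch: 14 [59520/60000 (99%)] Loss: 0.005261
Test set: Average loss: 0.0252, Accuracy: 9921/10000 (99%)
"""

TRAIN_RE = re.compile(r"Train Epoch: (\d+) \[(\d+)/(\d+) \(\d+%\)\]\s+Loss: ([\d.]+)")
TEST_RE = re.compile(r"Test set: Average loss: ([\d.]+), Accuracy: (\d+)/(\d+)")


def parse_mnist_log(text):
    """Return (training losses, test results) found in the log text."""
    train_losses = [float(m.group(4)) for m in TRAIN_RE.finditer(text)]
    test_results = [
        {"loss": float(m.group(1)), "accuracy": int(m.group(2)) / int(m.group(3))}
        for m in TEST_RE.finditer(text)
    ]
    return train_losses, test_results


if __name__ == "__main__":
    losses, tests = parse_mnist_log(SAMPLE_LOG)
    print("training losses:", losses)
    print("test results:", tests)
```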
ImageNet PyTorch example#
Clone the PyTorch examples repository (if you didn’t already do this in the preceding MNIST example).
git clone https://github.com/pytorch/examples.git
Go to the ImageNet example folder.
cd examples/imagenet
Follow the instructions in the README.md file in this folder to install the requirements. Then run:
python3 main.py
Troubleshooting#
What should you do if you get the following error when trying to run PyTorch?
hipErrorNoBinaryForGPU: Unable to find code object for all current devices!
This error indicates that the installed PyTorch build, or one of its dependencies or libraries, does not support the current GPU. To work around this issue, use the following steps:
Confirm that the hardware supports the ROCm stack. Refer to System requirements (Linux) and System requirements (Windows).
Determine the gfx target.
rocminfo | grep gfx
Check if PyTorch is compiled with the correct gfx target.
TORCHDIR=$( dirname $( python3 -c 'import torch; print(torch.__file__)' ) )
roc-obj-ls -v $TORCHDIR/lib/libtorch_hip.so # check for gfx target
Note
If the gfx target of your GPU is not listed and you are compiling PyTorch from source, recompile PyTorch with the right gfx target.
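The same comparison can be sketched from inside Python. The pure helper below checks whether a device's gfx target appears among a list of compiled targets, ignoring feature suffixes such as `:sramecc+:xnack-`. The optional `check_first_gpu` function assumes that ROCm builds expose the device architecture as `gcnArchName` on the device properties and the compiled targets through `torch.cuda.get_arch_list()`. Treat both as assumptions to verify on your installation; all helper names are hypothetical.

```python
import importlib.util


def base_target(arch_name):
    """Strip feature suffixes: 'gfx90a:sramecc+:xnack-' becomes 'gfx90a'."""
    return arch_name.split(":", 1)[0]


def is_target_compiled(device_arch, compiled_targets):
    """Return True if the device's gfx target is among the compiled targets."""
    return base_target(device_arch) in {base_target(t) for t in compiled_targets}


def check_first_gpu():
    """Return True or False for GPU 0, or None if the check cannot be made."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    # Assumption: ROCm builds report the architecture as `gcnArchName`.
    device_arch = getattr(props, "gcnArchName", None)
    if device_arch is None:
        return None
    return is_target_compiled(device_arch, torch.cuda.get_arch_list())


if __name__ == "__main__":
    print(is_target_compiled("gfx90a:sramecc+:xnack-", ["gfx908", "gfx90a"]))
    print(check_first_gpu())
```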
What if you are unable to access Docker or GPU in user accounts?
Ensure that the user is added to the docker, video, and render Linux groups as described in Configuring permissions for GPU access.
Can you install PyTorch directly on bare metal?
Bare-metal installation of PyTorch is supported through wheels. For more information, see Using a wheels package.
How do you profile PyTorch workloads?
Use the PyTorch Profiler as described in PyTorch Profiler to profile GPU kernels on ROCm.
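As a starting point, the sketch below profiles a single matrix multiplication with `torch.profiler`. It is a minimal illustration rather than a recommended workflow: the function name `profile_matmul` and the matrix size are arbitrary, GPU activity is recorded only when `torch.cuda.is_available()` returns True, and the function returns None when PyTorch is not installed.

```python
import importlib.util


def profile_matmul(size=256):
    """Profile one matrix multiplication and return a summary table as text."""
    if importlib.util.find_spec("torch") is None:
        return None

    import torch
    from torch.profiler import ProfilerActivity, profile

    use_gpu = torch.cuda.is_available()
    device = "cuda" if use_gpu else "cpu"

    # Always record CPU activity; add GPU activity when a device is present.
    activities = [ProfilerActivity.CPU]
    if use_gpu:
        activities.append(ProfilerActivity.CUDA)

    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    with profile(activities=activities) as prof:
        # .item() copies the result to the host, so the GPU work finishes
        # inside the profiled region.
        (a @ b).sum().item()

    return prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)


if __name__ == "__main__":
    print(profile_matmul())
```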