Ray on ROCm installation#

2025-09-09

6 min read time

Applies to Linux

Ray is a unified framework for scaling AI and Python applications from your laptop to a full cluster, without changing your code. Ray consists of a core distributed runtime and a set of AI libraries for simplifying machine learning computations.

Ray is a general-purpose framework that runs many types of workloads efficiently. Any Python application can be scaled with Ray, without extra infrastructure.
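
For example, an ordinary Python function becomes a distributed task with a one-line decorator. The following minimal sketch uses only core Ray APIs (ray.init, the ray.remote decorator, and ray.get); the function itself is just a placeholder:

    import ray

    # Connect to an existing Ray cluster if one is running, or start a local one.
    ray.init()

    # Any Python function becomes a distributed task with the @ray.remote decorator.
    @ray.remote
    def square(x):
        return x * x

    # Launch eight tasks in parallel and gather the results.
    futures = [square.remote(i) for i in range(8)]
    print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]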

For hardware, software, and third-party framework compatibility between ROCm and Ray, see the ROCm compatibility matrix and the third-party support matrix.

Note

Ray is supported on ROCm 6.4.1.

Install Ray#

To install Ray on ROCm, you have the following options:

Use a prebuilt Docker image with Ray pre-installed#

Docker is the recommended way to set up a Ray environment because it avoids potential installation issues. The tested, prebuilt image includes Ray, ROCm, and other dependencies.

  1. Pull the Docker image

    docker pull rocm/ray:ray-2.48.0.post0_rocm6.4.1_ubuntu24.04_py3.12_pytorch2.6.0
    

    Note

    For specific versions of Ray, review the periodically pushed Docker images at ROCm Ray on Docker Hub.

    Additional Docker images are available at ROCm Ray on Docker Hub. These contain the latest ROCm version but might use an older version of Ray.

  2. Launch and connect to the container

    docker run -it -d --network=host --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size 64G \
    --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/host_dir \
    -w /app --name rocm_ray rocm/ray:ray-2.48.0.post0_rocm6.4.1_ubuntu24.04_py3.12_pytorch2.6.0 /bin/bash
    
    docker attach rocm_ray
    

    Tip

    • The --shm-size parameter allocates shared memory for the container. Adjust it based on your system’s resources if needed.

    • Replace $(pwd) with the absolute path to the directory you want to mount inside the container.
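
Once you are attached to the container, you can optionally confirm that the ROCm build of PyTorch bundled in the image sees your GPUs before moving on. This is a quick sanity check; the reported device count depends on your system.

    python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"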

Build your own Docker image#

If you prefer to use the ROCm Ubuntu image or already have a ROCm Ubuntu container, follow these steps to install Ray in the container.

  1. Pull the ROCm Ubuntu Docker image. For example, the following command pulls the ROCm PyTorch image based on Ubuntu 24.04:

    docker pull rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.6.0
    
  2. Launch the Docker container. After pulling the image, start a container and attach to it using these commands:

    docker run -it -d --network=host --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size 64G \
    --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/host_dir \
    --name rocm_ray rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.6.0 /bin/bash
    docker attach rocm_ray
    
  3. Activate the conda environment. For example, run conda activate <env_name>, where <env_name> is the Python environment provided by the image (the name depends on the image).

  4. Install from Ray nightly wheels. Inside the running container, install the required version of Ray with ROCm support using pip:

    pip install -U "ray[all] @ https://s3-us-west-2.amazonaws.com/ray-wheels/master/005c372262e050d5745f475e22e64305fa07f8b8/ray-3.0.0.dev0-cp312-cp312-manylinux2014_x86_64.whl"
    
  5. Verify the installed Ray version. Check whether the correct version of Ray is installed.

    pip3 freeze | grep ray
    

    Expected output:

    memray==1.17.2
    ray @ https://s3-us-west-2.amazonaws.com/ray-wheels/master/005c372262e050d5745f475e22e64305fa07f8b8/ray-3.0.0.dev0-cp312-cp312-manylinux2014_x86_64.whl#sha256=e8f457f1bb8009b1e2744733c269fc54f3ec78e3705e16a2f88a8305720efe1b
    
  6. Verify the installation of ROCm Ray. See Test the Ray installation.
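
Before running the full test, a quick way to confirm that the nightly wheel imports cleanly is to print its version (a minimal check):

    python3 -c "import ray; print(ray.__version__)"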

Install Ray on bare metal or a custom container#

Follow these steps if you prefer to install ROCm manually on your host system or in a custom container.

  1. Install ROCm. Follow the ROCm installation guide to install ROCm on your system.

    Once installed, verify your ROCm installation using:

    rocm-smi
    
    ========================================== ROCm System Management Interface ==========================================
    ==================================================== Concise Info ====================================================
    Device  [Model : Revision]    Temp        Power     Partitions      SCLK     MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
            Name (20 chars)       (Junction)  (Socket)  (Mem, Compute)
    ======================================================================================================================
    0       [0x74a1 : 0x00]       50.0°C      170.0W    NPS1, SPX       131Mhz   900Mhz   0%   auto  750.0W    0%   0%
            AMD Instinct MI300X
    1       [0x74a1 : 0x00]       51.0°C      176.0W    NPS1, SPX       132Mhz   900Mhz   0%   auto  750.0W    0%   0%
            AMD Instinct MI300X
    2       [0x74a1 : 0x00]       50.0°C      177.0W    NPS1, SPX       132Mhz   900Mhz   0%   auto  750.0W    0%   0%
            AMD Instinct MI300X
    3       [0x74a1 : 0x00]       53.0°C      176.0W    NPS1, SPX       132Mhz   900Mhz   0%   auto  750.0W    0%   0%
            AMD Instinct MI300X
    ======================================================================================================================
    ================================================ End of ROCm SMI Log =================================================
    
  2. Install the required version of Ray with ROCm support using pip:

    pip install -U "ray[all] @ https://s3-us-west-2.amazonaws.com/ray-wheels/master/005c372262e050d5745f475e22e64305fa07f8b8/ray-3.0.0.dev0-cp312-cp312-manylinux2014_x86_64.whl"
    
  3. Verify the installed Ray version. Check whether the correct version of Ray and its ROCm plugins are installed.

    pip3 freeze | grep ray
    
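
If you later want to run Ray across several bare-metal nodes, the standard Ray CLI can bring up a cluster once each node has a working installation. A minimal sketch, where <head-node-ip> is a placeholder for an address of the head node that the worker nodes can reach:

    # On the head node:
    ray start --head --port=6379

    # On each worker node, join the cluster started above:
    ray start --address=<head-node-ip>:6379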

Build Ray from source#

Follow the Building Ray from Source guide to build Ray with ROCm support from source.

Test the Ray installation#

Running the Ray unit tests to validate your installation is optional if you used a prebuilt Docker image from AMD ROCm Docker Hub.

To validate your installation manually, follow these steps:

  1. After launching the container, test whether Ray detects ROCm devices as expected.

    python3 -c "import ray; ray.init(); print(ray.cluster_resources())"

  2. If the setup is successful, the output should list all available ROCm devices.

    Expected output (for example, on an MI300X node):

    {'memory': 1420360912896.0, 'GPU': 8.0, 'accelerator_type:AMD-Instinct-MI300X-OAM': 1.0, 'node:10.7.39.110': 1.0, 'CPU': 384.0, 'node:__internal_head__': 1.0, 'object_store_memory': 200000000000.0}
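
To go a step beyond listing cluster resources, the following sketch schedules a task that requests one GPU and reports which device Ray assigned to it. ray.get_gpu_ids() is a standard Ray API; the environment variable Ray uses to restrict GPU visibility can differ by Ray version, so the sketch prints the common candidates rather than assuming one.

    import os
    import ray

    ray.init()

    @ray.remote(num_gpus=1)
    def gpu_check():
        # IDs of the GPUs Ray reserved for this task.
        gpu_ids = ray.get_gpu_ids()
        # Ray restricts device visibility through an environment variable whose
        # name can vary by Ray version; report the common candidates.
        visibility = {
            var: os.environ.get(var)
            for var in ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES")
        }
        return gpu_ids, visibility

    print(ray.get(gpu_check.remote()))

The returned GPU ID should correspond to one of the devices counted in the GPU entry of ray.cluster_resources().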