vLLM inference and serving on ROCm#
vLLM is an open-source library for fast, memory-efficient LLM inference and serving. This page describes how to set up and run vLLM on supported AMD GPUs and APUs using either a prebuilt Docker image (recommended) or pip.
Prerequisites#
To use the prebuilt Docker image (recommended), ensure the host system has Docker Engine and the AMD GPU Driver (amdgpu) installed. See the ROCm 7.13.0 compatibility matrix for driver support information. For installation instructions, see the AMD GPU Driver documentation.
To install vLLM with pip, ensure your system has the AMD GPU Driver (amdgpu) installed and Python 3.13 installed and accessible. Review the ROCm 7.13.0 compatibility matrix for more support details.
For the pip method, also install uv, a drop-in replacement for pip that handles custom package indexes more predictably.
Note
It’s recommended to use uv to install the vLLM wheel. Installing from custom package indexes with pip can be cumbersome because pip resolves packages from both --extra-index-url and the default index, then selects the highest available version. This makes it difficult to install a wheel from a custom index when all dependency versions are pinned exactly.
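If uv is not already available on your system, one simple way to install it is through pip itself (a sketch; Astral's standalone installer works as well):
python -m pip install --user uv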
Get started#
Pull the ROCm vLLM 0.19.1 Docker image for the gfx950 (data center GPU) target.
docker pull rocm/vllm:rocm7.13.0_gfx950-dcgpu_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1
Start the Docker container.
docker run -it --rm \
    --device /dev/kfd \
    --device /dev/dri \
    --network=host \
    --ipc=host \
    --group-add=video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v <path/to/your/models>:/app/models \
    -e HF_HOME="/app/models" \
    rocm/vllm:rocm7.13.0_gfx950-dcgpu_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1 \
    bash
Pull the ROCm vLLM 0.19.1 Docker image for the gfx94X (data center GPU) target.
docker pull rocm/vllm:rocm7.13.0_gfx94X-dcgpu_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1
Start the Docker container.
docker run -it --rm \
    --device /dev/kfd \
    --device /dev/dri \
    --network=host \
    --ipc=host \
    --group-add=video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v <path/to/your/models>:/app/models \
    -e HF_HOME="/app/models" \
    rocm/vllm:rocm7.13.0_gfx94X-dcgpu_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1 \
    bash
Pull the ROCm vLLM 0.19.1 Docker image for the gfx120X target.
docker pull rocm/vllm:rocm7.13.0_gfx120X-all_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1
Start the Docker container.
docker run -it --rm \
    --device /dev/kfd \
    --device /dev/dri \
    --network=host \
    --ipc=host \
    --group-add=video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v <path/to/your/models>:/app/models \
    -e HF_HOME="/app/models" \
    rocm/vllm:rocm7.13.0_gfx120X-all_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1 \
    bash
Pull the ROCm vLLM 0.19.1 Docker image for the gfx110X target.
docker pull rocm/vllm:rocm7.13.0_gfx110X-all_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1
Start the Docker container.
docker run -it --rm \
    --device /dev/kfd \
    --device /dev/dri \
    --network=host \
    --ipc=host \
    --group-add=video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v <path/to/your/models>:/app/models \
    -e HF_HOME="/app/models" \
    rocm/vllm:rocm7.13.0_gfx110X-all_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1 \
    bash
Pull the ROCm vLLM 0.19.1 Docker image for the gfx1151 target.
docker pull rocm/vllm:rocm7.13.0_gfx1151_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1
Start the Docker container.
docker run -it --rm \
    --device /dev/kfd \
    --device /dev/dri \
    --network=host \
    --ipc=host \
    --group-add=video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v <path/to/your/models>:/app/models \
    -e HF_HOME="/app/models" \
    rocm/vllm:rocm7.13.0_gfx1151_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1 \
    bash
Pull the ROCm vLLM 0.19.1 Docker image for the gfx1150 target.
docker pull rocm/vllm:rocm7.13.0_gfx1150_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1
Start the Docker container.
docker run -it --rm \
    --device /dev/kfd \
    --device /dev/dri \
    --network=host \
    --ipc=host \
    --group-add=video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v <path/to/your/models>:/app/models \
    -e HF_HOME="/app/models" \
    rocm/vllm:rocm7.13.0_gfx1150_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1 \
    bash
Pull the ROCm vLLM 0.19.1 Docker image for the gfx1152 target.
docker pull rocm/vllm:rocm7.13.0_gfx1152_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1
Start the Docker container.
docker run -it --rm \
    --device /dev/kfd \
    --device /dev/dri \
    --network=host \
    --ipc=host \
    --group-add=video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v <path/to/your/models>:/app/models \
    -e HF_HOME="/app/models" \
    rocm/vllm:rocm7.13.0_gfx1152_ubuntu24.04_py3.13_pytorch_2.10.0_vllm_0.19.1 \
    bash
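Once inside the container, you can optionally run a quick sanity check to confirm that your GPUs are visible (this assumes rocm-smi is available in the image, which is typical for ROCm containers; the PyTorch check works either way):
rocm-smi
python -c "import torch; print('Visible GPUs:', torch.cuda.device_count())"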
See also
After setting up your environment, follow the vLLM 0.19.1 usage documentation to get started: Using vLLM.
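For example, a minimal serving sketch from inside the container (the model name is only an illustration; substitute any model available in your mounted HF_HOME cache, and note that vllm serve listens on port 8000 by default):
vllm serve Qwen/Qwen2.5-0.5B-Instruct
Once the server is up, query the OpenAI-compatible endpoint from another shell:
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct", "prompt": "Hello, my name is", "max_tokens": 32}'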
Install vLLM using pip#
Set up your Python virtual environment.
python -m venv .venv
Activate your Python virtual environment.
source .venv/bin/activate
Install PyTorch 2.10 in your virtual environment. This also installs the ROCm core libraries as dependencies.
Note
torchvision 0.25 must be installed alongside PyTorch; vLLM requires it and will fail without it.
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx950-dcgpu/ \
    "torch==2.10.0+rocm7.13.0" \
    "torchvision==0.25.0+rocm7.13.0" \
    "torchaudio==2.10.0+rocm7.13.0"
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx94X-dcgpu/ \
    "torch==2.10.0+rocm7.13.0" \
    "torchvision==0.25.0+rocm7.13.0" \
    "torchaudio==2.10.0+rocm7.13.0"
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx120X-all/ \
    "torch==2.10.0+rocm7.13.0" \
    "torchvision==0.25.0+rocm7.13.0" \
    "torchaudio==2.10.0+rocm7.13.0"
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx110X-all/ \
    "torch==2.10.0+rocm7.13.0" \
    "torchvision==0.25.0+rocm7.13.0" \
    "torchaudio==2.10.0+rocm7.13.0"
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1151/ \
    "torch==2.10.0+rocm7.13.0" \
    "torchvision==0.25.0+rocm7.13.0" \
    "torchaudio==2.10.0+rocm7.13.0"
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1150/ \
    "torch==2.10.0+rocm7.13.0" \
    "torchvision==0.25.0+rocm7.13.0" \
    "torchaudio==2.10.0+rocm7.13.0"
python -m pip install --index-url https://repo.amd.com/rocm/whl/gfx1152/ \
    "torch==2.10.0+rocm7.13.0" \
    "torchvision==0.25.0+rocm7.13.0" \
    "torchaudio==2.10.0+rocm7.13.0"
Install Flash Attention.
python -m pip install https://rocm.frameworks.amd.com/whl/gfx950-dcgpu/flash_attn-2.8.3-cp313-cp313-linux_x86_64.whl
python -m pip install https://rocm.frameworks.amd.com/whl/gfx94X-dcgpu/flash_attn-2.8.3-cp313-cp313-linux_x86_64.whl
python -m pip install https://rocm.frameworks.amd.com/whl/gfx120X-all/flash_attn-2.8.3-py3-none-any.whl
python -m pip install https://rocm.frameworks.amd.com/whl/gfx110X-all/flash_attn-2.8.3-py3-none-any.whl
python -m pip install https://rocm.frameworks.amd.com/whl/gfx1151/flash_attn-2.8.3-py3-none-any.whl
python -m pip install https://rocm.frameworks.amd.com/whl/gfx1150/flash_attn-2.8.3-py3-none-any.whl
python -m pip install https://rocm.frameworks.amd.com/whl/gfx1152/flash_attn-2.8.3-py3-none-any.whl
Install the vLLM 0.19.1 wheel using uv pip.
uv pip install https://rocm.frameworks.amd.com/whl/gfx950-dcgpu/vllm-0.19.1.dev3%2Brocm7.13.0.g24efb8904.d20260514-cp313-cp313-linux_x86_64.whl
uv pip install https://rocm.frameworks.amd.com/whl/gfx94X-dcgpu/vllm-0.19.1.dev3%2Brocm7.13.0.g24efb8904.d20260514-cp313-cp313-linux_x86_64.whl
uv pip install https://rocm.frameworks.amd.com/whl/gfx120X-all/vllm-0.19.1.dev3%2Brocm7.13.0.g24efb8904.d20260514-cp313-cp313-linux_x86_64.whl
uv pip install https://rocm.frameworks.amd.com/whl/gfx110X-all/vllm-0.19.1.dev3%2Brocm7.13.0.g24efb8904.d20260514-cp313-cp313-linux_x86_64.whl
uv pip install https://rocm.frameworks.amd.com/whl/gfx1151/vllm-0.19.1.dev3%2Brocm7.13.0.g24efb8904.d20260514-cp313-cp313-linux_x86_64.whl
uv pip install https://rocm.frameworks.amd.com/whl/gfx1150/vllm-0.19.1.dev3%2Brocm7.13.0.g24efb8904.d20260514-cp313-cp313-linux_x86_64.whl
uv pip install https://rocm.frameworks.amd.com/whl/gfx1152/vllm-0.19.1.dev3%2Brocm7.13.0.g24efb8904.d20260514-cp313-cp313-linux_x86_64.whl
Set the following environment variables to prevent errors related to ROCm platform and Flash Attention availability when running vLLM.
export PYTHONPATH=$VIRTUAL_ENV/lib/python3.13/site-packages/_rocm_sdk_core/share/amd_smi
export FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE
Check your installation.
echo "=== vLLM ===" && python -c "import vllm; print('vLLM version:', vllm.__version__)" echo "=== PyTorch ===" && python -c "import torch; print('PyTorch:', torch.__version__); print('HIP available:', torch.cuda.is_available()); print('HIP built:', torch.backends.hip.is_built() if hasattr(torch.backends, 'hip') else 'N/A')" echo "=== flash-attn ===" && python -c "import flash_attn; print('flash-attn:', flash_attn.__version__)"
See also
After setting up your environment, follow the vLLM 0.19.1 usage documentation to get started: Using vLLM.
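As a quick end-to-end test of the pip installation, the following offline-inference sketch can be run in the activated virtual environment (the model is only an example chosen to keep the download small; any supported model works):
python - <<'EOF'
from vllm import LLM, SamplingParams

# Example model only; replace with the model you intend to serve.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
EOF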