vLLM Linux Docker Image#
Virtual Large Language Model (vLLM) is a fast and easy-to-use library for LLM inference and serving, providing optimizations that improve inference throughput and memory efficiency.
For additional information, visit the AMD vLLM GitHub page.
Note that this is a benchmarking demo/example; installation steps for other vLLM models or configurations may differ.
Additional information#
Ensure Docker is installed on your system. Refer to https://docs.docker.com/engine/install/ for more information.
This Docker image supports gfx1151 and gfx1150. Refer to the compatibility matrix for more information.
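As an optional check before pulling the image, you can confirm which gfx target your GPU reports on the host (this assumes the ROCm user-space tools, which provide rocminfo, are installed):
rocminfo | grep -i gfx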
This example highlights use of the AMD vLLM Docker image with deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Other vLLM-supported models can be used as well.
Download and install Docker image#
Download Docker image#
Before starting, select the applicable Ubuntu version and download the compatible Docker image.
docker pull rocm/vllm-dev:rocm7.0.2_navi_ubuntu24.04_py3.12_pytorch_2.8_vllm_0.10.2rc1
Note
For more information, see rocm/vllm-dev.
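To confirm the pull completed, you can list the locally available images for this repository; the tag shown above should appear in the output:
docker images rocm/vllm-dev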
Installation#
Follow these steps to start a vLLM Docker container and benchmark a model.
Start the Docker container.
docker run -it \
  --privileged \
  --device=/dev/kfd \
  --device=/dev/dri \
  --network=host \
  --group-add sudo \
  -w /app/vllm/ \
  --name <container_name> \
  <image_name> \
  /bin/bash
Note
You can find the <image_name> by running docker images. The <container_name> is user defined; use this value to name your Docker container.
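As an optional sanity check inside the container, you can verify that the GPU is visible to the bundled ROCm build of PyTorch (shown here as an example; the reported device name varies by system):
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"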
Run benchmarks with the Docker container.
cd benchmarks
python3 benchmark_latency.py --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
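The latency benchmark accepts additional options to control the workload shape. The flags below (input length, output length, and batch size) are an illustrative sweep; argument names can differ between vLLM versions, so check the script's --help output first:
python3 benchmark_latency.py --help
python3 benchmark_latency.py --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --input-len 256 --output-len 128 --batch-size 8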
Additional usage#
vLLM is optimized to serve LLMs faster and more efficiently, especially for applications requiring high throughput and scalability. See Quickstart - vLLM for more information.
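For example, a minimal serving flow inside the container might look like the following; the port, prompt, and token limit are placeholders for illustration. The curl request is issued from another shell once the server reports it is ready:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --port 8000
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "prompt": "Hello", "max_tokens": 32}'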
To run offline inference, see Quickstart - vLLM for more information.
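A minimal offline-inference sketch, adapted from the vLLM quickstart and run inside the container, could look like this; the prompts and sampling settings are arbitrary examples:
python3 - <<'PY'
from vllm import LLM, SamplingParams

# Example prompts and sampling settings; adjust as needed.
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load the model and generate completions for each prompt.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
PY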