vLLM Linux Docker Image#
Virtual Large Language Model (vLLM) is a fast and easy-to-use library for LLM inference and serving, delivering high throughput and efficient memory management.
For additional information, visit the AMD vLLM GitHub page.
Note
This is a benchmarking example. Installation steps for other vLLM models or configurations may differ.
Additional information
Ensure Docker is installed on your system. Refer to this link for more information.
This Docker image supports the gfx1151 and gfx1150 GPU architectures.
This example demonstrates the AMD vLLM Docker image using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Other vLLM-supported models can be used as well.
Download and install Docker image#
Download Docker image#
Select the applicable Ubuntu version to download the compatible Docker image before starting.
docker pull rocm/vllm-dev:rocm7.1.1_navi_ubuntu24.04_py3.12_pytorch_2.8_vllm_0.10.2rc1
Note
For more information, see rocm/vllm-dev.
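You can confirm the image downloaded successfully by listing your local images:
docker images rocm/vllm-dev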
Installation#
Follow these steps to run the vLLM Docker container and benchmark a model.
Start the Docker container.
docker run -it \
  --privileged \
  --device=/dev/kfd \
  --device=/dev/dri \
  --network=host \
  --group-add sudo \
  -w /app/vllm/ \
  --name <container_name> \
  <image_name> \
  /bin/bash
Note
You can find the <image_name> by running docker images. The <container_name> is user defined; use it to name your Docker container.
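Optionally, once inside the container, confirm the GPU is visible before benchmarking. A quick check, assuming rocm-smi is available on the container's PATH (it ships with ROCm-based images):
rocm-smi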
Run benchmarks within the Docker container.
cd benchmarks
python3 benchmark_latency.py --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
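Depending on the vLLM version bundled in the image, benchmark_latency.py typically accepts additional flags such as --input-len, --output-len, and --batch-size to control the workload shape. The values below are illustrative:
python3 benchmark_latency.py --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --input-len 128 --output-len 128 --batch-size 8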
Additional usage#
vLLM is optimized to serve LLMs faster and more efficiently, especially for applications requiring high throughput and scalability. See Quickstart - OpenAI Compatible Server for more information.
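As a sketch of that workflow, you can start an OpenAI-compatible server inside the container and query it over HTTP. The port and request fields below follow the vLLM quickstart defaults; adjust them for your setup:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
Then, from another shell:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "prompt": "Hello, my name is", "max_tokens": 32}'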
To run offline inference, see Quickstart - Offline Batched Inference for more information.
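A minimal offline inference sketch following the vLLM quickstart pattern (the prompts and sampling values are illustrative):
from vllm import LLM, SamplingParams

# Load the model used in this example; any vLLM-supported model works.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

# Illustrative sampling settings; tune for your use case.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a small batch of prompts.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)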
Note
If you experience errors with torch.distributed, running export GLOO_SOCKET_IFNAME=lo may resolve the issue.