vLLM Linux Docker Image#

Virtual Large Language Model (vLLM) is a fast and easy-to-use library for LLM inference and serving, providing high throughput and memory-efficient execution.

For additional information, visit the AMD vLLM GitHub page.

Note that this is a benchmarking demo/example; setup for other models or vLLM configurations may differ.

Additional information#

  • Ensure Docker is installed on your system. Refer to https://docs.docker.com/engine/install/ for more information. A quick sanity check is shown after this list.

  • This Docker image supports gfx1151 and gfx1150. Refer to the compatibility matrix for more information.

  • This example highlights the AMD vLLM Docker image using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Other vLLM-supported models can be used as well.
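
As a quick sanity check, verify that Docker is installed and that the ROCm GPU device nodes used later by docker run are present. This is a minimal sketch; the reported versions and device entries will vary by system.

docker --version
ls /dev/kfd /dev/dri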

Download and install Docker image#

Download Docker image#

Select the applicable Ubuntu version and download the compatible Docker image before starting.

docker pull rocm/vllm-dev:rocm7.0.2_navi_ubuntu24.04_py3.12_pytorch_2.8_vllm_0.10.2rc1

Note
For more information, see rocm/vllm-dev.
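
After the pull completes, you can optionally confirm the image is available locally. The tag listed should match the one pulled above.

docker images rocm/vllm-dev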

Installation#

Follow these steps to build a vLLM Docker image and benchmark a model.

  1. Start the Docker container.

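    # --device=/dev/kfd and --device=/dev/dri expose the host's AMD GPU device nodes
    # to the container; --network=host shares the host's network namespace.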
    docker run -it \
      --privileged \
      --device=/dev/kfd \
      --device=/dev/dri \
      --network=host \
      --group-add sudo \
      -w /app/vllm/ \
      --name <container_name> \
      <image_name> \
      /bin/bash


    Note
    You can find the <image_name> by running docker images. The <container_name> is user defined; use this value to name your Docker container.

  2. Run benchmarks inside the Docker container. An example with additional benchmark options is shown after these steps.

    cd benchmarks
    python3 benchmark_latency.py --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
    
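
The benchmark_latency.py script measures end-to-end latency for a single batch of requests. As a sketch, you can vary the workload with additional flags; the exact flag names may differ between vLLM versions, so check python3 benchmark_latency.py --help inside the container.

python3 benchmark_latency.py \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
  --input-len 128 \
  --output-len 128 \
  --batch-size 8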

Additional usage#

  • vLLM is optimized to serve LLMs faster and more efficiently, especially for applications requiring high throughput and scalability. See Quickstart - vLLM for more information. A minimal serving sketch is shown after this list.

  • To run offline inference, see Quickstart - vLLM for more information.
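
As a minimal serving sketch, you can start vLLM's OpenAI-compatible server inside the running container and query it from the host (the container uses --network=host). The port shown is vLLM's default (8000); the prompt and token count are illustrative assumptions.

# Inside the container: start the OpenAI-compatible server
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

# From the host: send a completion request
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "prompt": "Hello, my name is", "max_tokens": 32}'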