vLLM inference and serving on ROCm

vLLM is an open-source library for fast, memory-efficient LLM inference and serving. This page describes how to set up and run vLLM on supported AMD GPUs and APUs, using either a prebuilt Docker image (recommended) or pip.
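As a rough sketch of the Docker route, the workflow amounts to pulling AMD's prebuilt vLLM image and starting a container with the GPU devices mapped in. The image name and tag below are illustrative assumptions (AMD publishes images under rocm/vllm on Docker Hub); pick the exact tag for your device family from the table that follows.

    # Pull AMD's prebuilt vLLM image (tag shown is an example, not a pinned release)
    docker pull rocm/vllm:latest

    # Start an interactive container with the ROCm GPU devices exposed
    docker run -it --rm \
      --device=/dev/kfd --device=/dev/dri \
      --group-add video --ipc=host \
      rocm/vllm:latest

Inside the container, a model can then be served with vLLM's CLI, for example `vllm serve <model-id>`; the later sections cover the exact flags and supported models.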

[Table: supported configurations, listing each device family with its compatible vLLM version and installation method]

Prerequisites