llama.cpp on ROCm installation#
2025-09-09
10 min read time
llama.cpp is an open-source framework for Large Language Model (LLM) inference that runs on both central processing units (CPUs) and graphics processing units (GPUs). It is written in plain C/C++, providing a simple, dependency-free setup.
The framework supports multiple quantization options, from 1.5-bit to 8-bit integers, to speed up inference and reduce memory usage. Originally built as a CPU-first library, llama.cpp is easy to integrate with other programming environments and is widely adopted across diverse platforms, including consumer devices.
For hardware, software, and third-party framework compatibility between ROCm and llama.cpp, see the following resources:
Note
llama.cpp is supported on ROCm 6.4.0.
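Before installing, you can optionally confirm the host's ROCm version and GPU visibility. This is a quick sketch, assuming a standard ROCm installation under /opt/rocm:
# Report the installed ROCm version on the host
cat /opt/rocm/.info/version

# Confirm that the AMD GPUs are visible to the ROCm runtime
rocminfo | grep -i 'Marketing Name'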
Install llama.cpp#
To install llama.cpp for ROCm, you have the following options:
Use a prebuilt Docker image with llama.cpp pre-installed#
Docker is the recommended method to set up a llama.cpp environment because it avoids potential installation issues. The tested, prebuilt image includes llama.cpp, ROCm, and other dependencies.
Important
To follow these instructions, replace <TAG> with your chosen tag. Example: llama.cpp-b5997_rocm6.4.0_ubuntu24.04.
Tags ending in _full, _server, and _light provide different entrypoints, as follows:
Full: This image includes the main executable as well as the tools to convert LLaMA models into ggml format and quantize them to 4 bits (see the sketch after this list).
Server: This image only includes the server executable file.
Light: This image only includes the main executable file.
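For reference, the conversion and quantization tools bundled in the _full image correspond to the convert_hf_to_gguf.py script and the llama-quantize binary from the llama.cpp source tree. The following is a minimal sketch of that workflow, assuming a source build like the one described later in this guide and a hypothetical Hugging Face checkpoint under ./models/my-model:
# Convert a Hugging Face checkpoint to a GGUF file (paths are hypothetical placeholders)
python3 convert_hf_to_gguf.py ./models/my-model --outfile ./models/my-model-f16.gguf

# Quantize the converted model to 4-bit (Q4_K_M) to reduce its memory footprint
./build/bin/llama-quantize ./models/my-model-f16.gguf ./models/my-model-Q4_K_M.gguf Q4_K_M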
You can download Docker images with specific ROCm, llama.cpp, and operating system versions. See the available tags on Docker Hub and the Docker image support section below.
Download your required public llama.cpp Docker image
docker pull rocm/llama.cpp:<TAG>_full
docker pull rocm/llama.cpp:<TAG>_server
docker pull rocm/llama.cpp:<TAG>_light
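The run commands in the next step mount a local model directory ($MODEL_PATH) that must already contain a GGUF model. As a hedged sketch of one way to populate it, assuming the Hugging Face CLI is available and using placeholder repository and file names:
# Install the Hugging Face CLI (if not already available)
pip install -U "huggingface_hub[cli]"

# Download a quantized GGUF model into the directory you will mount as /data
# <HF_REPO> and <GGUF_FILE> are placeholders for your chosen model repository and file
huggingface-cli download <HF_REPO> <GGUF_FILE> --local-dir ./models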
Launch and connect to the container using the entrypoint that matches your image
export MODEL_PATH='<your_model_path>'

# To run the 'full' docker image with main executable (--run) and other options
docker run --privileged \
  --network=host \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host \
  --shm-size 16G \
  -v $MODEL_PATH:/data \
  rocm/llama.cpp:<TAG>_full \
  --run -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 999

# To run the 'server' docker image with the server executable
docker run --privileged \
  --network=host \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host \
  --shm-size 16G \
  -v $MODEL_PATH:/data \
  rocm/llama.cpp:<TAG>_server \
  -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 999

# To run the 'light' docker image with only the main executable
docker run --privileged \
  --network=host \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host \
  --shm-size 16G \
  -v $MODEL_PATH:/data \
  rocm/llama.cpp:<TAG>_light \
  -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 999
Note
This will automatically download the image if it does not exist on the host. You can also pass the -v argument to mount any data directories from the host onto the container.
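Once the _server container is running, you can confirm that it is serving requests. This is a minimal sketch, assuming the server is reachable at localhost:8000 as configured above and that the image exposes the standard llama.cpp server HTTP endpoints:
# Liveness check
curl http://localhost:8000/health

# Simple completion request against the llama.cpp server
curl http://localhost:8000/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 64}'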
Docker image support#
AMD validates and publishes ready-made llama.cpp images with ROCm backends on Docker Hub. The following Docker image tags and associated inventories are validated for ROCm 6.4.0.
Build your own Docker image#
If you want to explore llama.cpp capabilities without being limited to the entrypoints from the prebuilt Docker images, you have the option to build directly from source inside a ROCm Ubuntu base Docker image.
The prebuilt base Docker image has all dependencies installed, including:
ROCm
hipBLAS
hipBLASLt
rocWMMA
Start your local container from the base ROCm 6.4.0 image
export MODEL_PATH='./models'

docker run -it \
  --name=$(whoami)_llamacpp \
  --privileged --network=host \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host --shm-size 16G \
  -v $MODEL_PATH:/data rocm/dev-ubuntu-24.04:6.4-complete
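After the container starts, you can optionally confirm that the dependencies listed above are present. This is a quick sketch, assuming the default ROCm installation prefix /opt/rocm:
# Report the ROCm/HIP version shipped in the base image
hipconfig --version
cat /opt/rocm/.info/version

# Check that the hipBLAS/hipBLASLt libraries and rocWMMA headers are installed
ls /opt/rocm/lib | grep -i hipblas
ls /opt/rocm/include | grep -i rocwmma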
Once inside the Docker container, complete the following steps:
Set up your workspace
apt-get update && apt-get install -y nano libcurl4-openssl-dev cmake git
mkdir -p /workspace && cd /workspace
Clone the ROCm/llama.cpp repository
git clone https://github.com/ROCm/llama.cpp
cd llama.cpp
Set your ROCm architecture
To compile for a wide range of microarchitectures, run:
export LLAMACPP_ROCM_ARCH=gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102
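Compiling for all of these targets increases build time. To target only the GPUs present in your system, you can first query their architecture; a quick sketch using rocminfo:
# List the gfx architecture(s) reported by the ROCm runtime
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u

# Example: build only for MI300X (gfx942)
export LLAMACPP_ROCM_ARCH=gfx942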
Build and install llama.cpp
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$LLAMACPP_ROCM_ARCH \
  -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
  && cmake --build build --config Release -j$(nproc)
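After the build completes, the executables are placed in build/bin. Optionally (a sketch, not required by the rest of this guide), you can confirm they were built and add them to your PATH:
# Confirm the main llama.cpp binaries were built
ls build/bin/ | grep -E 'llama-(cli|server|bench|quantize)'

# Optionally make them available without the ./build/bin prefix
export PATH="$PWD/build/bin:$PATH"
llama-cli --version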
Test the llama.cpp installation#
If you used a prebuilt Docker image from the AMD ROCm Docker Hub, running the llama.cpp unit tests is optional, because those images are already validated.
To run unit tests manually and validate your installation fully, follow these steps:
To verify that llama.cpp has been successfully installed, run the Docker container as described in Build your own Docker image.
Once inside the container, ensure you have access to the Bash shell.
cd /workspace/llama.cpp
./build/bin/test-backend-ops
Note
Running unit tests requires at least one AMD GPU.
Run a llama.cpp example#
The ROCm/llama.cpp repository provides examples that exercise the framework's functionality.
You can also search for llama.cpp examples on the AMD ROCm blog for instructions on preparing your model and testing the containers.
The two most popular use cases are:
llama-cli: The main executable to run the model interactively or get a response to a prompt.
llama-bench: A benchmarking tool that runs your model with different configurations.
Main application: llama-cli#
Use the CLI tool to start the client
./build/bin/llama-cli -m /data/DeepSeek-V3-Q4_K_M/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf -ngl 999
A prompt appears when the client is ready, and you can start interacting with the model:
> hi, who are you?
Hi! I’m an AI assistant here to help answer your questions, provide information, or just chat with you. How can I assist you today? 😊

> What are the main causes of heart failure?
Heart failure is a condition in which the heart cannot pump blood effectively to meet the body's needs. It can result from various underlying causes or contributing factors. The **main causes of heart failure** include:

---

### 1. **Coronary Artery Disease (CAD)**
   - Narrowing or blockage of the coronary arteries reduces blood flow to the heart muscle, weakening it over time.
   - A heart attack (myocardial infarction) can cause significant damage to the heart muscle, leading to heart failure.

---

### 2. **High Blood Pressure (Hypertension)**
   - Chronic high blood pressure forces the heart to work harder to pump blood, eventually causing the heart muscle to thicken or weaken.

---

### 3. **Cardiomyopathy**
   - Diseases of the heart muscle, such as dilated cardiomyopathy, hypertrophic cardiomyopathy, or restrictive cardiomyopathy, can impair the heart's ability to pump effectively.

...

### 11. **Other Conditions**
   - Thyroid disorders, severe anemia, or infections like myocarditis can also lead to heart failure.

---

### Prevention and Management
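Besides the interactive session shown above, llama-cli can also answer a single prompt non-interactively, using the same -p and -n options from the earlier Docker examples. A minimal sketch:
# Generate up to 256 tokens for a single prompt, offloading all layers to the GPU
./build/bin/llama-cli \
  -m /data/DeepSeek-V3-Q4_K_M/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  -p "Building a website can be done in 10 simple steps:" \
  -n 256 -ngl 999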
Benchmark application: llama-bench#
Use the CLI tool to start the application
./build/bin/llama-bench \
  -m /data/DeepSeek-V3-Q4_K_M/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  -p 16,32,64,96,128,256,512,1024,2048,4096 \
  -n 64,128,256 \
  -ngl 999
The output of the preceding command should look similar to the following when run on an MI300X system:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 ROCm devices:
  Device 0: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 2: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 3: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 4: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 5: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 6: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 7: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64

| model                         |       size |   params | backend | ngl |   test |           t/s |
| ----------------------------- | ---------: | -------: | ------- | --: | -----: | ------------: |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   pp16 |  47.71 ± 1.50 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   pp32 |  72.22 ± 0.76 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   pp64 | 111.48 ± 2.64 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   pp96 | 148.84 ± 1.22 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  pp128 | 180.11 ± 1.54 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  pp256 | 290.04 ± 1.37 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  pp512 | 439.14 ± 1.68 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 | pp1024 | 439.02 ± 1.61 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 | pp2048 | 432.00 ± 2.87 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 | pp4096 | 420.19 ± 0.62 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   tg64 |  37.36 ± 0.03 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  tg128 |  37.04 ± 0.02 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  tg256 |  36.53 ± 0.01 |

build: 66906cd8 (5997)
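The benchmark above used all eight GPUs visible to the container. To benchmark a single device, you can restrict GPU visibility with the HIP_VISIBLE_DEVICES environment variable. A sketch, assuming device 0 and a smaller placeholder model that fits in one GPU's memory:
# Restrict llama.cpp to GPU 0 only (the model must fit in a single GPU's memory)
HIP_VISIBLE_DEVICES=0 ./build/bin/llama-bench \
  -m /data/<smaller_model>.gguf \
  -p 128,512 -n 128 -ngl 999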