llama.cpp on ROCm installation#
2025-09-09
10 min read time
llama.cpp is an open-source framework for Large Language Model (LLM) inference that runs on both central processing units (CPUs) and graphics processing units (GPUs). It is written in plain C/C++, providing a simple, dependency-free setup.
The framework supports multiple quantization options, from 1.5-bit to 8-bit integers, to speed up inference and reduce memory usage. Originally built as a CPU-first library, llama.cpp is easy to integrate with other programming environments and is widely adopted across diverse platforms, including consumer devices.
For hardware, software, and third-party framework compatibility between ROCm and llama.cpp, see the following resources:
Note
llama.cpp is supported on ROCm 6.4.0.
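Before installing, you can optionally confirm the host's ROCm version and GPU visibility. This is a quick sketch, assuming a standard ROCm installation under /opt/rocm:
# Report the installed ROCm version on the host
cat /opt/rocm/.info/version

# Confirm that the AMD GPUs are visible to the ROCm runtime
rocminfo | grep -i 'Marketing Name'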
Install llama.cpp#
To install llama.cpp for ROCm, you have the following options:
Use a prebuilt Docker image with llama.cpp pre-installed#
Docker is the recommended method to set up a llama.cpp environment because it avoids potential installation issues. The tested, prebuilt image includes llama.cpp, ROCm, and other dependencies.
Important
To follow these instructions, replace <TAG> with your chosen tag. Example: llama.cpp-b5997_rocm6.4.0_ubuntu24.04.
Tags ending in _full, _server, and _light provide different entrypoints, as follows:
Full: This image includes the main executable as well as the tools to convert LLaMA models into ggml format and quantize them to 4 bits (see the sketch after this list).
Server: This image only includes the server executable file.
Light: This image only includes the main executable file.
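For reference, the conversion and quantization tools bundled in the _full image correspond to the convert_hf_to_gguf.py script and the llama-quantize binary from the llama.cpp source tree. The following is a minimal sketch of that workflow, assuming a source build like the one described later in this guide and a hypothetical Hugging Face checkpoint under ./models/my-model:
# Convert a Hugging Face checkpoint to a GGUF file (paths are hypothetical placeholders)
python3 convert_hf_to_gguf.py ./models/my-model --outfile ./models/my-model-f16.gguf

# Quantize the converted model to 4-bit (Q4_K_M) to reduce its memory footprint
./build/bin/llama-quantize ./models/my-model-f16.gguf ./models/my-model-Q4_K_M.gguf Q4_K_M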
You can download Docker images with specific ROCm, llama.cpp, and operating system versions. See the available tags on Docker Hub and the Docker image support section below.
Download your required public llama.cpp Docker image
docker pull rocm/llama.cpp:<TAG>_full
docker pull rocm/llama.cpp:<TAG>_server
docker pull rocm/llama.cpp:<TAG>_light
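The run commands in the next step mount a local model directory ($MODEL_PATH) that must already contain a GGUF model. As a hedged sketch of one way to populate it, assuming the Hugging Face CLI is available and using placeholder repository and file names:
# Install the Hugging Face CLI (if not already available)
pip install -U "huggingface_hub[cli]"

# Download a quantized GGUF model into the directory you will mount as /data
# <HF_REPO> and <GGUF_FILE> are placeholders for your chosen model repository and file
huggingface-cli download <HF_REPO> <GGUF_FILE> --local-dir ./models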
Launch and connect to the container using the entrypoint that matches your image
export MODEL_PATH='<your_model_path>'

# To run the 'full' docker image with main executable (--run) and other options
docker run --privileged \
  --network=host \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host \
  --shm-size 16G \
  -v $MODEL_PATH:/data \
  rocm/llama.cpp:<TAG>_full \
  --run -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 999

# To run the 'server' docker image with the server executable
docker run --privileged \
  --network=host \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host \
  --shm-size 16G \
  -v $MODEL_PATH:/data \
  rocm/llama.cpp:<TAG>_server \
  -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 999

# To run the 'light' docker image with only the main executable
docker run --privileged \
  --network=host \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host \
  --shm-size 16G \
  -v $MODEL_PATH:/data \
  rocm/llama.cpp:<TAG>_light \
  -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 999
Note
This will automatically download the image if it does not exist on the host. You can also pass the -v argument to mount any data directories from the host onto the container.
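Once the _server container is running, you can confirm that it is serving requests. This is a minimal sketch, assuming the server is reachable at localhost:8000 as configured above and that the image exposes the standard llama.cpp server HTTP endpoints:
# Liveness check
curl http://localhost:8000/health

# Simple completion request against the llama.cpp server
curl http://localhost:8000/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 64}'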
Docker image support#
AMD validates and publishes ready-made llama.cpp images with ROCm backends on Docker Hub. The following Docker image tags and associated inventories are validated for ROCm 6.4.0.
Build your own Docker image#
If you want to explore llama.cpp capabilities without being limited to the entrypoints from the prebuilt Docker images, you have the option to build directly from source inside a ROCm Ubuntu base Docker image.
The prebuilt base Docker image has all dependencies installed, including:
ROCm
hipBLAS
hipBLASLt
rocWMMA
Start your local container from the base ROCm 6.4.0 image
export MODEL_PATH='./models'

docker run -it \
  --name=$(whoami)_llamacpp \
  --privileged --network=host \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host --shm-size 16G \
  -v $MODEL_PATH:/data rocm/dev-ubuntu-24.04:6.4-complete
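After the container starts, you can optionally confirm that the dependencies listed above are present. This is a quick sketch, assuming the default ROCm installation prefix /opt/rocm:
# Report the ROCm/HIP version shipped in the base image
hipconfig --version
cat /opt/rocm/.info/version

# Check that the hipBLAS/hipBLASLt libraries and rocWMMA headers are installed
ls /opt/rocm/lib | grep -i hipblas
ls /opt/rocm/include | grep -i rocwmma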
Once inside the Docker container, complete the following steps:
Set up your workspace
apt-get update && apt-get install -y nano libcurl4-openssl-dev cmake git
mkdir -p /workspace && cd /workspace
Clone the ROCm/llama.cpp repository
git clone https://github.com/ROCm/llama.cpp
cd llama.cpp
Set your ROCm architecture
To compile for a wide range of microarchitectures, run:
export LLAMACPP_ROCM_ARCH=gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102
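Compiling for all of these targets increases build time. To target only the GPUs present in your system, you can first query their architecture; a quick sketch using rocminfo:
# List the gfx architecture(s) reported by the ROCm runtime
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u

# Example: build only for MI300X (gfx942)
export LLAMACPP_ROCM_ARCH=gfx942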
Build and install llama.cpp
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$LLAMACPP_ROCM_ARCH \
  -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
  && cmake --build build --config Release -j$(nproc)
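After the build completes, the executables are placed in build/bin. Optionally (a sketch, not required by the rest of this guide), you can confirm they were built and add them to your PATH:
# Confirm the main llama.cpp binaries were built
ls build/bin/ | grep -E 'llama-(cli|server|bench|quantize)'

# Optionally make them available without the ./build/bin prefix
export PATH="$PWD/build/bin:$PATH"
llama-cli --version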
Test the llama.cpp installation#
If you used a prebuilt Docker image from the AMD ROCm Docker Hub, running the llama.cpp unit tests is optional, because those images are already validated.
To run unit tests manually and validate your installation fully, follow these steps:
To verify that llama.cpp has been successfully installed, run the Docker container as described in Build your own Docker image.
Once inside the container, ensure you have access to the Bash shell.
cd /workspace/llama.cpp
./build/bin/test-backend-ops
Note
Running unit tests requires at least one AMD GPU.
Run a llama.cpp example#
The ROCm/llama.cpp repository provides examples that exercise the framework's functionality.
You can also search for llama.cpp examples on the AMD ROCm blog for instructions on preparing your model and testing the containers.
The two most popular use cases are:
llama-cli: The main executable to run the model interactively or get a response to a prompt.
llama-bench: A benchmarking tool that runs your model with different configurations.
Main application: llama-cli#
Use the CLI tool to start the client
./build/bin/llama-cli -m /data/DeepSeek-V3-Q4_K_M/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf -ngl 999
A prompt appears when the client is ready, and you can start interacting with the model:
> hi, who are you?
Hi! I’m an AI assistant here to help answer your questions, provide information, or just chat with you. How can I assist you today? 😊

> What are the main causes of heart failure?
Heart failure is a condition in which the heart cannot pump blood effectively to meet the body's needs. It can result from various underlying causes or contributing factors. The **main causes of heart failure** include:

---

### 1. **Coronary Artery Disease (CAD)**
   - Narrowing or blockage of the coronary arteries reduces blood flow to the heart muscle, weakening it over time.
   - A heart attack (myocardial infarction) can cause significant damage to the heart muscle, leading to heart failure.

---

### 2. **High Blood Pressure (Hypertension)**
   - Chronic high blood pressure forces the heart to work harder to pump blood, eventually causing the heart muscle to thicken or weaken.

---

### 3. **Cardiomyopathy**
   - Diseases of the heart muscle, such as dilated cardiomyopathy, hypertrophic cardiomyopathy, or restrictive cardiomyopathy, can impair the heart's ability to pump effectively.

...

### 11. **Other Conditions**
   - Thyroid disorders, severe anemia, or infections like myocarditis can also lead to heart failure.

---

### Prevention and Management
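Besides the interactive session shown above, llama-cli can also answer a single prompt non-interactively, using the same -p and -n options from the earlier Docker examples. A minimal sketch:
# Generate up to 256 tokens for a single prompt, offloading all layers to the GPU
./build/bin/llama-cli \
  -m /data/DeepSeek-V3-Q4_K_M/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  -p "Building a website can be done in 10 simple steps:" \
  -n 256 -ngl 999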
Benchmark application: llama-bench#
Use the CLI tool to start the application
./build/bin/llama-bench \
  -m /data/DeepSeek-V3-Q4_K_M/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
  -p 16,32,64,96,128,256,512,1024,2048,4096 \
  -n 64,128,256 \
  -ngl 999
The output of the preceding command should look similar to the following when run on an MI300X system:
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 ROCm devices:
  Device 0: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 2: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 3: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 4: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 5: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 6: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64
  Device 7: AMD Instinct MI300X, gfx942:sramecc+:xnack- (0x942), VMM: no, Wave Size: 64

| model                         |       size |   params | backend | ngl |   test |           t/s |
| ----------------------------- | ---------: | -------: | ------- | --: | -----: | ------------: |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   pp16 |  47.71 ± 1.50 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   pp32 |  72.22 ± 0.76 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   pp64 | 111.48 ± 2.64 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   pp96 | 148.84 ± 1.22 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  pp128 | 180.11 ± 1.54 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  pp256 | 290.04 ± 1.37 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  pp512 | 439.14 ± 1.68 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 | pp1024 | 439.02 ± 1.61 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 | pp2048 | 432.00 ± 2.87 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 | pp4096 | 420.19 ± 0.62 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |   tg64 |  37.36 ± 0.03 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  tg128 |  37.04 ± 0.02 |
| deepseek2 671B Q4_K - Medium  | 376.65 GiB | 671.03 B | ROCm    | 999 |  tg256 |  36.53 ± 0.01 |

build: 66906cd8 (5997)
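The benchmark above used all eight GPUs visible to the container. To benchmark a single device, you can restrict GPU visibility with the HIP_VISIBLE_DEVICES environment variable. A sketch, assuming device 0 and a smaller placeholder model that fits in one GPU's memory:
# Restrict llama.cpp to GPU 0 only (the model must fit in a single GPU's memory)
HIP_VISIBLE_DEVICES=0 ./build/bin/llama-bench \
  -m /data/<smaller_model>.gguf \
  -p 128,512 -n 128 -ngl 999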