llama.cpp on ROCm installation#

2026-03-03

6 min read time

Applies to Linux

System requirements#

To use llama.cpp b5997, you need the following prerequisites:

  • ROCm version: 6.4.0

  • Operating system: Ubuntu 24.04

  • GPU platform: AMD Instinct™ MI300X, MI210

Key ROCm libraries for llama.cpp#

llama.cpp functionality on ROCm is determined by its underlying library dependencies. These ROCm components affect the capabilities, performance, and feature set available to developers. Ensure you have the required libraries for your corresponding ROCm version.

ROCm library

ROCm 6.4.x version

Purpose

Usage

hipBLAS

2.4.0

Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for matrix and vector operations.

Supports operations such as matrix multiplication, matrix-vector products, and tensor contractions. Utilized in both dense and batched linear algebra operations.

hipBLASLt

0.12.0

hipBLASLt is an extension of the hipBLAS library, providing additional features like epilogues fused into the matrix multiplication kernel or use of integer tensor cores.

By setting the flag ROCBLAS_USE_HIPBLASLT, you can dispatch hipBLASLt kernels where possible.

rocWMMA

1.7.0

Accelerates warp-level matrix-multiply and matrix-accumulate to speed up matrix multiplication (GEMM) and accumulation operations with mixed precision support.

Can be used to enhance the flash attention performance on AMD compute, by enabling the flag at compile time.

Install llama.cpp#

To install llama.cpp on ROCm, you have the following options:

Use a prebuilt Docker image with llama.cpp pre-installed#

Docker is the recommended method to set up a llama.cpp environment, and it avoids potential installation issues. The tested, prebuilt image includes llama.cpp, ROCm, and other dependencies.

Important

To follow these instructions, input your chosen tag into <TAG>. Example: llama.cpp-b5997_rocm6.4.0_ubuntu24.04.

Tag endings of _full, _server, and _light serve different purposes for entrypoints as follows:

  • Full: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization.

  • Server: This image only includes the server executable file.

  • Light: This image only includes the main executable file.

You can download Docker images with specific ROCm, llama.cpp, and operating system versions. See the available tags on Docker Hub and see docker image support below.

  1. Download your required public llama.cpp Docker image:

    docker pull rocm/llama.cpp:<TAG>_full
    docker pull rocm/llama.cpp:<TAG>_server
    docker pull rocm/llama.cpp:<TAG>_light
    
  2. Launch and connect to the container with the respective entrypoints of your image:

    export MODEL_PATH='<your_model_path>'
    
    # To run the 'full' docker image with main executable (--run) and other options
    docker run --privileged \
               --network=host \
               --device=/dev/kfd \
               --device=/dev/dri \
               --group-add video \
               --cap-add=SYS_PTRACE \
               --security-opt seccomp=unconfined \
               --ipc=host \
               --shm-size 16G \
               -v $MODEL_PATH:/data \
               rocm/llama.cpp:<TAG>_full \
                 --run -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
                 -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 999
    
    # To run the 'server' docker image with the server executable
    docker run --privileged \
               --network=host \
               --device=/dev/kfd \
               --device=/dev/dri \
               --group-add video \
               --cap-add=SYS_PTRACE \
               --security-opt seccomp=unconfined \
               --ipc=host \
               --shm-size 16G \
               -v $MODEL_PATH:/data \
               rocm/llama.cpp:<TAG>_server \
                 -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
                 --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 999
    
    # To run the 'light' docker image with only the main executable
    docker run --privileged \
               --network=host \
               --device=/dev/kfd \
               --device=/dev/dri \
               --group-add video \
               --cap-add=SYS_PTRACE \
               --security-opt seccomp=unconfined \
               --ipc=host \
               --shm-size 16G \
               -v $MODEL_PATH:/data \
               rocm/llama.cpp:<TAG>_light \
                 -m /data/DeepSeek-V3-Q4_K_M-00001-of-00009.gguf \
                 -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 999
    

    Note

    This step will automatically download the image if it does not exist on the host. You can also pass the -v argument to mount any data directories from the host onto the container.

Docker image support#

AMD validates and publishes ready-made llama.cpp images with ROCm backends on Docker Hub. The following Docker image tags and associated inventories are validated for ROCm 6.4.0.

Build your own Docker image#

If you want to explore llama.cpp capabilities without being limited to the entrypoints from the prebuilt Docker images, you have the option to build directly from source inside a ROCm Ubuntu base Docker image.

The prebuilt base Docker image has all dependencies installed, including:

  • ROCm

  • hipBLAS

  • hipBLASLt

  • rocWMMA

  1. Start your local container from the base ROCm 6.4.0 image:

    export MODEL_PATH='./models'
    
    docker run -it \
          --name=$(whoami)_llamacpp \
          --privileged --network=host \
          --device=/dev/kfd --device=/dev/dri \
          --group-add video --cap-add=SYS_PTRACE \
          --security-opt seccomp=unconfined \
          --ipc=host --shm-size 16G \
          -v $MODEL_PATH:/data
          rocm/dev-ubuntu-24.04:6.4-complete
    

Once inside the Docker container, run the following steps:

  1. Set up your workspace:

    apt-get update && apt-get install -y nano libcurl4-openssl-dev cmake git
    mkdir -p /workspace && cd /workspace
    
  2. Clone the ROCm/llama.cpp repository:

    git clone https://github.com/ROCm/llama.cpp
    cd llama.cpp
    
  3. Set your ROCm architecture:

    To compile for supported microarchitectures, run:

    export LLAMACPP_ROCM_ARCH=gfx942,gfx90a
    

    Note

    To compile for a wide range of microarchitectures, run:

    export LLAMACPP_ROCM_ARCH=gfx803,gfx900,gfx906,gfx908,gfx90a,gfx942,gfx1010,gfx1030,gfx1032,gfx1100,gfx1101,gfx1102
    
  4. Build and install llama.cpp:

    HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=$LLAMACPP_ROCM_ARCH \
    -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON \
    && cmake --build build --config Release -j$(nproc)
    

Test the llama.cpp installation#

llama.cpp unit tests are optional for validating your installation if you used a prebuilt Docker image from AMD ROCm Docker Hub.

To run unit tests manually and validate your installation fully, follow these steps:

  1. To verify that llama.cpp has been successfully installed, run the Docker container as described in Build your own Docker image.

  2. Once inside the container, ensure you have access to the Bash shell.

    cd /workspace/llama.cpp
    ./build/bin/test-backend-ops
    

    Note

    Running unit tests requires at least one supported AMD GPU.