AMD ROCm LLMExt documentation

2026-03-20

Applies to Linux

AMD ROCm LLMExt (ROCm-LLMExt) is an open-source software toolkit built on the ROCm platform for large language model (LLM) extensions, integrations, and performance enablement on AMD GPUs. It brings together training, post-training, inference, and orchestration components to make modern LLM stacks practical and reproducible on AMD hardware.

ROCm-LLMExt features, grouped by LLM task:

Training

  • Large-scale transformer training

  • Distributed parallelism (data, tensor, and pipeline)

  • Mixed precision and performance tuning

  • Mixture-of-Experts (MoE) enablement
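
To make the MoE idea concrete, here is a minimal sketch in plain Python (not the ROCm-LLMExt API; the gate logits, `top_k_route` helper, and toy experts are invented for illustration): a top-k gate selects a few experts per token and blends their outputs.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

def moe_forward(token, experts, gate_logits, k=2):
    """Blend the outputs of the selected experts, weighted by the gate."""
    return sum(w * experts[i](token) for i, w in top_k_route(gate_logits, k))

# Toy scalar "experts" standing in for expert feed-forward blocks.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
out = moe_forward(3.0, experts, gate_logits=[2.0, 1.0, -1.0, 0.5], k=2)
```

Because only k experts run per token, compute cost stays roughly constant as the total expert count grows, which is the property MoE enablement exploits at scale.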

Post-training and alignment

  • Reinforcement learning and post-training workflows

  • Scalable experimentation

  • Reproducible configurations
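
To ground the idea of reproducible post-training runs, here is a hypothetical sketch (plain Python, unrelated to ROCm-LLMExt's actual training code): a REINFORCE-style policy-gradient update on a two-armed bandit, where pinning the random seed makes the entire run repeatable.

```python
import math
import random

def train_bandit_policy(seed=0, steps=500, lr=0.1):
    """REINFORCE on a two-armed bandit (arm 1 pays 1.0, arm 0 pays 0.2).
    Fixing the seed pins the sampled trajectory, so reruns are identical;
    this is the same property a pinned experiment configuration provides."""
    rng = random.Random(seed)
    logit = 0.0  # preference for arm 1
    for _ in range(steps):
        p1 = 1.0 / (1.0 + math.exp(-logit))
        action = 1 if rng.random() < p1 else 0
        reward = 1.0 if action == 1 else 0.2
        # Policy-gradient step: d/dlogit of log pi(action) is (action - p1).
        logit += lr * (action - p1) * reward
    return 1.0 / (1.0 + math.exp(-logit))  # final P(pick the better arm)

p_best = train_bandit_policy(seed=0)
```

Two calls with the same seed return bit-identical results, which is what makes post-training experiments comparable across reruns and machines.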

Inference and serving

  • High-throughput decoding and low-latency serving

  • Optimized attention and inference operators

  • Lightweight and edge-friendly inference paths
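
As an illustration of how batched greedy decoding amortizes each model step over many sequences (a toy sketch; `step_fn` and the stand-in model below are invented for the example, not ROCm-LLMExt operators):

```python
def greedy_decode_batch(step_fn, prompts, max_new_tokens, eos=-1):
    """Decode a batch of sequences in lockstep: one model step extends
    every unfinished sequence, amortizing each step over the batch."""
    seqs = [list(p) for p in prompts]
    done = [False] * len(seqs)
    for _ in range(max_new_tokens):
        active = [i for i, d in enumerate(done) if not d]
        if not active:
            break
        # One batched "forward pass" over all active sequences.
        logits = [step_fn(seqs[i]) for i in active]
        for i, token_logits in zip(active, logits):
            nxt = max(range(len(token_logits)), key=token_logits.__getitem__)
            if nxt == eos:
                done[i] = True
            else:
                seqs[i].append(nxt)
    return seqs

# Toy "model": the next token is (last token + 1) mod 5; vocab size 5.
def toy_step(seq):
    nxt = (seq[-1] + 1) % 5
    return [1.0 if t == nxt else 0.0 for t in range(5)]

outs = greedy_decode_batch(toy_step, [[0], [3]], max_new_tokens=3)
```

Production servers add KV caching and continuous batching on top of this loop, but the throughput win comes from the same principle: many sequences share each forward pass.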

Distributed execution

  • Multi-node orchestration

  • Cluster bring-up and scheduling

  • Batch and online inference pipelines
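
The shape of such a batch pipeline can be sketched in a few lines (hedged: threads stand in for remote workers here, and `run_batch_pipeline` is a hypothetical helper; a real deployment would use a cluster scheduler such as Ray):

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch_pipeline(requests, worker_fn, num_workers=4, batch_size=8):
    """Shard a request list into fixed-size batches and fan them out
    across a worker pool, preserving request order in the results.
    Threads stand in for the remote workers of a real cluster."""
    batches = [requests[i:i + batch_size]
               for i in range(0, len(requests), batch_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() keeps batch order, so outputs line up with inputs.
        per_batch = list(pool.map(lambda b: [worker_fn(r) for r in b], batches))
    return [out for batch in per_batch for out in batch]

outs = run_batch_pipeline(list(range(20)), worker_fn=lambda x: x * x)
```

Online serving replaces the static request list with a queue, but the fan-out/fan-in structure is the same.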

The ROCm-LLMExt source code is hosted on GitHub at ROCm/ROCm-LLMExt.

Note

ROCm-LLMExt 26.02 includes targeted updates to four components (verl, Ray, llama.cpp, and FlashInfer); two components remain unchanged (Stanford Megatron-LM and Megablocks).

ROCm-LLMExt documentation is organized into categories by LLM task.

To contribute to the documentation, see Contributing to ROCm.

You can find licensing information on the Licensing page.