AMD ROCm LLMExt documentation
2026-05-04
AMD ROCm LLMExt (ROCm-LLMExt) is an open-source software toolkit built on the ROCm platform for large language model (LLM) extensions, integrations, and performance enablement on AMD GPUs. The toolkit brings together training, post-training, inference, and orchestration components to make modern LLM stacks practical and reproducible on AMD hardware.
| LLM Task | Features |
|---|---|
| Training | |
| Post-training and alignment | |
| Inference and serving | |
| Distributed execution | |
The ROCm-LLMExt source code is hosted on GitHub at ROCm/ROCm-LLMExt.
Note
ROCm-LLMExt 26.04 adds two agentic libraries, ComfyUI and ROCm-RAG, to the toolkit; the other components (FlashInfer, llama.cpp, Ray, Triton Inference Server, and verl) are unchanged.
ROCm-LLMExt documentation is organized into the following categories:
To contribute to the documentation, see Contributing to ROCm.
You can find licensing information on the Licensing page.