ROCm-LLMExt 25.09 release notes#
This is the second release of the AMD ROCm LLMExt toolkit (ROCm-LLMExt), an open-source software toolkit built on the ROCm platform for large language model (LLM) extensions, integrations, and performance enablement on AMD GPUs. The domain brings together training, post-training, inference, and orchestration components to make modern LLM stacks practical and reproducible on AMD hardware.
Release highlights#
Note
ROCm-LLMExt 25.09 includes targeted updates to one component (llama.cpp) and introduces another (FlashInfer); four components remain unchanged (verl, Stanford Megatron-LM, Megablocks, and Ray).
This release introduces support for ROCm 7.0.0, 6.4.3, 6.4.2, and 6.4.1 for the following component:
llama.cpp is an open-source inference library and framework for large language models (LLMs) that runs on both central processing units (CPUs) and graphics processing units (GPUs). It is written in plain C/C++, providing a simple, dependency-free setup.
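For reference, a ROCm-enabled llama.cpp build typically follows the upstream HIP instructions. The sketch below assumes a working ROCm installation; the `gfx942` target (MI300-series GPUs) is an example and should be adjusted for your hardware:

```shell
# Fetch llama.cpp at the build tag shipped in this release
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git checkout b6356

# Configure with the HIP (ROCm) backend; gfx942 is an example GPU target
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS=gfx942 \
    -DCMAKE_BUILD_TYPE=Release

# Build the library and CLI tools
cmake --build build --config Release -j
```

The exact configure flags can vary between llama.cpp builds, so consult the HIP section of the llama.cpp build documentation for the tag you are using.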
This release introduces support for ROCm 6.4.1 for the following component:
FlashInfer is a library and kernel generator for large language models (LLMs) that provides high-performance graphics processing unit (GPU) kernels. FlashInfer focuses on LLM serving and inference, targeting strong performance across diverse deployment scenarios.
System requirements#
For the 25.09 release, the ROCm-LLMExt components have varying ROCm version requirements. Follow the installation instructions for each individual component, where the exact ROCm dependency is listed, or refer to the compatibility matrix to verify supported ROCm versions.
ROCm-LLMExt components#
The following table lists ROCm-LLMExt component versions for the 25.09 release. Click to go to the component’s source on GitHub.
| Name | Version | Source |
|---|---|---|
| verl | 0.3.0.post0 | |
| Stanford Megatron-LM | 85f95ae | |
| Megablocks | 0.7.0 | |
| Ray | 2.48.0.post0 | |
| llama.cpp | b5997 ⇒ b6356 | |
| FlashInfer | 0.2.5 | |
Detailed component changelogs#
llama.cpp b6356#
This release adds support for ROCm 7.0.0, 6.4.3, 6.4.2, and 6.4.1 on AMD Instinct MI325X, MI300X, and MI210 GPUs.
FlashInfer 0.2.5#
This release adds support for ROCm 6.4.1 on AMD Instinct MI300X GPUs.