ROCm-LLMExt 26.02 release notes#
This is the third release of the AMD ROCm LLMExt toolkit (ROCm-LLMExt), an open-source software toolkit built on the ROCm platform for large language model (LLM) extensions, integrations, and performance enablement on AMD GPUs. The domain brings together training, post-training, inference, and orchestration components to make modern LLM stacks practical and reproducible on AMD hardware.
Release highlights#
Note
ROCm-LLMExt 26.02 includes targeted updates to four components (verl, Ray, llama.cpp, and FlashInfer); two components remain unchanged (Stanford Megatron-LM and Megablocks).
This release introduces support for ROCm 7.0.0 for two components:
verl is a flexible, efficient, and production-ready RL training library designed for post-training of large language models (LLMs).
Ray is a unified framework for scaling AI and Python applications from your laptop to a full cluster without changing your code. It consists of a core distributed runtime and a set of AI libraries that simplify machine learning workloads. Because Ray is general-purpose, any Python application can be scaled with it efficiently, without extra infrastructure.
This release enhances support on ROCm 7.0.0 with specific optimizations for AMD Instinct MI300X GPUs for the following component:
llama.cpp is an open-source inference library and framework for large language models (LLMs) that runs on both central processing units (CPUs) and graphics processing units (GPUs). It is written in plain C/C++, providing a simple, dependency-free setup.
This release introduces support for ROCm 7.1.1 for the following component:
FlashInfer is a library and kernel generator for large language models (LLMs) that provides high-performance graphics processing unit (GPU) kernel implementations. FlashInfer focuses on LLM serving and inference, targeting high performance across diverse serving scenarios.
System requirements#
For the 26.02 release, the ROCm-LLMExt components span a range of ROCm version requirements, depending on the specific extension. Follow the installation instructions for each individual component, where the exact ROCm dependency is listed, or refer to the compatibility matrix to verify supported ROCm versions.
ROCm-LLMExt components#
The following table lists ROCm-LLMExt component versions for the 26.02 release. See each component's source repository on GitHub for details.

| Name | Version |
|---|---|
| verl | 0.3.0.post0 ⇒ 0.6.0 |
| Stanford Megatron-LM | 85f95ae |
| Megablocks | 0.7.0 |
| Ray | 2.48.0.post0 ⇒ 2.51.1 |
| llama.cpp | b6356 ⇒ b6652 |
| FlashInfer | 0.2.5 |
Detailed component changelogs#
verl 0.6.0#
This release adds support for ROCm 7.0.0 on AMD Instinct MI300X GPUs.
Ray 2.51.1#
This release adds support for ROCm 7.0.0 on AMD Instinct MI300X GPUs.
llama.cpp b6652#
This release enhances support on ROCm 7.0.0 with specific optimizations for AMD Instinct MI300X GPUs. It is also supported on AMD Instinct MI325X and MI210 GPUs.
FlashInfer 0.2.5#
This release adds support for ROCm 7.1.1 on AMD Instinct MI325X and MI300X GPUs.