ROCm 7.2.1 release notes

2026-03-25

Applies to Linux

The release notes provide a summary of notable changes since the previous ROCm release.

Note

If you’re using AMD Radeon GPUs or Ryzen APUs in a workstation setting with a display connected, see the Use ROCm on Radeon and Ryzen documentation to verify compatibility and system requirements.

Release highlights

The following are notable new features and improvements in ROCm 7.2.1. For changes to individual components, see Detailed component changes.

Supported hardware, operating system, and virtualization changes

Hardware support remains unchanged in this release.

ROCm 7.2.1 adds support for Ubuntu 24.04.4 (kernel: 6.8 [GA], 6.17 [HWE]) and marks end of support (EoS) for Ubuntu 24.04.3. For more information, see Ubuntu installation.
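To check a host against these requirements, the following minimal Python sketch may help. It is not an official ROCm tool, and the function names are hypothetical. It assumes the point release appears at the start of the VERSION field in /etc/os-release (for example, "24.04.4 LTS (Noble Numbat)"), and it compares only the major.minor part of the running kernel against the 6.8 (GA) and 6.17 (HWE) kernels listed above.

```python
import platform
from pathlib import Path

# Values taken from the ROCm 7.2.1 notes above.
SUPPORTED_POINT_RELEASES = {"24.04.4"}  # newly supported
END_OF_SUPPORT = {"24.04.3"}            # end of support (EoS)
SUPPORTED_KERNELS = {"6.8", "6.17"}     # GA and HWE kernels for 24.04.4


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines from /etc/os-release, dropping surrounding quotes."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def ubuntu_24_04_status(os_release_text: str, kernel_release: str) -> str:
    """Classify an Ubuntu 24.04 host (hypothetical helper, not part of ROCm)."""
    fields = parse_os_release(os_release_text)
    if fields.get("ID") != "ubuntu" or fields.get("VERSION_ID") != "24.04":
        return "not Ubuntu 24.04"
    # Assumption: VERSION starts with the point release, e.g. "24.04.4 LTS (Noble Numbat)".
    version = fields.get("VERSION", "")
    point_release = version.split()[0] if version else ""
    if point_release in END_OF_SUPPORT:
        return "end of support"
    if point_release not in SUPPORTED_POINT_RELEASES:
        return "unknown point release"
    # Kernel releases look like "6.8.0-100-generic"; compare only major.minor.
    major_minor = ".".join(kernel_release.split(".")[:2])
    return "supported" if major_minor in SUPPORTED_KERNELS else "unsupported kernel"


if __name__ == "__main__":
    os_release = Path("/etc/os-release")
    if os_release.exists():
        print(ubuntu_24_04_status(os_release.read_text(), platform.release()))
```

The sketch reports an unrecognized point release separately from an explicit end-of-support release, so that it does not guess about releases these notes do not mention.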

Virtualization support

Virtualization support remains unchanged in this release. For more information, see Virtualization support.

User space, driver, and firmware dependent changes

The software for AMD Data Center GPU products requires maintaining a hardware and software stack with interdependencies among the GPU and baseboard firmware, AMD GPU drivers, and the ROCm user space software. While AMD publishes drivers and ROCm user space components, your server or infrastructure provider publishes the GPU and baseboard firmware by bundling AMD’s firmware releases via AMD’s Platform Level Data Model (PLDM) bundle, which includes the Integrated Firmware Image (IFWI).

GPU and baseboard firmware versioning might differ across GPU families.

ROCm Version

GPU

PLDM Bundle (Firmware)

AMD GPU Driver (amdgpu)

AMD GPU Virtualization Driver (GIM)

ROCm 7.2.1 MI355X 01.26.00.02
01.25.17.07
01.25.16.03
30.30.1
30.30.0
30.20.1
30.20.0
30.10.2
30.10.1
30.10
8.7.1.K
MI350X 01.26.00.02
01.25.17.07
01.25.16.03
30.30.1
30.30.0
30.20.1
30.20.0
30.10.2
30.10.1
30.10
MI325X[1] 01.25.06.05
01.25.04.02
30.30.1
30.30.0
30.20.1
30.20.0[1]
30.10.2
30.10.1
30.10
6.4.z, where z is 0–3
6.3.3
MI300X[2] 01.25.06.04
01.25.03.12
01.25.02.04
30.30.1
30.30.0
30.20.1
30.20.0
30.10.2
30.10.1
30.10
6.4.z, where z is 0–3
6.3.3
8.7.1.K
MI300A BKC 26.1 Not Applicable
MI250X IFWI 47 (or later)
MI250 MU5 w/ IFWI 75 (or later)
MI210 MU5 w/ IFWI 75 (or later) 8.7.1.K
MI100 VBIOS D3430401-037 Not Applicable

[1]: For AMD Instinct MI325X KVM SR-IOV users, don't use AMD GPU driver (amdgpu) 30.20.0.

[2]: AMD Instinct MI300X KVM SR-IOV with Multi-VF (8 VF) support requires a compatible firmware BKC bundle, which will be released in the coming months.
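If you track firmware and driver versions in deployment scripts, part of the table above can be encoded as data. The sketch below is a hypothetical helper, not an AMD tool. It models only the MI355X, MI350X, MI325X, and MI300X rows, expands the "6.4.z" entries to 6.4.0 through 6.4.3, and applies footnote [1] for MI325X KVM SR-IOV users. It does not model the GIM column or footnote [2].

```python
# Hypothetical helper encoding part of the ROCm 7.2.1 firmware/driver table.
# Only the MI355X, MI350X, MI325X, and MI300X rows are modeled.

AMDGPU_30X = ("30.30.1", "30.30.0", "30.20.1", "30.20.0", "30.10.2", "30.10.1", "30.10")
# The table's "6.4.z" entries (z from 0 to 3) plus 6.3.3.
AMDGPU_6X = tuple(f"6.4.{z}" for z in range(4)) + ("6.3.3",)

SUPPORTED = {
    "MI355X": {"pldm": ("01.26.00.02", "01.25.17.07", "01.25.16.03"), "amdgpu": AMDGPU_30X},
    "MI350X": {"pldm": ("01.26.00.02", "01.25.17.07", "01.25.16.03"), "amdgpu": AMDGPU_30X},
    "MI325X": {"pldm": ("01.25.06.05", "01.25.04.02"), "amdgpu": AMDGPU_30X + AMDGPU_6X},
    "MI300X": {"pldm": ("01.25.06.04", "01.25.03.12", "01.25.02.04"),
               "amdgpu": AMDGPU_30X + AMDGPU_6X},
}


def is_supported(gpu: str, pldm: str, amdgpu: str, kvm_sriov: bool = False) -> bool:
    """Return True if both versions appear in the table row for this GPU."""
    row = SUPPORTED.get(gpu)
    if row is None:
        raise KeyError(f"{gpu} is not modeled in this sketch")
    # Footnote [1]: MI325X KVM SR-IOV users should not use amdgpu 30.20.0.
    if gpu == "MI325X" and kvm_sriov and amdgpu == "30.20.0":
        return False
    return pldm in row["pldm"] and amdgpu in row["amdgpu"]


print(is_supported("MI355X", "01.26.00.02", "30.30.1"))  # True
```

Raising KeyError for an unmodeled GPU, rather than returning False, keeps the sketch from silently reporting an unsupported combination for a row it never checked.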

hipBLASLt updates

hipBLASLt has improved performance for MXFP8 and MXFP4 GEMMs.

Deep learning and AI framework updates

ROCm provides a comprehensive ecosystem for deep learning development. For more information, see Deep learning frameworks for ROCm and the Compatibility matrix for the complete list of deep learning and AI framework versions tested for compatibility with ROCm. AMD ROCm has officially updated support for the following deep learning and AI frameworks:

JAX

ROCm 7.2.1 enables support for JAX 0.8.2. For more information, see JAX compatibility.

ROCm Offline Installer Creator discontinuation

The ROCm Offline Installer Creator is discontinued in ROCm 7.2.1. Equivalent installation capabilities are available through the ROCm Runfile Installer, a self-extracting installer that is not based on OS package managers. For more information, see ROCm Runfile Installer.

ROCm documentation updates

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider range of user needs and use cases.

ROCm components

The following table lists the versions of ROCm components for ROCm 7.2.1, including any version changes from 7.2.0 to 7.2.1. For the changes in each updated component, see Detailed component changes.

Category Group Name Version
Libraries Machine learning and computer vision Composable Kernel 1.2.0
MIGraphX 2.15.0
MIOpen 3.5.1
MIVisionX 3.5.0
rocAL 2.5.0
rocDecode 1.5.0 ⇒ 1.7.0
rocJPEG 1.3.0 ⇒ 1.4.0
rocPyDecode 0.8.0
RPP 2.2.0 ⇒ 2.2.1
Communication RCCL 2.27.7
rocSHMEM 3.2.0 ⇒ 3.2.0
Math hipBLAS 3.2.0
hipBLASLt 1.2.1 ⇒ 1.2.2
hipFFT 1.0.22
hipfort 0.7.1
hipRAND 3.1.0
hipSOLVER 3.2.0
hipSPARSE 4.2.0
hipSPARSELt 0.2.6
rocALUTION 4.1.0
rocBLAS 5.2.0
rocFFT 1.0.36
rocRAND 4.2.0
rocSOLVER 3.32.0
rocSPARSE 4.2.0
rocWMMA 2.2.0
Tensile 4.45.0
Primitives hipCUB 4.2.0
hipTensor 2.2.0
rocPRIM 4.2.0
rocThrust 4.2.0
Tools System management AMD SMI 26.2.1 ⇒ 26.2.2
ROCm Data Center Tool 1.2.0
rocminfo 1.0.0
ROCm SMI 7.8.0
ROCm Validation Suite 1.3.0
Performance ROCm Bandwidth Test 2.6.0
ROCm Compute Profiler 3.4.0
ROCm Systems Profiler 1.3.0
ROCProfiler 2.0.0
ROCprofiler-SDK 1.1.0
ROCTracer 4.1.0
Development HIPIFY 22.0.0
ROCdbgapi 0.77.4
ROCm CMake 0.14.0
ROCm Debugger (ROCgdb) 16.3
ROCr Debug Agent 2.1.0
Compilers HIPCC 1.1.1
llvm-project 22.0.0
Runtimes HIP 7.2.0 ⇒ 7.2.1
ROCr Runtime 1.18.0

Detailed component changes

The following sections describe key changes to ROCm components.

Note

For a historical overview of ROCm component updates, see the ROCm consolidated changelog.

AMD SMI (26.2.2)

Added

  • GPU board and baseboard temperature sensors to the amd-smi monitor command.

Resolved issues

  • JSON output was not formatted correctly when using watch mode with metrics.

  • Output was not properly redirected to file when using JSON format.

  • CPER component output was not redirected when using the --follow option.

  • Invalid CPER files caused garbage output for AFID lists.

  • JSON output was not formatted correctly for reset commands.
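Several of these fixes concern JSON output. A minimal sketch for consuming it from Python follows. It assumes that amd-smi monitor --json prints a single JSON snapshot to standard output (confirm with amd-smi monitor --help on your system), and because the JSON schema is not described in these notes, it only checks that the output parses. The helper names are hypothetical.

```python
import json
import shutil
import subprocess


def parse_amd_smi_json(raw: str):
    """Parse amd-smi JSON output, raising ValueError if it is malformed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"amd-smi produced malformed JSON: {exc}") from exc


def amd_smi_monitor_json():
    """Run amd-smi monitor once with JSON output; return None if amd-smi is not on PATH."""
    exe = shutil.which("amd-smi")
    if exe is None:
        return None
    # Assumption: without a watch interval, "monitor" prints a single snapshot.
    result = subprocess.run([exe, "monitor", "--json"], capture_output=True, text=True, check=True)
    return parse_amd_smi_json(result.stdout)
```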

HIP (7.2.1)

Resolved issues

  • Corrected the validation of stream capture in global‑capture mode. It is no longer affected by any thread‑local capture‑mode sequences occurring in other threads.

  • Corrected the return value of hipEventQuery and hipEventSynchronize. The HIP runtime now properly handles and restricts stream capture within these APIs.

  • Corrected an issue in the batch-dispatch doorbell for AQL packets to avoid a potential CPU hang.

  • To address potential delays in memory‑object destruction that could affect application logic, the HIP runtime disables memory‑object reference counting in direct‑dispatch mode.

Changed

  • The AMD_DIRECT_DISPATCH environment variable has been deprecated in the HIP runtime.
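Because AMD_DIRECT_DISPATCH is deprecated, you might want launch scripts to flag it. The following Python sketch is a hypothetical helper, not part of the HIP runtime. It only reports and warns about the variable if it is set, and it does not change runtime behavior.

```python
import os
import warnings

# Deprecated in the HIP runtime as of ROCm 7.2.1.
DEPRECATED_HIP_ENV_VARS = ("AMD_DIRECT_DISPATCH",)


def find_deprecated_hip_env(env=None) -> list:
    """Return the deprecated HIP variables set in env (default: os.environ), warning for each."""
    env = os.environ if env is None else env
    found = [name for name in DEPRECATED_HIP_ENV_VARS if name in env]
    for name in found:
        warnings.warn(f"{name}={env[name]!r} is set, but {name} is deprecated in the HIP runtime",
                      DeprecationWarning, stacklevel=2)
    return found
```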

hipBLASLt (1.2.2)

Changed

  • Updated the enumeration value for the sigmoid activation function feature.

rocDecode (1.7.0)

Upcoming changes

  • The rocDecode GitHub repository will be officially moved to ROCm/rocm-systems in an upcoming release.

rocJPEG (1.4.0)

Changed

  • Bug fixes and performance improvements.

Upcoming changes

  • The rocJPEG GitHub repository will be officially moved to ROCm/rocm-systems in an upcoming release.

rocSHMEM (3.2.0)

Added

  • Warnings that notify you when large BAR is not available.

Resolved issues

  • The GDA backend now disables itself instead of crashing when no GDA-compatible NICs are available.

  • Fixed memory coherency issues on gfx1201.

Known issues

  • Only 64-bit rocSHMEM atomic APIs are implemented for the GDA conduit.

RPP (2.2.1)

Added

  • Error-code capture in test scripts for all C++ tests.

Optimized

  • Optimized F16 variants by replacing scalar load/store operations with AVX2 intrinsics for spatter, log, blend, color_cast, flip, crop_mirror_normalize, and exposure kernels.

ROCm known issues

ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.

hipBLASLt performance regression for specific GEMM configurations

If you use hipBLASLt for LLM workloads on the following GPUs, you might observe a noticeable performance regression with specific GEMM configurations:

AMD Instinct MI300X and MI325X GPUs

Affected GEMM configurations:

  • 16384 × 16384 × 6656 (BBS, TN)

  • 32768 × 8192 × 3072 (BBS, TN)

  • 9728 × 8192 × 65536 (F8F8S, TN)

AMD Instinct MI350 Series GPUs

Affected GEMM configurations:

  • 4096 × 4096 × 1 × 8192

  • 4096 × 4096 × 1 × 16384

  • 8192 × 8192 × 1 × 8192

  • 8192 × 8192 × 1 × 16384

Due to this issue, you might also observe a slight increase in the test or inference time. This issue is resolved in the hipBLASLt develop branch and will be part of a future ROCm release.
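The affected configurations above can be checked programmatically. This sketch copies the listed configurations verbatim as tuples and matches them exactly. How the dimensions map to M, N, K, or batch is not stated in these notes, so the sketch does not interpret them. The helper name and the GPU-to-group mapping (MI350X and MI355X as the MI350 Series) are assumptions.

```python
# Affected configurations copied verbatim from the known issue above.
AFFECTED = {
    "MI300X/MI325X": {
        (16384, 16384, 6656, "BBS", "TN"),
        (32768, 8192, 3072, "BBS", "TN"),
        (9728, 8192, 65536, "F8F8S", "TN"),
    },
    "MI350 Series": {
        (4096, 4096, 1, 8192),
        (4096, 4096, 1, 16384),
        (8192, 8192, 1, 8192),
        (8192, 8192, 1, 16384),
    },
}

# Assumed mapping from GPU model to the groups used in the known issue.
GROUP = {
    "MI300X": "MI300X/MI325X",
    "MI325X": "MI300X/MI325X",
    "MI350X": "MI350 Series",
    "MI355X": "MI350 Series",
}


def is_affected(gpu: str, config: tuple) -> bool:
    """Return True if config exactly matches a listed configuration for the GPU's group."""
    group = GROUP.get(gpu)
    return group is not None and config in AFFECTED[group]
```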

Longer runtime for hipBLASLt GEMM operations on Instinct MI300X GPUs in partitioned mode

GEMM operations using hipBLASLt might take longer on AMD Instinct MI300X GPUs configured in CPX or NPS4 partition mode (38 compute units, or CUs). This issue occurs when hipBLASLt fails to find applicable pre-tuned kernels. As a result, it performs an extensive kernel search, which increases both the search time and the overall operation runtime. This issue is resolved in the hipBLASLt develop branch and will be part of a future ROCm release.
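The 38-CU figure follows from dividing the GPU's compute units across compute partitions. The sketch below assumes, beyond what these notes state, that a full MI300X exposes 304 CUs and that the SPX, DPX, QPX, and CPX compute partition modes split it into 1, 2, 4, and 8 equal partitions. NPS4 is a memory (NUMA) partition mode, so this sketch models only the compute partition modes.

```python
# Assumptions (not from these notes): a full MI300X has 304 CUs, and the compute
# partition modes split it into 1, 2, 4, or 8 partitions of equal size.
TOTAL_CUS_MI300X = 304
PARTITIONS_PER_MODE = {"SPX": 1, "DPX": 2, "QPX": 4, "CPX": 8}


def cus_per_partition(mode: str, total_cus: int = TOTAL_CUS_MI300X) -> int:
    """Return the CU count each partition sees in the given compute partition mode."""
    partitions = PARTITIONS_PER_MODE[mode]
    if total_cus % partitions:
        raise ValueError(f"{total_cus} CUs do not split evenly into {partitions} partitions")
    return total_cus // partitions


print(cus_per_partition("CPX"))  # 38
```

Fewer CUs per partition than a full GPU is why pre-tuned kernel tables built for unpartitioned configurations might not match, which triggers the extensive search described above.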

ROCm resolved issues

The following are previously known issues resolved in this release. For resolved issues related to individual components, review the Detailed component changes.

Increased runtime latency of the HIP hipStreamCreate API

An issue that doubled the runtime latency of the HIP hipStreamCreate API has been resolved. See GitHub issue #5978.

ROCm upcoming changes

The following changes to the ROCm software stack are anticipated for future releases.

ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation

ROCTracer, ROCProfiler, rocprof, and rocprofv2 are deprecated. It’s strongly recommended to upgrade to the latest version of the ROCprofiler-SDK library and the rocprofv3 tool to ensure continued support and access to new features.

To learn about key feature improvements and benefits of ROCprofiler-SDK over the deprecated ROCProfiler and ROCTracer, see Comparing ROCprofiler-SDK to legacy ROCm profiling tools.

It’s anticipated that ROCTracer, ROCProfiler, rocprof, and rocprofv2 will reach end of support (EoS) by the end of 2026 Q2.
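As a starting point for migrating scripts that call rocprof or rocprofv2, the sketch below builds a rocprofv3 kernel-tracing command. The flags used (--kernel-trace, --output-format, -d, and the -- separator before the application) reflect rocprofv3's command-line interface as commonly documented, but check rocprofv3 --help for your installed version. The helper names and the default output directory are hypothetical.

```python
import shlex
import subprocess


def rocprofv3_kernel_trace_cmd(app_argv, output_dir="rocprof-out", output_format="csv"):
    """Build a rocprofv3 command that records a kernel trace of app_argv (hypothetical helper)."""
    return [
        "rocprofv3",
        "--kernel-trace",                  # trace kernel dispatches
        "--output-format", output_format,  # for example csv or json
        "-d", output_dir,                  # directory for the output files
        "--",                              # everything after this is the application
        *app_argv,
    ]


def run_with_rocprofv3(app_argv):
    """Print and run the rocprofv3 command, raising if the profiled run fails."""
    cmd = rocprofv3_kernel_trace_cmd(app_argv)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to log or review the exact rocprofv3 invocation before switching a pipeline over.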

ROCm SMI deprecation

ROCm SMI will be phased out in an upcoming ROCm release and will enter maintenance mode. After this transition, only critical bug fixes will be addressed and no further feature development will take place.

It’s strongly recommended to transition your projects to AMD SMI, the successor to ROCm SMI. AMD SMI includes all the features of ROCm SMI and will continue to receive regular updates, new functionality, and ongoing support. For more information on AMD SMI, see the AMD SMI documentation.
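For Python code that used ROCm SMI, the amdsmi Python package installed with AMD SMI is the natural migration target. The sketch below lists GPU market names. The function names (amdsmi_init, amdsmi_get_processor_handles, amdsmi_get_gpu_asic_info, amdsmi_shut_down) and the market_name key are assumed from the amdsmi Python API and should be verified against the AMD SMI documentation for your release; the helper name is hypothetical.

```python
def list_gpu_market_names() -> list:
    """Return each GPU's market name using the amdsmi Python package, or [] if unavailable."""
    try:
        import amdsmi  # Python bindings installed with AMD SMI
    except ImportError:
        return []
    try:
        amdsmi.amdsmi_init()
    except Exception:  # amdsmi raises its own exception type if no driver or GPU is found
        return []
    try:
        names = []
        for handle in amdsmi.amdsmi_get_processor_handles():
            asic = amdsmi.amdsmi_get_gpu_asic_info(handle)
            names.append(asic.get("market_name", "unknown"))
        return names
    finally:
        amdsmi.amdsmi_shut_down()  # always release the library once initialized


if __name__ == "__main__":
    print(list_gpu_market_names())
```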

Changes to ROCm Object Tooling

The ROCm Object Tooling tools roc-obj-ls, roc-obj-extract, and roc-obj were deprecated in ROCm 6.4 and will be removed in a future release. The llvm-objdump --offloading option can now extract all clang-offload-bundles found within the objects or executables passed as input into individual code objects. When combined with the --arch-name option, --offloading extracts only the code objects that match the specified target architecture. See llvm-objdump for more information.
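As a sketch of scripting the replacement for roc-obj-extract, the Python helper below builds an llvm-objdump --offloading command. The /opt/rocm/llvm/bin location, the helper names, and the form of the --arch-name value (for example, a target such as gfx942) are assumptions; check llvm-objdump --help for the values your build accepts.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: ROCm's LLVM tools live under the default /opt/rocm prefix.
ROCM_LLVM_OBJDUMP = Path("/opt/rocm/llvm/bin/llvm-objdump")


def offload_extract_cmd(binary: str, arch_name=None) -> list:
    """Build an llvm-objdump --offloading command, optionally filtered with --arch-name."""
    if ROCM_LLVM_OBJDUMP.exists():
        exe = str(ROCM_LLVM_OBJDUMP)
    else:
        exe = shutil.which("llvm-objdump") or "llvm-objdump"
    cmd = [exe, "--offloading"]
    if arch_name is not None:
        cmd.append(f"--arch-name={arch_name}")
    cmd.append(binary)
    return cmd


def extract_code_objects(binary: str, arch_name=None) -> None:
    """Run the command; llvm-objdump decides where the extracted code objects are written."""
    subprocess.run(offload_extract_cmd(binary, arch_name), check=True)
```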