ROCm 6.4.2 release notes#
2025-07-21
The release notes provide a summary of notable changes since the previous ROCm release.
Note
If you’re using AMD Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the Use ROCm on Radeon GPUs documentation to verify compatibility and system requirements.
Release highlights#
The following are notable new features and improvements in ROCm 6.4.2. For changes to individual components, see Detailed component changes.
ROCm Compute Profiler enhancements#
ROCm Compute Profiler includes the following changes:
The --roofline-data-type option now supports FP8, FP16, BF16, FP32, FP64, I8, I32, and I64 data types. This is dependent on the GPU architecture. For more information, see Roofline options.
ROCm Compute Profiler now uses AMD SMI instead of ROCm SMI. The AMD System Management Interface Library (AMD SMI) is a successor to ROCm SMI. It is a unified system management interface tool that provides a user-space interface for applications to monitor and control GPU applications and gives users the ability to query information about drivers and GPUs on the system. For more information, see ROCm/amdsmi and the AMD SMI documentation.
ROCm Compute Profiler has added 8-bit floating point (FP8) metrics support for AMD Instinct MI300 series accelerators. For more information, see System Speed-of-Light.
rocSOLVER enhancements#
rocSOLVER has improved the performance of eigensolvers and singular value decomposition (SVD). For more information, see rocSOLVER documentation.
ROCm Offline Installer Creator updates#
The ROCm Offline Installer Creator 6.4.2 includes the following features and improvements:
Added support for Oracle Linux 8.10 and 9.6, and SLES 15 SP7.
Additional package options for the Offline Installer Creator, including amd-smi, rocdecode, rocjpeg, and rdc.
ROCm meta packages are now used for selecting ROCm components and use cases.
Improved separation of kernel/driver and ROCm prerequisite packages to reduce the size of ROCm-only or driver-only offline installers.
In addition, the option to build an offline installer based on ROCm version 5.7.3 has been removed. To build an offline installer for ROCm 5.7.3, use the Offline Installer Creator from version 6.4.1 or earlier. See ROCm Offline Installer Creator for more information.
ROCm Runfile Installer updates#
The ROCm Runfile Installer 6.4.2 adds support for Oracle Linux 8.10 and 9.6 (using the RHEL 8 or 9 .run files), Debian 12 (using the Ubuntu 22.04 .run file), and SLES 15 SP7. It also fixes permission settings issues during ROCm and AMDGPU driver installation. For more information, see ROCm Runfile Installer.
ROCm documentation updates#
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
Tutorials for AI developers have been expanded with the following four new tutorials:
Inference tutorial: AI agent with MCPs using vLLM and PydanticAI
GPU development and optimization tutorials:
For more information about the changes, see Changelog for the AI Developer Hub.
ROCm provides a comprehensive ecosystem for deep learning development. For more details, see Deep learning frameworks for ROCm. As of July 2025, AMD ROCm provides support for the following additional deep learning frameworks:
Deep Graph Library is an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning if a deep graph model is a component in an end-to-end application, the rest of the logic is implemented using PyTorch. It is currently supported on ROCm 6.4.0. For more information, see DGL compatibility.
Stanford Megatron-LM is a large-scale language model training framework. It’s designed to train massive transformer-based language models efficiently by model and data parallelism. It is currently supported on ROCm 6.3.0. For more information, see Stanford Megatron-LM compatibility.
Volcano Engine Reinforcement Learning for LLMs (verl) is a reinforcement learning framework designed for large language models (LLMs). verl offers a scalable, open-source fine-tuning solution optimized for AMD Instinct GPUs with full ROCm support. It is currently supported on ROCm 6.2.0. For more information, see verl compatibility.
Documentation for the new ROCprof Compute Viewer was added in May 2025. This tool is used to visualize and analyze GPU thread trace data collected using rocprofv3. Note that ROCprof Compute Viewer is in an early access state. Running production workloads is not recommended.
The AMDGPU installer documentation has been removed to encourage the use of the package manager for ROCm installation. While the package manager is the recommended method, you can still install ROCm using the AMDGPU installer by following the legacy process. Be sure to update the command with the intended ROCm version before running it. For more information, see Installation via native package manager.
Operating system and hardware support changes#
ROCm 6.4.2 adds support for SLES 15 SP7. For more information, see SLES installation.
ROCm 6.4.2 marks the end of support (EoS) for RHEL 9.5.
ROCm 6.4.2 adds support for RDNA3 architecture-based Radeon RX 7700 XT GPU. This GPU is supported on Ubuntu 24.04.2 and RHEL 9.6. For details, see the full list of Supported GPUs (Linux).
See the Compatibility matrix for more information about operating system and hardware compatibility.
ROCm components#
The following table lists the versions of ROCm components for ROCm 6.4.2, including any version changes from 6.4.1 to 6.4.2. Click the component’s updated version to go to a list of its changes. Click to go to the component’s source code on GitHub.
| Category | Group | Name | Version |
|---|---|---|---|
| Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
| | | MIGraphX | 2.12.0 |
| | | MIOpen | 3.4.0 |
| | | MIVisionX | 3.2.0 |
| | | rocAL | 2.2.0 |
| | | rocDecode | 0.10.0 |
| | | rocJPEG | 0.8.0 |
| | | rocPyDecode | 0.3.1 |
| | | RPP | 1.9.10 |
| | Communication | RCCL | 2.22.3 ⇒ 2.22.3 |
| | | rocSHMEM | 2.0.0 ⇒ 2.0.1 |
| | Math | hipBLAS | 2.4.0 |
| | | hipBLASLt | 0.12.1 ⇒ 0.12.1 |
| | | hipFFT | 1.0.18 |
| | | hipfort | 0.6.0 |
| | | hipRAND | 2.12.0 |
| | | hipSOLVER | 2.4.0 |
| | | hipSPARSE | 3.2.0 |
| | | hipSPARSELt | 0.2.3 |
| | | rocALUTION | 3.2.3 |
| | | rocBLAS | 4.4.0 ⇒ 4.4.1 |
| | | rocFFT | 1.0.32 |
| | | rocRAND | 3.3.0 |
| | | rocSOLVER | 3.28.0 ⇒ 3.28.2 |
| | | rocSPARSE | 3.4.0 |
| | | rocWMMA | 1.7.0 |
| | | Tensile | 4.43.0 |
| | Primitives | hipCUB | 3.4.0 |
| | | hipTensor | 1.5.0 |
| | | rocPRIM | 3.4.0 ⇒ 3.4.1 |
| | | rocThrust | 3.3.0 |
| Tools | System management | AMD SMI | 25.4.2 ⇒ 25.5.1 |
| | | ROCm Data Center Tool | 0.3.0 |
| | | rocminfo | 1.0.0 |
| | | ROCm SMI | 7.5.0 |
| | | ROCm Validation Suite | 1.1.0 ⇒ 1.1.0 |
| | Performance | ROCm Bandwidth Test | 1.4.0 |
| | | ROCm Compute Profiler | 3.1.0 ⇒ 3.1.1 |
| | | ROCm Systems Profiler | 1.0.1 ⇒ 1.0.2 |
| | | ROCProfiler | 2.0.0 |
| | | ROCprofiler-SDK | 0.6.0 |
| | | ROCTracer | 4.1.0 |
| | Development | HIPIFY | 19.0.0 |
| | | ROCdbgapi | 0.77.2 |
| | | ROCm CMake | 0.14.0 |
| | | ROCm Debugger (ROCgdb) | 15.2 |
| | | ROCr Debug Agent | 2.0.4 |
| Compilers | | HIPCC | 1.1.1 |
| | | llvm-project | 19.0.0 |
| Runtimes | | HIP | 6.4.1 ⇒ 6.4.2 |
| | | ROCr Runtime | 1.15.0 |
Detailed component changes#
The following sections describe key changes to ROCm components.
Note
For a historical overview of ROCm component updates, see the ROCm consolidated changelog.
AMD SMI (25.5.1)#
Added#
Compute Unit Occupancy information per process.
Support for getting the GPU Board voltage.
New firmware PLDM_BUNDLE. amd-smi firmware can now show the PLDM Bundle on supported systems.
amd-smi ras --afid --cper-file <file_path> to decode CPER records.
Changed#
Padded asic_serial in amdsmi_get_asic_info with 0s.
Renamed field COMPUTE_PARTITION to ACCELERATOR_PARTITION in CLI call amd-smi --partition.
Resolved issues#
Corrected VRAM memory calculation in amdsmi_get_gpu_process_list. Previously, the VRAM memory usage reported by amdsmi_get_gpu_process_list was inaccurate and was calculated using KB instead of KiB.
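To give a sense of how far the two units diverge, the following minimal sketch converts the same byte count with both divisors. The helper names and the sample value are hypothetical and are not part of the AMD SMI API. The sketch does not reproduce the original miscalculation; it only shows the size of the gap between decimal kilobytes and binary kibibytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of AMD SMI.
constexpr std::uint64_t kBytesPerKB  = 1000;  // decimal kilobyte
constexpr std::uint64_t kBytesPerKiB = 1024;  // binary kibibyte

// Convert a byte count to whole kilobytes (truncating).
constexpr std::uint64_t bytes_to_kb(std::uint64_t bytes) {
    return bytes / kBytesPerKB;
}

// Convert a byte count to whole kibibytes (truncating).
constexpr std::uint64_t bytes_to_kib(std::uint64_t bytes) {
    return bytes / kBytesPerKiB;
}

// Difference between the two readings for the same byte count.
constexpr std::uint64_t unit_gap(std::uint64_t bytes) {
    return bytes_to_kb(bytes) - bytes_to_kib(bytes);
}
```

For a 1 GiB allocation (1073741824 bytes), the two readings differ by roughly 2.4%, which is enough to skew per-process memory accounting.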
Note
See the full AMD SMI changelog for details, examples, and in-depth descriptions.
HIP (6.4.2)#
Added#
HIP API implementation for hipEventRecordWithFlags, which records an event in the specified stream with flags.
Support for the pointer attribute HIP_POINTER_ATTRIBUTE_CONTEXT.
Support for the flags hipEventWaitDefault and hipEventWaitExternal.
Optimized#
Improved implementation in hipEventSynchronize. The HIP runtime now makes internal callbacks as non-blocking operations to improve performance.
Resolved issues#
Issue of dependency on libgcc-s1 during rocm-dev install on Debian Buster. HIP runtime removed this Debian package dependency and uses libgcc1 instead for this distro.
Build issue for COMGR dynamic load on Fedora and other distros. HIP runtime now doesn't link against libamd_comgr.so.
Failure in the API hipStreamDestroy when the stream type is hipStreamLegacy. The API now returns the error code hipErrorInvalidResourceHandle on this condition.
Kernel launch errors, such as shared object initialization failed, invalid device function, or kernel execution failure. HIP runtime now loads COMGR properly considering the file with its name and mapped image.
Memory access fault in some applications. HIP runtime fixed offset accumulation in memory address.
The memory leak in virtual memory management (VMM). HIP runtime now uses the size of handle for allocated memory range instead of actual size for physical memory, which fixed the issue of address clash with VMM.
Large memory allocation issue. HIP runtime now checks GPU video RAM and system RAM properly and sets size limits during memory allocation either on the host or the GPU device.
Support of hipDeviceMallocContiguous flags in hipExtMallocWithFlags(). It now enables HSA_AMD_MEMORY_POOL_CONTIGUOUS_FLAG in the memory pool allocation on GPU device.
Random memory segmentation fault in handling GraphExec object release and hipDeviceSynchronize. HIP runtime now uses internal device synchronize function in __hipUnregisterFatBinary.
hipBLASLt (0.12.1)#
Added#
Support for gfx1151 on Linux, complementing the previous support in the HIP SDK for Windows.
RCCL (2.22.3)#
Added#
Added support for the LL128 protocol on gfx942.
rocBLAS (4.4.1)#
Resolved issues#
rocBLAS might have failed to produce correct results for cherk/zherk on gfx90a/gfx942 with problem sizes k > 500 due to the imaginary portion on the C matrix diagonal not being zeros. rocBLAS now zeros the imaginary portion.
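The diagonal requirement behind this fix can be illustrated with a small host-side reference. The following is a minimal, unoptimized sketch of a Hermitian rank-k update (C = alpha * A * A^H + beta * C) that ignores any imaginary part on the input diagonal and stores zero there on output, as the BLAS convention for herk expects. The function name, the row-major layout, and the choice to update the full matrix rather than one triangle are assumptions made for illustration; this is not the rocBLAS API or its implementation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical reference helper, not the rocBLAS API.
// A is n x k and C is n x n, both stored row-major.
void herk_reference(std::size_t n, std::size_t k, float alpha,
                    const std::vector<cfloat>& A, float beta,
                    std::vector<cfloat>& C) {
    assert(A.size() == n * k && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // acc = sum over p of A(i,p) * conj(A(j,p))
            cfloat acc(0.0f, 0.0f);
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i * k + p] * std::conj(A[j * k + p]);
            }
            const cfloat c_in = C[i * n + j];
            if (i == j) {
                // The diagonal of a Hermitian matrix is real: drop any
                // imaginary part of the input and store zero on output.
                C[i * n + j] = cfloat(alpha * acc.real() + beta * c_in.real(), 0.0f);
            } else {
                C[i * n + j] = alpha * acc + beta * c_in;
            }
        }
    }
}
```

With a 1 x 2 input A = [1+2i, 3-i] and C = [5+7i], alpha = beta = 1, the result is 20 with a zero imaginary part: the stray 7i on the input diagonal does not leak into the output.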
ROCm Compute Profiler (3.1.1)#
Added#
8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
Data type selection option --roofline-data-type / -R for roofline profiling. The default data type is FP32.
Changed#
Changed dependency from rocm-smi to amd-smi.
Resolved issues#
Fixed a crash related to Agent ID caused by the new format of the rocprofv3 output CSV file.
ROCm Systems Profiler (1.0.2)#
Optimized#
Improved readability of the OpenMP target offload traces by showing them on a single Perfetto track.
Resolved issues#
Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from merge-multiprocess-output.sh to rocprof-sys-merge-output.sh.
ROCm Validation Suite (1.1.0)#
Added#
NPS2/DPX and NPS4/CPX partition modes support for AMD Instinct MI300X.
rocPRIM (3.4.1)#
Upcoming changes#
Changes to the template parameters of warp and block algorithms will be made in an upcoming release.
Due to an upcoming compiler change, the following symbols related to warp size have been marked as deprecated and will be removed in an upcoming major release:
rocprim::device_warp_size(). This has been replaced by rocprim::arch::wavefront::min_size() and rocprim::arch::wavefront::max_size() for compile-time constants. Use these when allocating global or shared memory. For run-time constants, use rocprim::arch::wavefront::size().
rocprim::warp_size()
ROCPRIM_WAVEFRONT_SIZE
The default scan accumulator types for device-level scan algorithms will be changed in an upcoming release, resulting in a breaking change. Previously, the default accumulator type was set to the input type for the inclusive scans and to the initial value type for the exclusive scans. This could lead to unexpected overflow if the input or initial type was smaller than the output type when the accumulator type wasn’t explicitly set using the AccType template parameter. The new default accumulator types will be set to the type that results when the input or initial value type is applied to the scan operator.
The following is the complete list of affected functions and how their default accumulator types are changing:
rocprim::inclusive_scan
current default: class AccType = typename std::iterator_traits<InputIterator>::value_type>
future default: class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>
rocprim::deterministic_inclusive_scan
current default: class AccType = typename std::iterator_traits<InputIterator>::value_type>
future default: class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>
rocprim::exclusive_scan
current default: class AccType = detail::input_type_t<InitValueType>>
future default: class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>
rocprim::deterministic_exclusive_scan
current default: class AccType = detail::input_type_t<InitValueType>>
future default: class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>
rocprim::load_cs and rocprim::store_cs are deprecated and will be removed in an upcoming release. Alternatively, you can use rocprim::load_nontemporal and rocprim::store_nontemporal to load and store values in specific conditions (like bypassing the cache) for rocprim::thread_load and rocprim::thread_store.
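The overflow risk behind the scan accumulator change described above can be reproduced on the host with standard C++17. The following is a minimal sketch, not rocPRIM code: inclusive_scan_with_acc, current_default_acc_t, and future_default_acc_t are hypothetical names, and std::invoke_result_t is used here only as an approximation of what rocprim::invoke_result_binary_op_t is described as doing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <type_traits>
#include <vector>

// Hypothetical host-side stand-in for a device scan; not the rocPRIM API.
// AccType is the type used to carry the running result between elements.
template <class AccType, class In, class Out, class BinaryOp>
void inclusive_scan_with_acc(const std::vector<In>& in, std::vector<Out>& out, BinaryOp op) {
    out.resize(in.size());
    if (in.empty()) {
        return;
    }
    AccType acc = static_cast<AccType>(in[0]);
    out[0] = static_cast<Out>(acc);
    for (std::size_t i = 1; i < in.size(); ++i) {
        // The operator result is narrowed back to AccType on every step.
        acc = static_cast<AccType>(op(acc, in[i]));
        out[i] = static_cast<Out>(acc);
    }
}

// Resembles the current default: accumulate in the input value type.
template <class In, class BinaryOp>
using current_default_acc_t = In;

// Resembles the future default: accumulate in the type produced by
// applying the operator to the input type.
template <class In, class BinaryOp>
using future_default_acc_t = std::invoke_result_t<BinaryOp, In, In>;

// Adding two 8-bit values with std::plus<> promotes to int.
static_assert(std::is_same_v<future_default_acc_t<std::uint8_t, std::plus<>>, int>,
              "uint8_t + uint8_t yields int");
```

For the 8-bit inputs 200, 100, 50 written to an int output, the input-typed accumulator wraps and produces 200, 44, 94, while the operator-result accumulator produces 200, 300, 350. Code that depends on the wrapping behavior can keep it by setting AccType explicitly.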
rocSHMEM (2.0.1)#
Resolved issues#
Incorrect output for rocshmem_ctx_my_pe and rocshmem_ctx_n_pes.
Multi-team errors by providing team specific buffers in rocshmem_ctx_wg_team_sync.
Missing implementation of rocshmem_g for IPC conduit.
rocSOLVER (3.28.2)#
Added#
Hybrid computation support for existing routines, such as STERF.
SVD for general matrices based on Cuppen’s Divide and Conquer algorithm:
GESDD (with batched and strided_batched versions)
Optimized#
Reduced the device memory requirements for STEDC, SYEVD/HEEVD, and SYGVD/HEGVD.
Improved the performance of STEDC and divide and conquer Eigensolvers.
Improved the performance of SYTRD, the initial step of the Eigensolvers that start with the tridiagonalization of the input matrix.
ROCm known issues#
ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.
ROCm resolved issues#
The following are previously known issues resolved in this release. For resolved issues related to individual components, review the Detailed component changes.
AMD SMI CLI: CPER entries not dumped continuously when using follow flag#
An issue where CPER entries were not streamed continuously as intended when using the --follow flag with amd-smi ras --cper has been resolved. See GitHub issue #4768.
Instinct MI300X reports incorrect raw GPU timestamps#
An issue where the command processor firmware reported incorrect raw GPU timestamps on MI300X accelerators has been resolved. See GitHub issue #4079.
MIOpen generates incorrect results for particular input with FP32 data type#
An issue where MIOpen generated incorrect results on the conv2dbackward function for a particular input with 32-bit floating point (FP32) data types has been resolved. The issue was only specific to FP32 data types with 2 * 2 kernel size and dilation 2 * 1. See GitHub issue #4606.
ROCm upcoming changes#
The following changes to the ROCm software stack are anticipated for future releases.
AMD SMI migration to AMDGPU driver repository#
In a future release, AMD SMI will be relocated from the ROCm organization repository to a new AMDTools repository to better align with its system-level functionality. amd-smi-lib will no longer be included in the rocm-developer-tools meta-package included with your standard ROCm installation. Instead, it will be packaged with the AMDGPU driver installation.
ROCm SMI deprecation#
ROCm SMI will be phased out in an upcoming ROCm release and will enter maintenance mode. After this transition, only critical bug fixes will be addressed and no further feature development will take place.
It’s strongly recommended to transition your projects to AMD SMI, the successor to ROCm SMI. AMD SMI includes all the features of the ROCm SMI and will continue to receive regular updates, new functionality, and ongoing support. For more information on AMD SMI, see the AMD SMI documentation.
ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation#
Development and support for ROCTracer, ROCProfiler, rocprof, and rocprofv2 are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the (rocprofv3) tool to ensure continued support and access to new features. ROCprofiler-SDK is still in beta today and will be production-ready in a future ROCm release.
It’s anticipated that ROCTracer, ROCProfiler, rocprof, and rocprofv2 will reach end-of-life in a future release, aligning with Q1 of 2026.
AMDGPU wavefront size compiler macro deprecation#
Access to the wavefront size as a compile-time constant via the __AMDGCN_WAVEFRONT_SIZE
and __AMDGCN_WAVEFRONT_SIZE__ macros or the constexpr warpSize variable is deprecated
and will be disabled in a future release.
The __AMDGCN_WAVEFRONT_SIZE__ macro and __AMDGCN_WAVEFRONT_SIZE alias will be removed in an upcoming release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.
warpSize will only be available as a non-constexpr variable. Where required, the wavefront size should be queried via the warpSize variable in device code, or via hipGetDeviceProperties in host code. Neither of these will result in a compile-time constant. For more information, see warpSize.
For cases where compile-time evaluation of the wavefront size cannot be avoided, uses of __AMDGCN_WAVEFRONT_SIZE, __AMDGCN_WAVEFRONT_SIZE__, or warpSize can be replaced with a user-defined macro or constexpr variable with the wavefront size(s) for the target hardware. For example:
#if defined(__GFX9__)
#define MY_MACRO_FOR_WAVEFRONT_SIZE 64
#else
#define MY_MACRO_FOR_WAVEFRONT_SIZE 32
#endif
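The constexpr variant mentioned above could look like the following minimal sketch. The names kMyWavefrontSize and wavefronts_per_block are hypothetical, and the sketch assumes, as the macro example does, that __GFX9__ is defined only when compiling device code for GFX9 targets; in a host-only build it falls through to 32.

```cpp
#include <cassert>

// User-defined compile-time wavefront size (hypothetical name),
// following the same target check as the macro example.
#if defined(__GFX9__)
constexpr unsigned int kMyWavefrontSize = 64;
#else
constexpr unsigned int kMyWavefrontSize = 32;
#endif

// Number of wavefronts needed to cover a thread block, rounding up.
// Usable in constant expressions, for example to size a per-wavefront array.
constexpr unsigned int wavefronts_per_block(unsigned int block_size,
                                            unsigned int wavefront_size = kMyWavefrontSize) {
    return (block_size + wavefront_size - 1) / wavefront_size;
}

static_assert(wavefronts_per_block(256, 64) == 4, "256 threads span four 64-wide wavefronts");
static_assert(wavefronts_per_block(256, 32) == 8, "256 threads span eight 32-wide wavefronts");
```

Because the value is fixed at compile time, a binary built this way is only correct for hardware whose wavefront size matches the selected constant. Where that cannot be guaranteed, querying the size at run time remains the safer choice.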
HIPCC Perl scripts deprecation#
The HIPCC Perl scripts (hipcc.pl and hipconfig.pl) will be removed in an upcoming release.
Changes to ROCm Object Tooling#
ROCm Object Tooling tools roc-obj-ls, roc-obj-extract, and roc-obj are
deprecated in ROCm 6.4, and will be removed in a future release. Functionality
has been added to the llvm-objdump --offloading tool option to extract all
clang-offload-bundles into individual code objects found within the objects
or executables passed as input. The llvm-objdump --offloading tool option also
supports the --arch-name option, and only extracts code objects found with
the specified target architecture. See llvm-objdump
for more information.
HIP runtime API changes#
A number of changes that are not backward compatible with prior releases are planned for
the HIP runtime API in an upcoming major release. Most of these changes increase
alignment between HIP and CUDA APIs or behavior. Some of the changes clean up header
files, remove namespace collisions, and establish a clear separation between
hipRTC and HIP runtime. For more information, see HIP 7.0 Is Coming: What You Need to Know to Stay Ahead.