ROCm 6.4.2 release notes

2025-07-21

Applies to Linux

The release notes provide a summary of notable changes since the previous ROCm release.

Release highlights
Operating system and hardware support changes
ROCm components versioning
Detailed component changes
ROCm known issues
ROCm resolved issues
ROCm upcoming changes

Note

If you’re using AMD Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the Use ROCm on Radeon GPUs documentation to verify compatibility and system requirements.

Release highlights#

The following are notable new features and improvements in ROCm 6.4.2. For changes to individual components, see Detailed component changes.

ROCm Compute Profiler enhancements#

ROCm Compute Profiler includes the following changes:

The --roofline-data-type option now supports FP8, FP16, BF16, FP32, FP64, I8, I32, and I64 data types. This is dependent on the GPU architecture. For more information, see Roofline options.
ROCm Compute Profiler now uses AMD SMI instead of ROCm SMI. The AMD System Management Interface Library (AMD SMI) is a successor to ROCm SMI. It is a unified system management interface tool that provides a user-space interface for applications to monitor and control GPU applications and gives users the ability to query information about drivers and GPUs on the system. For more information, see ROCm/amdsmi and the AMD SMI documentation.
ROCm Compute Profiler has added 8-bit floating point (FP8) metrics support for AMD Instinct MI300 series accelerators. For more information, see System Speed-of-Light.

rocSOLVER enhancements#

rocSOLVER has improved the performance of eigensolvers and singular value decomposition (SVD). For more information, see rocSOLVER documentation.

ROCm Offline Installer Creator updates#

The ROCm Offline Installer Creator 6.4.2 includes the following features and improvements:

Added support for Oracle Linux 8.10 and 9.6, and SLES 15 SP7.
Additional package options for the Offline Installer Creator, including amd-smi, rocdecode, rocjpeg, and rdc.
ROCm meta packages are now used for selecting ROCm components and use cases.
Improved separation of kernel/driver and ROCm prerequisite packages to reduce the size of ROCm-only or driver-only offline installers.

In addition, the option to build an offline installer based on ROCm version 5.7.3 has been removed. To build an offline installer for ROCm 5.7.3, use the Offline Installer Creator from version 6.4.1 or earlier. See ROCm Offline Installer Creator for more information.

ROCm Runfile Installer updates#

The ROCm Runfile Installer 6.4.2 adds support for Oracle Linux 8.10 and 9.6 (using the RHEL 8 or 9 .run files), Debian 12 (using the Ubuntu 22.04 .run file), and SLES 15 SP7. It also fixes permission settings issues during ROCm and AMDGPU driver installation. For more information, see ROCm Runfile Installer.

ROCm documentation updates#

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

Tutorials for AI developers have been expanded with the following four new tutorials:
- Inference tutorial: AI agent with MCPs using vLLM and PydanticAI
- GPU development and optimization tutorials:
For more information about the changes, see Changelog for the AI Developer Hub.
ROCm provides a comprehensive ecosystem for deep learning development. For more details, see Deep learning frameworks for ROCm. As of July 2025, AMD ROCm provides support for the following additional deep learning frameworks:
- Deep Graph Library (DGL) is an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning that if a deep graph model is a component in an end-to-end application, the rest of the logic is implemented using PyTorch. It is currently supported on ROCm 6.4.0. For more information, see DGL compatibility.
- Stanford Megatron-LM is a large-scale language model training framework. It’s designed to train massive transformer-based language models efficiently using model and data parallelism. It is currently supported on ROCm 6.3.0. For more information, see Stanford Megatron-LM compatibility.
- Volcano Engine Reinforcement Learning for LLMs (verl) is a reinforcement learning framework designed for large language models (LLMs). verl offers a scalable, open-source fine-tuning solution optimized for AMD Instinct GPUs with full ROCm support. It is currently supported on ROCm 6.2.0. For more information, see verl compatibility.
Documentation for the new ROCprof Compute Viewer was added in May 2025. This tool is used to visualize and analyze GPU thread trace data collected using rocprofv3. Note that ROCprof Compute Viewer is in an early access state. Running production workloads is not recommended.
The AMDGPU installer documentation has been removed to encourage the use of the package manager for ROCm installation. While the package manager is the recommended method, you can still install ROCm using the AMDGPU installer by following the legacy process. Be sure to update the command with the intended ROCm version before running it. For more information, see Installation via native package manager.

Operating system and hardware support changes#

ROCm 6.4.2 adds support for SLES 15 SP7. For more information, see SLES installation.

ROCm 6.4.2 marks the end of support (EoS) for RHEL 9.5.

ROCm 6.4.2 adds support for the RDNA3 architecture-based Radeon RX 7700 XT GPU. This GPU is supported on Ubuntu 24.04.2 and RHEL 9.6. For details, see the full list of Supported GPUs (Linux).

See the Compatibility matrix for more information about operating system and hardware compatibility.

ROCm components#

The following table lists the versions of ROCm components for ROCm 6.4.2, including any version changes from 6.4.1 to 6.4.2.

Category	Group	Name	Version
Libraries	Machine learning and computer vision	Composable Kernel	1.1.0
		MIGraphX	2.12.0
		MIOpen	3.4.0
		MIVisionX	3.2.0
		rocAL	2.2.0
		rocDecode	0.10.0
		rocJPEG	0.8.0
		rocPyDecode	0.3.1
		RPP	1.9.10
	Communication	RCCL	2.22.3 ⇒ 2.22.3
		rocSHMEM	2.0.0 ⇒ 2.0.1
	Math	hipBLAS	2.4.0
		hipBLASLt	0.12.1 ⇒ 0.12.1
		hipFFT	1.0.18
		hipfort	0.6.0
		hipRAND	2.12.0
		hipSOLVER	2.4.0
		hipSPARSE	3.2.0
		hipSPARSELt	0.2.3
		rocALUTION	3.2.3
		rocBLAS	4.4.0 ⇒ 4.4.1
		rocFFT	1.0.32
		rocRAND	3.3.0
		rocSOLVER	3.28.0 ⇒ 3.28.2
		rocSPARSE	3.4.0
		rocWMMA	1.7.0
		Tensile	4.43.0
	Primitives	hipCUB	3.4.0
		hipTensor	1.5.0
		rocPRIM	3.4.0 ⇒ 3.4.1
		rocThrust	3.3.0
Tools	System management	AMD SMI	25.4.2 ⇒ 25.5.1
		ROCm Data Center Tool	0.3.0
		rocminfo	1.0.0
		ROCm SMI	7.5.0
		ROCm Validation Suite	1.1.0 ⇒ 1.1.0
	Performance	ROCm Bandwidth Test	1.4.0
		ROCm Compute Profiler	3.1.0 ⇒ 3.1.1
		ROCm Systems Profiler	1.0.1 ⇒ 1.0.2
		ROCProfiler	2.0.0
		ROCprofiler-SDK	0.6.0
		ROCTracer	4.1.0
	Development	HIPIFY	19.0.0
		ROCdbgapi	0.77.2
		ROCm CMake	0.14.0
		ROCm Debugger (ROCgdb)	15.2
		ROCr Debug Agent	2.0.4
Compilers		HIPCC	1.1.1
		llvm-project	19.0.0
Runtimes		HIP	6.4.1 ⇒ 6.4.2
		ROCr Runtime	1.15.0

Detailed component changes#

The following sections describe key changes to ROCm components.

Note

For a historical overview of ROCm component updates, see the ROCm consolidated changelog.

AMD SMI (25.5.1)#

Added#

Compute Unit Occupancy information per process.
Support for getting the GPU Board voltage.
New firmware PLDM_BUNDLE. amd-smi firmware can now show the PLDM Bundle on supported systems.
amd-smi ras --afid --cper-file <file_path> to decode CPER records.

Changed#

Padded asic_serial in amdsmi_get_asic_info with 0s.
Renamed field COMPUTE_PARTITION to ACCELERATOR_PARTITION in CLI call amd-smi --partition.

Resolved issues#

Corrected VRAM memory calculation in amdsmi_get_gpu_process_list. Previously, the VRAM memory usage reported by amdsmi_get_gpu_process_list was inaccurate and was calculated using KB instead of KiB.
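
As a rough illustration of why the unit matters, the sketch below compares decimal kilobytes (1 KB = 1000 bytes) with binary kibibytes (1 KiB = 1024 bytes). The helper names and the sample value are hypothetical and are not part of the AMD SMI API. The sketch only shows the size of the discrepancy (about 2.4%), not how amdsmi_get_gpu_process_list is implemented.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these are not AMD SMI functions.
constexpr std::uint64_t kb_to_bytes(std::uint64_t kb)   { return kb * 1000ULL; }  // decimal kilobytes
constexpr std::uint64_t kib_to_bytes(std::uint64_t kib) { return kib * 1024ULL; } // binary kibibytes

// Assumed sample: a process using 4 GiB of VRAM, expressed as a count of KiB.
constexpr std::uint64_t sample_kib = 4ULL * 1024ULL * 1024ULL;

// Interpreting the same count as KB under-reports the usage.
constexpr std::uint64_t correct_bytes = kib_to_bytes(sample_kib);    // 4294967296
constexpr std::uint64_t wrong_bytes   = kb_to_bytes(sample_kib);     // 4194304000
constexpr std::uint64_t error_bytes   = correct_bytes - wrong_bytes; // 100663296 (96 MiB)
```

For a 4 GiB allocation, the two interpretations differ by 96 MiB, which is large enough to matter when comparing per-process usage against total VRAM.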

Note

See the full AMD SMI changelog for details, examples, and in-depth descriptions.

HIP (6.4.2)#

Added#

HIP API implementation for hipEventRecordWithFlags, which records an event in the specified stream with flags.
Support for the pointer attribute HIP_POINTER_ATTRIBUTE_CONTEXT.
Support for the flags hipEventWaitDefault and hipEventWaitExternal.

Optimized#

Improved the implementation of hipEventSynchronize. The HIP runtime now makes internal callbacks non-blocking operations to improve performance.

Resolved issues#

Dependency on libgcc-s1 during rocm-dev installation on Debian Buster. The HIP runtime removed this Debian package dependency and uses libgcc1 instead for this distribution.
Build issue for COMGR dynamic load on Fedora and other distributions. The HIP runtime no longer links against libamd_comgr.so.
Failure in the API hipStreamDestroy, when stream type is hipStreamLegacy. The API now returns error code hipErrorInvalidResourceHandle on this condition.
Kernel launch errors, such as shared object initialization failed, invalid device function or kernel execution failure. HIP runtime now loads COMGR properly considering the file with its name and mapped image.
Memory access fault in some applications. HIP runtime fixed offset accumulation in memory address.
The memory leak in virtual memory management (VMM). HIP runtime now uses the size of handle for allocated memory range instead of actual size for physical memory, which fixed the issue of address clash with VMM.
Large memory allocation issue. HIP runtime now checks GPU video RAM and system RAM properly and sets size limits during memory allocation either on the host or the GPU device.
Support of hipDeviceMallocContiguous flags in hipExtMallocWithFlags(). It now enables HSA_AMD_MEMORY_POOL_CONTIGUOUS_FLAG in the memory pool allocation on GPU device.
Random memory segmentation fault when handling GraphExec object release and device synchronization. The HIP runtime now uses an internal device synchronize function in __hipUnregisterFatBinary.

hipBLASLt (0.12.1)#

Added#

Support for gfx1151 on Linux, complementing the previous support in the HIP SDK for Windows.

RCCL (2.22.3)#

Added#

Added support for the LL128 protocol on gfx942.

rocBLAS (4.4.1)#

Resolved issues#

rocBLAS might have failed to produce correct results for cherk/zherk on gfx90a/gfx942 with problem sizes k > 500 due to the imaginary portion on the C matrix diagonal not being zeros. rocBLAS now zeros the imaginary portion.
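
For context, a Hermitian rank-k update computes C := alpha * A * A^H + beta * C with real alpha and beta, so the diagonal of C is real by construction. The following host-side reference is a minimal sketch of that operation, assuming column-major storage and a full (not triangular) update. The name herk_ref and its signature are hypothetical, and this is not the rocBLAS kernel or its API. It shows the imaginary part of the diagonal being set to zero, so stray imaginary values on the input diagonal do not leak into the result.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical reference routine: C := alpha * A * A^H + beta * C.
// A is n x k and C is n x n, both stored column-major.
inline void herk_ref(std::size_t n, std::size_t k, float alpha,
                     const std::vector<cfloat>& A, float beta,
                     std::vector<cfloat>& C) {
    assert(A.size() == n * k && C.size() == n * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < n; ++i) {
            cfloat sum(0.0f, 0.0f);
            for (std::size_t p = 0; p < k; ++p) {
                // (A * A^H)(i, j) = sum over p of A(i, p) * conj(A(j, p))
                sum += A[i + p * n] * std::conj(A[j + p * n]);
            }
            cfloat c = alpha * sum + beta * C[i + j * n];
            if (i == j) {
                // A Hermitian matrix has a real diagonal: drop any imaginary part.
                c = cfloat(c.real(), 0.0f);
            }
            C[i + j * n] = c;
        }
    }
}
```

Calling herk_ref on a C matrix whose diagonal carries a nonzero imaginary part yields a result whose diagonal is purely real, which is the behavior the fix describes.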

ROCm Compute Profiler (3.1.1)#

Added#

8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
Data type selection option --roofline-data-type / -R for roofline profiling. The default data type is FP32.

Changed#

Changed dependency from rocm-smi to amd-smi.

Resolved issues#

Fixed a crash related to Agent ID caused by the new format of the rocprofv3 output CSV file.

ROCm Systems Profiler (1.0.2)#

Optimized#

Improved readability of the OpenMP target offload traces by showing them on a single Perfetto track.

Resolved issues#

Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from merge-multiprocess-output.sh to rocprof-sys-merge-output.sh.

ROCm Validation Suite (1.1.0)#

Added#

NPS2/DPX and NPS4/CPX partition modes support for AMD Instinct MI300X.

rocPRIM (3.4.1)#

Upcoming changes#

Changes to the template parameters of warp and block algorithms will be made in an upcoming release.
Due to an upcoming compiler change, the following symbols related to warp size have been marked as deprecated and will be removed in an upcoming major release:
- rocprim::device_warp_size(). This has been replaced by rocprim::arch::wavefront::min_size() and rocprim::arch::wavefront::max_size() for compile-time constants. Use these when allocating global or shared memory. For run-time constants, use rocprim::arch::wavefront::size().
- rocprim::warp_size()
- ROCPRIM_WAVEFRONT_SIZE
The default scan accumulator types for device-level scan algorithms will be changed in an upcoming release, resulting in a breaking change. Previously, the default accumulator type was set to the input type for the inclusive scans and to the initial value type for the exclusive scans. This could lead to unexpected overflow if the input or initial type was smaller than the output type when the accumulator type wasn’t explicitly set using the AccType template parameter. The new default accumulator types will be set to the type that results when the input or initial value type is applied to the scan operator.

The following is the complete list of affected functions and how their default accumulator types are changing:
- rocprim::inclusive_scan
  - current default: class AccType = typename std::iterator_traits<InputIterator>::value_type>
  - future default: class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>
- rocprim::deterministic_inclusive_scan
  - current default: class AccType = typename std::iterator_traits<InputIterator>::value_type>
  - future default: class AccType = rocprim::invoke_result_binary_op_t<typename std::iterator_traits<InputIterator>::value_type, BinaryFunction>
- rocprim::exclusive_scan
  - current default: class AccType = detail::input_type_t<InitValueType>>
  - future default: class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>
- rocprim::deterministic_exclusive_scan
  - current default: class AccType = detail::input_type_t<InitValueType>>
  - future default: class AccType = rocprim::invoke_result_binary_op_t<rocprim::detail::input_type_t<InitValueType>, BinaryFunction>
rocprim::load_cs and rocprim::store_cs are deprecated and will be removed in an upcoming release. Alternatively, you can use rocprim::load_nontemporal and rocprim::store_nontemporal to load and store values in specific conditions (like bypassing the cache) for rocprim::thread_load and rocprim::thread_store.
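
To make the device-level scan accumulator change described above more concrete, the following host-side sketch imitates the two defaults in standard C++. It assumes nothing about rocPRIM internals: inclusive_scan_ref, scan_result, and binary_op_result_t are hypothetical stand-ins (the last one plays the role of rocprim::invoke_result_binary_op_t), and std::plus<> stands in for the scan operator. With 8-bit inputs, accumulating in the input type wraps around, while accumulating in the operator's result type (int here, because of integer promotion) does not.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for rocprim::invoke_result_binary_op_t: the type
// produced by applying BinaryOp to two values of type T.
template <class T, class BinaryOp>
using binary_op_result_t =
    decltype(std::declval<BinaryOp>()(std::declval<T>(), std::declval<T>()));

// Fixed-size holder so the scan can be evaluated at compile time.
template <class T, std::size_t N>
struct scan_result { T values[N]; };

// Sequential reference inclusive scan with an explicit accumulator type.
template <class AccType, class T, std::size_t N, class BinaryOp>
constexpr scan_result<AccType, N> inclusive_scan_ref(const T (&in)[N], BinaryOp op) {
    scan_result<AccType, N> out{};
    AccType acc = static_cast<AccType>(in[0]);
    out.values[0] = acc;
    for (std::size_t i = 1; i < N; ++i) {
        acc = static_cast<AccType>(op(acc, in[i])); // narrowing happens here
        out.values[i] = acc;
    }
    return out;
}

constexpr std::uint8_t sample_input[] = {200, 100, 50};

// "Current default" analogue: accumulate in the input type (wraps modulo 256).
static_assert(inclusive_scan_ref<std::uint8_t>(sample_input, std::plus<>{}).values[2] == 94,
              "200 + 100 + 50 wraps to 94 in 8 bits");

// "Future default" analogue: accumulate in the operator's result type.
using wide_acc = binary_op_result_t<std::uint8_t, std::plus<>>;
static_assert(inclusive_scan_ref<wide_acc>(sample_input, std::plus<>{}).values[2] == 350,
              "no wraparound with the wider accumulator");
```

In this sketch the wider type comes from the operator's return type, so an operator that itself returns the narrow type (for example std::plus<std::uint8_t>) would still wrap. Code that depends on a particular accumulator type is likely safest setting AccType explicitly rather than relying on either default.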

rocSHMEM (2.0.1)#

Resolved issues#

Incorrect output for rocshmem_ctx_my_pe and rocshmem_ctx_n_pes.
Multi-team errors, resolved by providing team-specific buffers in rocshmem_ctx_wg_team_sync.
Missing implementation of rocshmem_g for IPC conduit.

rocSOLVER (3.28.2)#

Added#

Hybrid computation support for existing routines, such as STERF.
SVD for general matrices based on Cuppen’s Divide and Conquer algorithm:
- GESDD (with batched and strided_batched versions)

Optimized#

Reduced the device memory requirements for STEDC, SYEVD/HEEVD, and SYGVD/HEGVD.
Improved the performance of STEDC and divide and conquer eigensolvers.
Improved the performance of SYTRD, the initial step of the eigensolvers that start with the tridiagonalization of the input matrix.

ROCm known issues#

ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.

ROCm resolved issues#

The following are previously known issues resolved in this release. For resolved issues related to individual components, review the Detailed component changes.

AMD SMI CLI: CPER entries not dumped continuously when using follow flag#

An issue where CPER entries were not streamed continuously as intended when using the --follow flag with amd-smi ras --cper has been resolved. See GitHub issue #4768.

Instinct MI300X reports incorrect raw GPU timestamps#

An issue where the command processor firmware reported incorrect raw GPU timestamps on MI300X accelerators has been resolved. See GitHub issue #4079.

MIOpen generates incorrect results for particular input with FP32 data type#

An issue where MIOpen generated incorrect results on the conv2dbackward function for a particular input with 32-bit floating point (FP32) data types has been resolved. The issue was specific to FP32 data types with a 2 * 2 kernel size and a 2 * 1 dilation. See GitHub issue #4606.

ROCm upcoming changes#

The following changes to the ROCm software stack are anticipated for future releases.

AMD SMI migration to AMDGPU driver repository#

In a future release, AMD SMI will be relocated from the ROCm organization repository to a new AMDTools repository to better align with its system-level functionality. amd-smi-lib will no longer be included in the rocm-developer-tools meta-package included with your standard ROCm installation. Instead, it will be packaged with the AMDGPU driver installation.

ROCm SMI deprecation#

ROCm SMI will be phased out in an upcoming ROCm release and will enter maintenance mode. After this transition, only critical bug fixes will be addressed and no further feature development will take place.

It’s strongly recommended to transition your projects to AMD SMI, the successor to ROCm SMI. AMD SMI includes all the features of the ROCm SMI and will continue to receive regular updates, new functionality, and ongoing support. For more information on AMD SMI, see the AMD SMI documentation.

ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation#

Development and support for ROCTracer, ROCProfiler, rocprof, and rocprofv2 are being phased out in favor of ROCprofiler-SDK in upcoming ROCm releases. Starting with ROCm 6.4, only critical defect fixes will be addressed for older versions of the profiling tools and libraries. All users are encouraged to upgrade to the latest version of the ROCprofiler-SDK library and the rocprofv3 tool to ensure continued support and access to new features. ROCprofiler-SDK is currently in beta and will be production-ready in a future ROCm release.

It’s anticipated that ROCTracer, ROCProfiler, rocprof, and rocprofv2 will reach end-of-life in a future release, aligned with Q1 of 2026.

AMDGPU wavefront size compiler macro deprecation#

Access to the wavefront size as a compile-time constant via the __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros or the constexpr warpSize variable is deprecated and will be disabled in a future release.

The __AMDGCN_WAVEFRONT_SIZE__ macro and __AMDGCN_WAVEFRONT_SIZE alias will be removed in an upcoming release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.
warpSize will only be available as a non-constexpr variable. Where required, the wavefront size should be queried via the warpSize variable in device code, or via hipGetDeviceProperties in host code. Neither of these will result in a compile-time constant. For more information, see warpSize.
For cases where compile-time evaluation of the wavefront size cannot be avoided, uses of __AMDGCN_WAVEFRONT_SIZE, __AMDGCN_WAVEFRONT_SIZE__, or warpSize can be replaced with a user-defined macro or constexpr variable with the wavefront size(s) for the target hardware. For example:

   #if defined(__GFX9__)
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 64
   #else
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 32
   #endif
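
The text above also mentions a constexpr variable as an alternative to a macro. A minimal sketch of that variant follows, reusing the same __GFX9__ check as the macro example. The names my_wavefront_size and wavefronts_per_block are hypothetical and chosen only for illustration.

```cpp
// Hypothetical constexpr counterpart of MY_MACRO_FOR_WAVEFRONT_SIZE.
// __GFX9__ is the same target macro used in the macro example; when it is
// not defined (for example, in a host-only compile), this falls back to 32.
#if defined(__GFX9__)
constexpr int my_wavefront_size = 64;
#else
constexpr int my_wavefront_size = 32;
#endif

// A compile-time use: the number of wavefronts needed to cover a block of threads.
constexpr int wavefronts_per_block(int block_threads) {
    return (block_threads + my_wavefront_size - 1) / my_wavefront_size;
}

static_assert(my_wavefront_size == 32 || my_wavefront_size == 64,
              "unexpected wavefront size");
```

Because the value is fixed per compilation target, it may differ from what the warpSize variable or hipGetDeviceProperties reports at run time, so this pattern is best kept to the cases where a compile-time constant cannot be avoided.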

HIPCC Perl scripts deprecation#

The HIPCC Perl scripts (hipcc.pl and hipconfig.pl) will be removed in an upcoming release.

Changes to ROCm Object Tooling#

ROCm Object Tooling tools roc-obj-ls, roc-obj-extract, and roc-obj are deprecated in ROCm 6.4, and will be removed in a future release. Functionality has been added to the llvm-objdump --offloading tool option to extract all clang-offload-bundles into individual code objects found within the objects or executables passed as input. The llvm-objdump --offloading tool option also supports the --arch-name option, and only extracts code objects found with the specified target architecture. See llvm-objdump for more information.

HIP runtime API changes#

Several changes are planned for the HIP runtime API in an upcoming major release that are not backward compatible with prior releases. Most of these changes increase alignment between HIP and CUDA APIs or behavior. Others clean up header files, remove namespace collisions, and establish a clear separation between hipRTC and the HIP runtime. For more information, see HIP 7.0 Is Coming: What You Need to Know to Stay Ahead.
