ROCm Core SDK 7.12.0 release notes#
2026-03-26
ROCm Core SDK 7.12.0 continues the technology preview release stream that began with ROCm 7.9.0, advancing the transition to the new TheRock build and release system. To learn more about TheRock, see ROCm Core SDK and TheRock Build System.
This release expands support for more AMD GPUs and APUs. Developers can expect a more consistent build experience and streamlined workflows that pave the way toward the future modular ROCm releases planned for mid-2026.
Important
ROCm 7.12.0 follows the versioning discontinuity that began with 7.9.0 and remains separate from the 7.0 to 7.2 production releases. For the latest production stream release, see the ROCm documentation.
Maintaining parallel release streams – preview and production – gives users ample time to evaluate and adopt the new build system and dependency changes. The technology preview stream is planned to continue through mid‑2026, after which it will replace the current production stream.
Release highlights#
ROCm Core SDK 7.12.0 with TheRock builds upon the 7.11.0 release with several key enhancements:
Expanded AMD GPU support#
The ROCm 7.12.0 preview adds support for the following AMD GPUs and APUs:
AMD Instinct MI100
AMD Radeon RX 7700 XE
AMD Radeon RX 7600
AMD Ryzen AI 9 HX PRO 475
AMD Ryzen AI 9 HX PRO 470
AMD Ryzen AI 9 PRO 465
AMD Ryzen AI 7 PRO 450
AMD Ryzen AI 5 PRO 440
AMD Ryzen AI 5 PRO 435
AMD Ryzen 9 270
AMD Ryzen 7 260
AMD Ryzen 7 250
AMD Ryzen 5 240
AMD Ryzen 5 230
AMD Ryzen 5 220
AMD Ryzen 3 210
For the full list of supported hardware, see Hardware support.
Expanded Linux distribution support#
ROCm 7.12.0 adds support for Debian 12 with AMD Instinct GPUs.
For the full list of supported Linux distributions, see Operating system support.
Expanded GPU virtualization support for Instinct GPUs#
ROCm 7.12.0 adds support for the following KVM SR-IOV virtualization configurations on AMD Instinct MI355X and MI350X GPUs.
On MI355X: Ubuntu 24.04 host OS with RHEL 10.0 or RHEL 9.6 guest OS.
On MI350X: Ubuntu 24.04 host OS with RHEL 9.6 guest OS.
For details, see GPU virtualization support.
Added GPU partitioning support#
ROCm 7.12.0 adds support for the following compute partition and NUMA-per-socket (NPS) configurations on AMD Instinct GPUs in bare metal deployments.
| Device | Compute partition mode | NPS mode | Deployment |
|---|---|---|---|
| Instinct MI355X, MI350X | CPX | NPS 2 | Bare metal |
| Instinct MI355X, MI350X | DPX | NPS 2 | Bare metal |
| Instinct MI300X | CPX | NPS 4 | Bare metal |
| Instinct MI300X | DPX | NPS 2 | Bare metal |
Added Runfile installation method#
The ROCm Runfile Installer can install ROCm and/or the AMD GPU Driver (amdgpu) without using a native Linux package management system, making it ideal for systems with policy constraints or restricted environments. Network access is not needed for installation as long as dependencies for ROCm and/or AMD GPU driver (amdgpu) are met. A single installer supports all GFX architectures, automates post-installation configuration, and offers an interactive command-line GUI for guided setup.
For details, see the ROCm installation instructions.
Added rocSHMEM library to TheRock#
The rocSHMEM (ROCm OpenSHMEM) runtime provides GPU-centric networking through an OpenSHMEM-like interface. It simplifies application code complexity and enables finer communication and computation overlap than traditional host-driven networking.
rocSHMEM is supported on Linux on AMD Instinct, Radeon PRO, and Radeon GPUs. See the project in ROCm/rocm-systems for more information.
Expanded AI ecosystem support#
PyTorch 2.10.0 is now supported on Linux and Windows. PyTorch 2.7 support is no longer validated. See Install PyTorch.
JAX 0.8.2 and 0.8.0 are now built and distributed through TheRock on Linux. See Install JAX.
vLLM 0.16.0 wheels and Docker images are now available through AMD package repositories for select GFX architectures (gfx950, gfx942, gfx1200, gfx1201, and gfx1151) on Linux. See vLLM inference.
See AI ecosystem support for details.
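A quick way to confirm which PyTorch build is active before running ROCm workloads is to inspect `torch.version.hip`, which is populated on ROCm wheels. The helper below is our own sketch (not an AMD or PyTorch utility) and degrades gracefully when PyTorch is not installed:

```python
# Hedged sketch: report whether the installed PyTorch is a ROCm build.
# Assumes only that ROCm wheels populate `torch.version.hip`; the helper
# name is ours, not a PyTorch API.
import importlib.util


def rocm_pytorch_status() -> str:
    """Return a short description of the installed PyTorch backend."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    hip = getattr(torch.version, "hip", None)
    if hip:
        return f"ROCm build (HIP {hip})"
    return "non-ROCm build"


print(rocm_pytorch_status())
```

Running this inside a ROCm PyTorch environment should report the HIP version; a CUDA or CPU wheel reports a non-ROCm build.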
ROCm profilers now support virtualized environments#
ROCprofiler-SDK, ROCm Systems Profiler (rocprofiler-systems), and ROCm Compute Profiler (rocprofiler-compute) now support performance profiling and analysis in KVM (kernel-based virtual machine) environments, enabling developers to profile GPU workloads running on virtualized infrastructure.
ROCm Compute Profiler: introduced iteration multiplexing#
ROCm Compute Profiler (rocprofiler‑compute) now supports iteration multiplexing for large workloads. This enhancement enables the collection of the full set of hardware performance counters in a single profiling run, significantly reducing overall profiling time. Iteration multiplexing eliminates the need for application replay to gather extensive counter sets, which is often impractical for large or long‑running workloads. For smaller workloads with a limited number of kernel dispatches, existing pass‑reduction techniques remain recommended. For more details, see Iteration multiplexing.
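The idea behind iteration multiplexing can be shown with a small conceptual sketch (this is an illustration of the scheduling principle, not the rocprofiler-compute implementation): successive kernel iterations within one run each collect a different group of counters, instead of replaying the whole application once per group.

```python
# Conceptual sketch of iteration multiplexing: split a counter set into
# per-iteration groups bounded by a per-pass hardware limit, so one run
# covers what would otherwise need several application replays.
def multiplex(counters, per_pass_limit):
    """Assign counters to iteration groups, one group per iteration."""
    return [counters[i:i + per_pass_limit]
            for i in range(0, len(counters), per_pass_limit)]


counters = [f"COUNTER_{i}" for i in range(10)]
groups = multiplex(counters, per_pass_limit=4)
# 10 counters at 4 per pass -> 3 iterations cover the full set in one run
print(len(groups))
```

With replay-based collection the same 10 counters would require three full application runs; multiplexing amortizes them over iterations of a single run.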
ROCm Compute Profiler: isolate profiling output by MPI rank#
ROCm Compute Profiler (rocprofiler-compute) now supports isolating profiling output by MPI rank when profiling distributed workloads. If ranks are detected and no rank placeholder is specified in the output path, each rank automatically writes its results to a rank-named subdirectory, preventing output collisions and simplifying per-rank analysis. For more information, see Multi-rank profiling in the ROCm Compute Profiler documentation.
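The collision-avoidance behavior described above can be sketched as follows; the `rank_<N>` directory name is illustrative, not necessarily the tool's exact naming scheme:

```python
# Illustrative sketch of per-rank output isolation: when the user has not
# placed a rank placeholder in the output path, each MPI rank writes to
# its own rank-named subdirectory so ranks never overwrite each other.
def rank_output_dir(base: str, rank: int, has_placeholder: bool = False) -> str:
    """Return the profiling output directory for one MPI rank."""
    if has_placeholder:
        return base  # the user already isolated ranks in the path
    return f"{base}/rank_{rank}"  # hypothetical subdirectory name


print(rank_output_dir("results", 0))  # results/rank_0
print(rank_output_dir("results", 1))  # results/rank_1
```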
ROCm Compute Profiler: experimental Torch operator counter collection and tracing#
ROCm Compute Profiler (rocprofiler-compute) introduces experimental support for Torch operator-based counter collection and tracing. This feature enables profiling at the PyTorch operator level, allowing developers to correlate hardware performance counters with individual Torch operations and better understand the GPU performance characteristics of deep learning workloads. For more information, see Torch operator mapping in the ROCm Compute Profiler documentation.
Memory latency and derived counters now visible in ROCprof Compute Viewer#
You can now view memory latency and derived counters in ROCprof Compute Viewer, providing clearer insights into memory performance characteristics. This enhancement improves analysis and interpretation of memory-related bottlenecks.
ROCm Systems Profiler: added network performance metrics for Pensando AI NICs#
ROCm Systems Profiler (rocprofiler-systems) now surfaces key network metrics for Pensando AI NICs, including Congestion Notification Packets (CNPs sent and received) and bandwidth utilization as a percentage of peak throughput. For more information, see Network performance profiling in the ROCm Systems Profiler documentation.
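The bandwidth-utilization metric mentioned above is simply observed throughput as a percentage of the link's peak rate. A worked example (the throughput numbers are made up for illustration):

```python
# Worked example of the bandwidth-utilization metric: observed throughput
# expressed as a percentage of the NIC's peak rate. Values are illustrative.
def utilization_pct(observed_gbps: float, peak_gbps: float) -> float:
    return 100.0 * observed_gbps / peak_gbps


print(utilization_pct(320.0, 400.0))  # 80.0
```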
ROCm Systems Profiler: added Triton workload profiling support#
ROCm Systems Profiler (rocprofiler-systems) now supports profiling Triton-based workloads, enabling detailed runtime tracing in distributed environments. This enhancement allows developers to correlate Triton framework activity with HIP runtime behavior, including CPU and GPU execution, memory usage, and communication patterns across multi-node jobs.
ROCm Systems Profiler: added preset profiles#
ROCm Systems Profiler (rocprofiler-systems) now includes preset profiles that automatically configure profiling settings for common workload scenarios using a single command-line flag. These presets provide optimized, pre-tuned configurations that reduce setup complexity, minimize profiling overhead, and ensure consistent behavior across general-purpose, workload-specific, and API tracing use cases. For more information, see Using preset profiles in the ROCm Systems Profiler documentation.
ROCm Systems Profiler: added OpenSHMEM and UCX tracing#
ROCm Systems Profiler (rocprofiler-systems) now supports comprehensive OpenSHMEM and UCX tracing, providing deeper visibility into inter-node communication patterns and helping developers identify and diagnose communication inefficiencies in large-scale AI and HPC workloads. For more information, see how to profile OpenSHMEM and UCX using ROCm Systems Profiler.
ROCm Systems Profiler now supports attaching to running processes#
ROCm Systems Profiler (rocprofiler-systems) now supports attaching to and
profiling an already running process using the new rocprof-sys-attach
utility. This capability enables profiling of long-running applications,
services, or externally launched jobs without requiring a restart, making it
easier to capture performance data for specific runtime phases. Attached
profiling provides detailed insights while the application continues to run,
supporting dynamic and flexible performance analysis workflows. For more
information, see Attaching to a running
process.
ROCprofiler-SDK and rocprofv3: expanded Ryzen AI profiling support#
ROCprofiler-SDK and rocprofv3 now enable profiling on Ryzen AI Max 395,
390, and 385, Ryzen AI 7 350, 340, and 330, and Ryzen AI 7 400, extending
performance analysis capabilities to the latest Ryzen AI platforms on Linux.
ROCprofiler-SDK: rocprofiler_force_configure() enables late-start profiling#
The rocprofiler_force_configure() API now automatically detects
already-initialized HSA and HIP runtimes, enabling late-start profiling without
requiring application restarts. This enhancement supports profiling for
applications that dynamically load tools at runtime, use plugin architectures
where the ROCprofiler-SDK is loaded after GPU initialization, or need to attach
to already running GPU workloads.
ROCprofiler-SDK now exposes process attach and detach as a public API#
The rocattach.so library enables attaching to and detaching from running
processes using ptrace-based control and lifecycle synchronization. The
tool_attach and tool_detach entry points allow any rocprofiler-SDK tool
library to integrate into the attach and detach workflow. This functionality is
now exposed as a public API, allowing ROCprofiler-SDK users to incorporate
custom tool libraries into the attach and detach workflow without
re-implementing this logic.
Compatibility notices#
In terms of package compatibility, ROCm 7.12.0 diverges from the existing ROCm 7.0 production stream and future stable releases in that stream:
Compute-focused: ROCm 7.12.0 primarily supports compute workloads. Future releases will support mixed workloads (compute and graphics).
Graphics applications that rely on the ROCm stack are not fully supported with this release. For users running graphics applications alongside ROCm 7.12.0, use the inbox Mesa user mode driver. Do not manually install the Mesa user mode driver.
No upgrade path from existing production releases, including ROCm 7.2.1 and earlier, or from upcoming stable releases in that stream. See the explanatory note.
Not intended for production workloads: users running production environments should continue using the ROCm 7.0 stream. See the explanatory note.
Not fully featured: this release is a stepping stone toward fully open software development.
Limited hardware support: preview releases are only supported on some AMD Instinct GPUs, Radeon GPUs, and Ryzen APUs. See Supported hardware and operating systems.
Software components: additional components are planned to be introduced in future preview releases as part of the ROCm Core SDK. Other libraries and tools not included in the future Core SDK will either be:
Released as standalone project-specific packages, or
Grouped into domain-specific toolkits.
Hardware support#
The following table lists supported AMD Instinct GPUs, Radeon GPUs, and Ryzen APUs. Each supported device is listed with its corresponding GPU microarchitecture and LLVM target.
| Device series | Device | LLVM target | Architecture |
|---|---|---|---|
| AMD Instinct MI350 Series | | gfx950 | CDNA 4 |
| AMD Instinct MI300 Series | | gfx942 | CDNA 3 |
| AMD Instinct MI200 Series | | gfx90a | CDNA 2 |
| AMD Instinct MI100 Series | Instinct MI100 | gfx908 | CDNA |
| AMD device series | Device | LLVM target | Architecture |
|---|---|---|---|
| Radeon AI PRO R9000 Series | | gfx1201 | RDNA 4 |
| Radeon PRO W7000 Series | | gfx1100 | RDNA 3 |
| Radeon PRO W7000 Series | | gfx1101 | RDNA 3 |
| Radeon PRO V Series | | | |
| AMD device series | Device | LLVM target | Architecture |
|---|---|---|---|
| Radeon RX 9000 Series | Radeon RX 9070 GRE | gfx1201 | RDNA 4 |
| Radeon RX 9000 Series | | gfx1200 | RDNA 4 |
| Radeon RX 7000 Series | | gfx1100 | RDNA 3 |
| Radeon RX 7000 Series | Radeon RX 7700 XE | gfx1101 | RDNA 3 |
| Radeon RX 7000 Series | | gfx1102 | RDNA 3 |
| AMD device series | Device | LLVM target | Architecture |
|---|---|---|---|
| Ryzen AI Max PRO 300 Series | | gfx1151 | RDNA 3.5 |
| Ryzen AI Max 300 Series | | gfx1151 | RDNA 3.5 |
| Ryzen AI PRO 400 Series | | gfx1150 | RDNA 3.5 |
| Ryzen AI 300 Series | | gfx1150 | RDNA 3.5 |
| Ryzen 200 Series | | gfx1103 | RDNA 3 |
Note
This preview release supports a limited number of GPUs and APUs. Hardware support will be expanded in future releases, following a six-week release cadence.
Operating system support#
ROCm supports the following Linux distributions and Microsoft Windows versions. If you’re running ROCm on Linux, ensure your system uses a supported kernel version. Future preview releases will expand operating system support coverage.
Important
The following table is a general overview of supported OSes. Actual support might vary by GPU. Use the Compatibility matrix to verify support for your specific setup before installation.
| Linux distribution | Supported versions | Linux kernel version |
|---|---|---|
| Ubuntu | 24.04.3 | GA 6.8 |
| Ubuntu | 22.04.5 | GA 5.15 |
| Debian | 13 | 6.12 |
| Debian | 12 | 6.1.0 |
| Red Hat Enterprise Linux (RHEL) | 10.1 | 6.12.0-124 |
| Red Hat Enterprise Linux (RHEL) | 10.0 | 6.12.0-55 |
| Red Hat Enterprise Linux (RHEL) | 9.7 | 5.14.0-611 |
| Red Hat Enterprise Linux (RHEL) | 9.6 | 5.14.0-570 |
| Red Hat Enterprise Linux (RHEL) | 9.4 | 5.14.0-427 |
| Red Hat Enterprise Linux (RHEL) | 8.10 | 4.18.0-553 |
| Oracle Linux | 10 | UEK 8.1 |
| Oracle Linux | 9 | UEK 8 |
| Oracle Linux | 8 | UEK 7 |
| Rocky Linux | 9 | 5.14.0-570 |
| SUSE Linux Enterprise Server (SLES) | 16.0 | 6.12 |
| SUSE Linux Enterprise Server (SLES) | 15.7 | 6.4.0-150700.51 |
| Operating system | Supported versions | Linux kernel version |
|---|---|---|
| Ubuntu | 24.04.3 | GA 6.8 |
| Ubuntu | 22.04.5 | GA 5.15 |
| Red Hat Enterprise Linux (RHEL) | 10.1 | 6.12.0-124 |
| Red Hat Enterprise Linux (RHEL) | 9.7 | 5.14.0-611 |
| Windows | 11 25H2 | — |
| Operating system | Supported versions | Linux kernel version |
|---|---|---|
| Ubuntu | 24.04.3 | GA 6.8 |
| Ubuntu | 22.04.5 | GA 5.15 |
| Red Hat Enterprise Linux (RHEL) | 10.1 | 6.12.0-124 |
| Red Hat Enterprise Linux (RHEL) | 9.7 | 5.14.0-611 |
| Windows | 11 25H2 | — |
| Operating system | Supported versions | Linux kernel version |
|---|---|---|
| Ubuntu | 24.04.3 | HWE 6.14 |
| Windows | 11 25H2 | — |
Kernel driver and firmware bundle support#
ROCm requires a coordinated stack of compatible firmware, driver, and user space components. Maintaining version alignment between these layers ensures correct GPU operation and performance, especially for AMD data center products. While AMD publishes the AMD GPU driver and ROCm user space components, your server OEM or infrastructure provider distributes the firmware packages. AMD supplies those firmware images (PLDM bundles), which the OEM integrates and distributes.
| AMD device | Firmware | Linux driver |
|---|---|---|
| Instinct MI355X | PLDM bundle 01.25.17.07, 01.25.16.03 | AMD GPU Driver (amdgpu) |
| Instinct MI350X | PLDM bundle 01.25.17.07, 01.25.16.03 | AMD GPU Driver (amdgpu) |
| Instinct MI325X | PLDM bundle 01.25.04.02 | AMD GPU Driver (amdgpu) |
| Instinct MI300X | PLDM bundle 01.25.03.12 | AMD GPU Driver (amdgpu) |
| Instinct MI300A | BKC 26.1 | AMD GPU Driver (amdgpu) |
| Instinct MI250X | IFWI 75 (or later) | AMD GPU Driver (amdgpu) |
| Instinct MI250 | Maintenance update 5 with IFWI 75 (or later) | AMD GPU Driver (amdgpu) |
| Instinct MI210 | Maintenance update 5 with IFWI 75 (or later) | AMD GPU Driver (amdgpu) |
| Instinct MI100 | VBIOS D3430401-037 | AMD GPU Driver (amdgpu) |
| AMD device | Linux driver | Windows driver |
|---|---|---|
| Radeon AI PRO R9700 | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon AI PRO R9600D | AMD GPU Driver (amdgpu) | — |
| Radeon PRO W7900 Dual Slot | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO W7900 | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO W7800 48GB | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO W7800 | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO W7700 | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO V710 | AMD GPU Driver (amdgpu) | — |
| AMD device | Linux driver | Windows driver |
|---|---|---|
| Radeon RX 9070 XT | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9070 GRE | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9070 | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9060 XT LP | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9060 XT | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9060 | AMD GPU Driver (amdgpu) | — |
| Radeon RX 7900 XTX | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7900 XT | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7900 GRE | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7800 XT | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7700 XT | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7700 XE | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7700 | AMD GPU Driver (amdgpu) | — |
| Radeon RX 7600 | AMD GPU Driver (amdgpu) | — |
| AMD device | Linux driver | Windows driver |
|---|---|---|
| Ryzen AI Max+ PRO 395 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max PRO 390 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max PRO 385 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max PRO 380 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max+ 395 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max 390 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max 385 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 HX 375 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 HX 370 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 365 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 HX PRO 475 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 HX PRO 470 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 PRO 465 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 7 PRO 450 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 5 PRO 440 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 5 PRO 435 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 9 270 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 7 260 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 7 250 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 5 240 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 5 230 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 5 220 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 3 210 | Inbox kernel driver | AMD Software: Adrenalin Edition |
GPU virtualization support#
AMD Instinct data center GPUs support virtualization in the following configurations. Supported SR-IOV configurations require the AMD GPU Virtualization Driver (GIM) 8.7.1K – see the AMD Instinct Virtualization Driver documentation for more information.
| AMD GPU | Hypervisor | Virtualization technology | Virtualization driver | Host OS | Guest OS |
|---|---|---|---|---|---|
| Instinct MI355X | KVM | Passthrough | — | Ubuntu 24.04 | Ubuntu 24.04 |
| Instinct MI355X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | Ubuntu 24.04 |
| Instinct MI355X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | RHEL 10.0 |
| Instinct MI355X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | RHEL 9.6 |
| Instinct MI350X | KVM | Passthrough | — | Ubuntu 24.04 | Ubuntu 24.04 |
| Instinct MI350X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | Ubuntu 24.04 |
| Instinct MI350X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | RHEL 9.6 |
| Instinct MI325X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 22.04 | Ubuntu 22.04 |
| Instinct MI300X | KVM | Passthrough | — | Ubuntu 22.04 | Ubuntu 22.04 |
| Instinct MI300X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 22.04 | Ubuntu 22.04 |
AI ecosystem support#
ROCm 7.12.0 provides optimized support for popular deep learning frameworks and AI inference engines. The following table lists supported frameworks and libraries, their compatible operating systems, and validated versions.
| Framework | Supported versions | Supported OS | Supported Python versions |
|---|---|---|---|
| PyTorch | 2.10.0, 2.9.1, 2.8.0 | Linux | 3.13, 3.12, 3.11 |
| PyTorch | 2.10.0, 2.9.1 | Windows | 3.13, 3.12, 3.11 |
| JAX | 0.8.2, 0.8.0 | Linux | 3.14, 3.13, 3.12, 3.11 |
| vLLM | 0.16.0 | Linux | 3.12 |
ROCm Core SDK components#
The following table lists tools and libraries included in the ROCm 7.12.0 release. Expect future releases to expand the list of components.
Important
The following table is a general overview of ROCm Core SDK components. Actual support for these libraries and tools might vary by GPU and OS. Use the Compatibility matrix to verify support for your specific setup.
| Component group | Component name | Support |
|---|---|---|
| Math and compute libraries | Composable Kernel | Linux, Windows |
| Math and compute libraries | hipBLAS | Linux, Windows |
| Math and compute libraries | hipBLASLt | Linux, Windows |
| Math and compute libraries | hipCUB | Linux, Windows |
| Math and compute libraries | hipFFT | Linux, Windows |
| Math and compute libraries | hipRAND | Linux, Windows |
| Math and compute libraries | hipSOLVER | Linux, Windows |
| Math and compute libraries | hipSPARSE | Linux, Windows |
| Math and compute libraries | MIOpen | Linux, Windows |
| Math and compute libraries | rocBLAS | Linux, Windows |
| Math and compute libraries | rocFFT | Linux, Windows |
| Math and compute libraries | rocRAND | Linux, Windows |
| Math and compute libraries | rocSOLVER | Linux, Windows |
| Math and compute libraries | rocSPARSE | Linux, Windows |
| Math and compute libraries | rocPRIM | Linux, Windows |
| Math and compute libraries | rocThrust | Linux, Windows |
| Math and compute libraries | rocWMMA | Linux, Windows |
| Math and compute libraries | hipSPARSELt | Linux only (Instinct MI350, MI300 Series, Ryzen APUs) |
| Communication libraries | RCCL | Linux only |
| Communication libraries | rocSHMEM | Linux only (Instinct, Radeon PRO, Radeon) |
| Support libraries | ROCm CMake | Linux, Windows |
| Runtimes and compilers | HIP | Linux, Windows |
| Runtimes and compilers | HIPIFY | Linux, Windows |
| Runtimes and compilers | LLVM | Linux, Windows |
| Runtimes and compilers | SPIRV-LLVM-Translator | Linux, Windows |
| Runtimes and compilers | ROCr Runtime | Linux only |
| Profiling and debugging tools | ROCm Compute Profiler (rocprofiler-compute) | Linux only (Instinct) |
| Profiling and debugging tools | ROCm Systems Profiler (rocprofiler-systems) | Linux only (Instinct) |
| Profiling and debugging tools | ROCprofiler-SDK | Linux |
| Profiling and debugging tools | ROCdbgapi | Linux only (Instinct, Radeon PRO, Radeon) |
| Profiling and debugging tools | ROCm Debugger (ROCgdb) | Linux only (Instinct, Radeon PRO, Radeon) |
| Profiling and debugging tools | ROCr Debug Agent | Linux only (Instinct, Radeon PRO, Radeon) |
| Control and monitoring tools | AMD SMI | Linux only (Instinct, Radeon PRO, Radeon) |
| Control and monitoring tools | hipinfo | Windows |
| Control and monitoring tools | rocminfo | Linux only |
Known issues#
The following are known issues identified in ROCm 7.12.0.
JAX GPU initialization might fail without AMD_COMGR_NAMESPACE set#
When running JAX with ROCm, symbol collisions can occur between the ROCm compiler infrastructure and other libraries. These collisions may prevent proper GPU initialization for JAX and can lead to crashes or cause JAX to silently fall back to CPU execution.
If AMD_COMGR_NAMESPACE=1 is not set:
JAX might fail to initialize the GPU
JAX workloads might unexpectedly run on the CPU instead of the GPU
Processes might crash during initialization
Set the environment variable AMD_COMGR_NAMESPACE=1 to isolate the ROCm
compiler infrastructure’s symbol namespace and avoid these collisions.
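For Python-launched workloads, the workaround can also be applied in-process, as long as it happens before JAX is imported. A minimal sketch:

```python
# Minimal sketch of the workaround: set AMD_COMGR_NAMESPACE=1 in the
# process environment before JAX (and the ROCm compiler support library
# it loads) is imported.
import os

os.environ["AMD_COMGR_NAMESPACE"] = "1"

# import jax  # import JAX only after the variable is set
print(os.environ["AMD_COMGR_NAMESPACE"])
```

Setting the variable in the shell (`export AMD_COMGR_NAMESPACE=1`) before launching Python is equivalent and avoids any import-ordering concerns.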
vLLM server fails to launch with ROCm 7.12 Docker image due to path failure#
A path resolution failure in the vLLM Docker environment prevents the loader from locating required ROCm SDK shared libraries. As a result, library lookups are redirected to an invalid or unexpected location, causing the vLLM server startup to fail.
As a workaround, before starting the vLLM server inside the ROCm 7.12 vLLM
Docker container, set LD_LIBRARY_PATH to include the ROCm SDK core library
path; for example:
export LD_LIBRARY_PATH=/opt/python/lib/python3.12/site-packages/_rocm_sdk_core/lib:$LD_LIBRARY_PATH
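If you launch the server from a Python wrapper rather than a shell, the same workaround can be expressed by prepending the path to the environment handed to the child process. The launch command below is hypothetical; the library path is the one from the workaround above (adjust it for your container's Python version):

```python
# Python rendering of the LD_LIBRARY_PATH workaround: prepend the ROCm SDK
# core library path to the environment passed to the vLLM server process.
import os
import subprocess  # used for the (commented-out) hypothetical launch

rocm_lib = "/opt/python/lib/python3.12/site-packages/_rocm_sdk_core/lib"
env = dict(os.environ)
env["LD_LIBRARY_PATH"] = rocm_lib + os.pathsep + env.get("LD_LIBRARY_PATH", "")

# subprocess.run(["vllm", "serve", "<model>"], env=env)  # hypothetical launch
print(env["LD_LIBRARY_PATH"].split(os.pathsep)[0])
```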
vLLM server fails to launch for models with tensor parallelism set to 8#
Launching the vLLM server might fail for models configured with tensor
parallelism (--tensor-parallel-size 8 or tp=8), resulting in
a custom_all_reduce_hip.cuh: invalid device pointer error. This issue will be
fixed in a future release.
PyTorch DDP Gloo backend test might fail on AMD GPUs#
On AMD GPUs, the PyTorch Distributed Data Parallel (DDP) test
test_ddp_apply_optim_in_backward_grad_as_bucket_view_false fails when using
the Gloo backend. This issue affects correctness of distributed training flows
that rely on this code path in PyTorch 2.8 when configured with Gloo.
As a workaround, use the NCCL backend instead of Gloo for multi-GPU distributed training using PyTorch 2.8. For example:
torch.distributed.init_process_group(backend="nccl", ...)
This issue will be fixed in a future release.
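A small hedged helper can make the backend choice explicit in training scripts; the function below is our own illustration (not a PyTorch API), keeping Gloo only for CPU-only runs:

```python
# Hedged sketch of the workaround: prefer the NCCL backend (backed by RCCL
# on ROCm) for multi-GPU jobs; fall back to Gloo only for CPU-only runs.
def pick_backend(gpu_available: bool) -> str:
    """Return the torch.distributed backend name to use."""
    return "nccl" if gpu_available else "gloo"


# torch.distributed.init_process_group(backend=pick_backend(True), ...)
print(pick_backend(True))  # nccl
```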
HIP kernel launch limit might be hit for some models#
With PyTorch 2.10, some models can hit the HIP kernel launch limit of 2³² kernel launches within a single process. When this limit is reached, further kernel launches fail. One known affected model is:
This issue manifests as a HIP kernel launch error during model execution. This issue will be fixed in a future release.
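A back-of-the-envelope calculation shows how a long training run can reach the 2³² limit; the launch rate below is an illustrative assumption, not a measured value:

```python
# Back-of-the-envelope check of the 2**32 kernel-launch limit: at an
# assumed (illustrative) 100,000 launches per second, the per-process
# counter is exhausted in roughly half a day.
LAUNCH_LIMIT = 2**32           # 4,294,967,296 launches per process
launches_per_second = 100_000  # assumed rate; workload-dependent

seconds_to_limit = LAUNCH_LIMIT / launches_per_second
hours_to_limit = seconds_to_limit / 3600
print(f"{LAUNCH_LIMIT} launches at {launches_per_second}/s "
      f"~ {hours_to_limit:.1f} hours")
```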
Performance regression in specific MAD PyTorch models between 2.9 and 2.10#
On ROCm PyTorch 2.10, some MAD-based ImageNet training and wrapper models show a performance regression compared to ROCm PyTorch 2.9. The currently known affected models include:
pyt_torchimagenet_inceptionv3_training
pyt_torchimagenet_resnet50_training
pt2_resnet152_pywrapper
These workloads might run slower on 2.10 than on 2.9 under similar conditions. This issue will be fixed in a future release.
ROCm 7.12 validation with PyTorch unit tests is limited#
For ROCm 7.12, the validation coverage using the PyTorch unit test suite is limited. Only a subset of the full PyTorch unit tests has been executed and validated on this release.
Some torchaudio transforms cannot be exported with torch.jit.script#
The following torchaudio transforms fail to export with torch.jit.script due
to missing TorchScript annotations and type compatibility issues:
FrequencyMasking
TimeMasking
DifferentiableFIR
RNNTLoss
Users cannot export torchaudio models containing these transforms to TorchScript, blocking deployment of optimized audio processing pipelines. This issue will be fixed in a future release.
As a workaround, build torchaudio from the rocm/audio branch, which includes
the fix, or use eager mode execution instead of TorchScript.
PyTorch TestAutograd.test_multi_grad_all_hooks fails on Windows#
On Windows, the PyTorch sub-test TestAutograd.test_multi_grad_all_hooks fails
during runtime compilation of a temporary C++ extension due to MSVC linker
errors. This issue will be fixed in a future release.
TransferBench plugin fails to build for gfx1103#
Building rocm_bandwidth_test with --offload-arch=gfx1103 fails when compiling
the TransferBench plugin. As a result, TransferBench-based builds might not
complete successfully on Ryzen 200 Series (gfx1103) APUs. This issue will be
fixed in a future release.
HIP unit tests trigger a TDR event on Windows with gfx1103#
On Windows with Ryzen 200 Series (gfx1103) APUs, the HIP unit test
Unit_hipStreamValue_Wait_Blocking - uint32_t triggers a Timeout Detection and
Recovery (TDR) event. This causes the GPU driver to reset during test execution.
This issue will be fixed in a future release.
amd-smi reset -r help text is not updated#
The amd-smi reset --reload-driver (-r) command has been deprecated, but the
help text is not updated to reflect the current CLI options.
You can use modprobe instead of amd-smi reset -r to unload and reload the
AMD GPU driver:
modprobe -r amdgpu && modprobe amdgpu
rocWMMA header produces unknown type errors in HIP RTC#
Including rocwmma/rocwmma.hpp in HIP RTC (runtime compilation) contexts
produces compiler errors such as unknown type name '__bf16' and
unknown type name '__fp8_e4m3_fnuz'. This prevents using rocWMMA in HIP RTC
workflows.
As a workaround, add typedef definitions for the missing types before including the rocWMMA header. For example:
#if defined(__HIPCC_RTC__)
typedef _BitInt(16) __bf16;
typedef _BitInt(8) __fp8_e4m3_fnuz;
typedef _BitInt(8) __fp8_e5m2_fnuz;
#endif
#include <rocwmma/rocwmma.hpp>
hipCUB DeviceMerge large-size stress test fails with OOM on gfx1150#
On gfx1150 APUs, the hipCUB DeviceMerge large-size stress
test (MergeLargeSizeIterators) might fail with an out-of-memory (OOM) error
when running ROCm 7.12.0. All standard DeviceMerge test cases pass; only the
large-size stress configuration is affected. This issue will be fixed in a
future release.
ROCm Debug Agent tests fail with “wave not found in queue” on gfx1150#
On Gorgan Point (gfx1150) APUs, ROCm Debug Agent tests may fail with a fatal
wave not found in queue error. This occurs during debug API queue and
wavefront tracking, causing rocm-dbgapi to terminate while processing shader
debug events. This issue will be fixed in a future release.
Multi-ROCm installation fails on RPM-based distros#
On RPM-based Linux distributions (including RHEL and SLES), installing
ROCm 7.12 alongside an existing ROCm 7.11 installation using the
amdrocm7.<...>-gfx<...> meta-packages can fail due to RPM file conflicts. This
prevents side‑by‑side installation of ROCm 7.11 and 7.12 using the standard
repositories and package names. This issue will be fixed in a future release.
Resolved issues#
The following notable issues have been fixed in ROCm 7.12.0.
ROCm debugging tools binaries now fully available#
Previously, ROCm debugging tools – ROCdbgapi, ROCgdb, and ROCr Debug Agent –
were not available after installing using your Linux distribution’s package
manager or using pip.
This issue has been resolved.
Multi-node RCCL tests could crash or hang on MI355X with AINIC NICs#
Previously, multi-node RCCL tests (such as alltoall_perf, allgather_perf,
allreduce_perf, and reduce_scatter_perf) could crash intermittently or hang
on Instinct MI355X GPUs when using AINIC NICs and the AINIC RoCE path
(RCCL_AINIC_ROCE=1).
This issue has been resolved.
hipify-clang emitted spurious errors with CUDA 12.x#
Previously, when using hipify-clang with CUDA 12.x, the following messages
could appear during hipification:
error: must pass in an explicit nvptx64 gpu architecture to 'ptxas'
error: must pass in an explicit nvptx64 gpu architecture to 'nvlink'
These were emitted by the CUDA detection phase, which unnecessarily invoked the
CUDA device toolchain. The .hip files were still generated correctly and
could be used normally.
This issue has been resolved.
Apex encountered crashes and segfaults with the TheRock build system#
Previously, Apex would encounter crashes, missing module errors, and segfaults related to the HIP runtime during testing with the TheRock build system.
This issue has been resolved.
MIOpen unit tests failed to find rocrand headers during runtime kernel compilation#
Previously, MIOpen unit tests could fail to find the rocrand_xorwow.h header
during runtime compilation of certain kernels (for example,
MIOpenSoftmaxAttn.cpp), resulting in HIPRTC_ERROR_COMPILATION. This was
caused by missing runtime include-path configuration in TheRock artifacts: ROCm
could be installed in arbitrary locations, and the rocrand headers were not
reliably discoverable by HIPRTC/COMGR at runtime.
This issue has been resolved.
Upcoming changes#
Future preview releases will expand support for:
Additional ROCm Core SDK components
Domain-specific expansion toolkits (data science, life science, finance, simulation, and other HPC domains)
Extended AMD hardware support