ROCm Core SDK 7.12.0 release notes#
2026-03-26
ROCm Core SDK 7.12.0 continues the technology preview release stream that began with ROCm 7.9.0, advancing the transition to the new TheRock build and release system. To learn more about TheRock, see ROCm Core SDK and TheRock Build System.
This release expands support for more AMD GPUs and APUs. Developers can expect a more consistent build experience and streamlined workflows that pave the way toward the future modular ROCm releases planned for mid-2026.
Important
ROCm 7.12.0 follows the versioning discontinuity that began with 7.9.0 and remains separate from the 7.0 to 7.2 production releases. For the latest production stream release, see the ROCm documentation.
Maintaining parallel release streams – preview and production – gives users ample time to evaluate and adopt the new build system and dependency changes. The technology preview stream is planned to continue through mid‑2026, after which it will replace the current production stream.
Release highlights#
ROCm Core SDK 7.12.0 with TheRock builds upon the 7.11.0 release with several key enhancements:
Expanded AMD GPU support#
The ROCm 7.12.0 preview adds support for the following AMD GPUs and APUs:
AMD Instinct MI100
AMD Radeon RX 7700 XE
AMD Radeon RX 7600
AMD Ryzen AI 9 HX PRO 475
AMD Ryzen AI 9 HX PRO 470
AMD Ryzen AI 9 PRO 465
AMD Ryzen AI 7 PRO 450
AMD Ryzen AI 5 PRO 440
AMD Ryzen AI 5 PRO 435
AMD Ryzen 9 270
AMD Ryzen 7 260
AMD Ryzen 7 250
AMD Ryzen 5 240
AMD Ryzen 5 230
AMD Ryzen 5 220
AMD Ryzen 3 210
For the full list of supported hardware, see Hardware support.
Expanded Linux distribution support#
ROCm 7.12.0 adds support for Debian 12 with AMD Instinct GPUs.
For the full list of supported Linux distributions, see Operating system support.
Expanded GPU virtualization support for Instinct GPUs#
ROCm 7.12.0 adds support for the following KVM SR-IOV virtualization configurations on AMD Instinct MI355X and MI350X GPUs.
On MI355X: Ubuntu 24.04 host OS with RHEL 10.0 or RHEL 9.6 guest OS.
On MI350X: Ubuntu 24.04 host OS with RHEL 9.6 guest OS.
For details, see GPU virtualization support.
Added GPU partitioning support#
ROCm 7.12.0 adds support for the following compute partition and NUMA-per-socket (NPS) configurations on AMD Instinct GPUs in bare metal deployments.
| Device | Compute partition mode | NPS mode | Deployment |
|---|---|---|---|
| Instinct MI355X, MI350X | CPX | NPS 2 | Bare metal |
| Instinct MI355X, MI350X | DPX | NPS 2 | Bare metal |
| Instinct MI300X | CPX | NPS 4 | Bare metal |
| Instinct MI300X | DPX | NPS 2 | Bare metal |
Added Runfile installation method#
The ROCm Runfile Installer can install ROCm and/or the AMD GPU Driver (amdgpu) without using a native Linux package management system, making it ideal for systems with policy constraints or restricted environments. Network access is not needed for installation as long as dependencies for ROCm and/or AMD GPU driver (amdgpu) are met. A single installer supports all GFX architectures, automates post-installation configuration, and offers an interactive command-line GUI for guided setup.
For details, see the ROCm installation instructions.
Added rocSHMEM library to TheRock#
The rocSHMEM (ROCm OpenSHMEM) runtime provides GPU-centric networking through an OpenSHMEM-like interface. It simplifies application code complexity and enables finer communication and computation overlap than traditional host-driven networking.
rocSHMEM is supported on Linux on AMD Instinct, Radeon PRO, and Radeon GPUs. See the project in ROCm/rocm-systems for more information.
Expanded AI ecosystem support#
PyTorch 2.10.0 is now supported on Linux and Windows. PyTorch 2.7 support is no longer validated. See Install PyTorch.
JAX 0.8.2 and 0.8.0 are now built and distributed through TheRock on Linux. See Install JAX.
vLLM 0.16.0 wheels and Docker images are now available through AMD package repositories for select GFX architectures (gfx950, gfx942, gfx1200, gfx1201, and gfx1151) on Linux. See vLLM inference.
See AI ecosystem support for details.
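A quick way to confirm which PyTorch build is active before running ROCm workloads is to inspect `torch.version.hip`, which is populated on ROCm wheels. The helper below is our own sketch (not an AMD or PyTorch utility) and degrades gracefully when PyTorch is not installed:

```python
# Hedged sketch: report whether the installed PyTorch is a ROCm build.
# Assumes only that ROCm wheels populate `torch.version.hip`; the helper
# name is ours, not a PyTorch API.
import importlib.util


def rocm_pytorch_status() -> str:
    """Return a short description of the installed PyTorch backend."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    hip = getattr(torch.version, "hip", None)
    if hip:
        return f"ROCm build (HIP {hip})"
    return "non-ROCm build"


print(rocm_pytorch_status())
```

Running this inside a ROCm PyTorch environment should report the HIP version; a CUDA or CPU wheel reports a non-ROCm build.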
ROCm profilers now support virtualized environments#
ROCprofiler-SDK, ROCm Systems Profiler (rocprofiler-systems), and ROCm Compute Profiler (rocprofiler-compute) now support performance profiling and analysis in KVM (kernel-based virtual machine) environments, enabling developers to profile GPU workloads running on virtualized infrastructure.
ROCm Compute Profiler: introduced iteration multiplexing#
ROCm Compute Profiler (rocprofiler‑compute) now supports iteration multiplexing for large workloads. This enhancement enables the collection of the full set of hardware performance counters in a single profiling run, significantly reducing overall profiling time. Iteration multiplexing eliminates the need for application replay to gather extensive counter sets, which is often impractical for large or long‑running workloads. For smaller workloads with a limited number of kernel dispatches, existing pass‑reduction techniques remain recommended. For more details, see Iteration multiplexing.
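The idea behind iteration multiplexing can be shown with a small conceptual sketch (this is an illustration of the scheduling principle, not the rocprofiler-compute implementation): successive kernel iterations within one run each collect a different group of counters, instead of replaying the whole application once per group.

```python
# Conceptual sketch of iteration multiplexing: split a counter set into
# per-iteration groups bounded by a per-pass hardware limit, so one run
# covers what would otherwise need several application replays.
def multiplex(counters, per_pass_limit):
    """Assign counters to iteration groups, one group per iteration."""
    return [counters[i:i + per_pass_limit]
            for i in range(0, len(counters), per_pass_limit)]


counters = [f"COUNTER_{i}" for i in range(10)]
groups = multiplex(counters, per_pass_limit=4)
# 10 counters at 4 per pass -> 3 iterations cover the full set in one run
print(len(groups))
```

With replay-based collection the same 10 counters would require three full application runs; multiplexing amortizes them over iterations of a single run.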
ROCm Compute Profiler: isolate profiling output by MPI rank#
ROCm Compute Profiler (rocprofiler-compute) now supports isolating profiling output by MPI rank when profiling distributed workloads. If ranks are detected and no rank placeholder is specified in the output path, each rank automatically writes its results to a rank-named subdirectory, preventing output collisions and simplifying per-rank analysis. For more information, see Multi-rank profiling in the ROCm Compute Profiler documentation.
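The collision-avoidance behavior described above can be sketched as follows; the `rank_<N>` directory name is illustrative, not necessarily the tool's exact naming scheme:

```python
# Illustrative sketch of per-rank output isolation: when the user has not
# placed a rank placeholder in the output path, each MPI rank writes to
# its own rank-named subdirectory so ranks never overwrite each other.
def rank_output_dir(base: str, rank: int, has_placeholder: bool = False) -> str:
    """Return the profiling output directory for one MPI rank."""
    if has_placeholder:
        return base  # the user already isolated ranks in the path
    return f"{base}/rank_{rank}"  # hypothetical subdirectory name


print(rank_output_dir("results", 0))  # results/rank_0
print(rank_output_dir("results", 1))  # results/rank_1
```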
ROCm Compute Profiler: experimental Torch operator counter collection and tracing#
ROCm Compute Profiler (rocprofiler-compute) introduces experimental support for Torch operator-based counter collection and tracing. This feature enables profiling at the PyTorch operator level, allowing developers to correlate hardware performance counters with individual Torch operations and better understand the GPU performance characteristics of deep learning workloads. For more information, see Torch operator mapping in the ROCm Compute Profiler documentation.
Memory latency and derived counters now visible in ROCprof Compute Viewer#
You can now view memory latency and derived counters in ROCprof Compute Viewer, providing clearer insights into memory performance characteristics. This enhancement improves analysis and interpretation of memory-related bottlenecks.
ROCm Systems Profiler: added network performance metrics for Pensando AI NICs#
ROCm Systems Profiler (rocprofiler-systems) now surfaces key network metrics for Pensando AI NICs, including Congestion Notification Packets (CNPs sent and received) and bandwidth utilization as a percentage of peak throughput. For more information, see Network performance profiling in the ROCm Systems Profiler documentation.
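The bandwidth-utilization metric mentioned above is simply observed throughput as a percentage of the link's peak rate. A worked example (the throughput numbers are made up for illustration):

```python
# Worked example of the bandwidth-utilization metric: observed throughput
# expressed as a percentage of the NIC's peak rate. Values are illustrative.
def utilization_pct(observed_gbps: float, peak_gbps: float) -> float:
    return 100.0 * observed_gbps / peak_gbps


print(utilization_pct(320.0, 400.0))  # 80.0
```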
ROCm Systems Profiler: added Triton workload profiling support#
ROCm Systems Profiler (rocprofiler-systems) now supports profiling Triton-based workloads, enabling detailed runtime tracing in distributed environments. This enhancement allows developers to correlate Triton framework activity with HIP runtime behavior, including CPU and GPU execution, memory usage, and communication patterns across multi-node jobs.
ROCm Systems Profiler: added preset profiles#
ROCm Systems Profiler (rocprofiler-systems) now includes preset profiles that automatically configure profiling settings for common workload scenarios using a single command-line flag. These presets provide optimized, pre-tuned configurations that reduce setup complexity, minimize profiling overhead, and ensure consistent behavior across general-purpose, workload-specific, and API tracing use cases. For more information, see Using preset profiles in the ROCm Systems Profiler documentation.
ROCm Systems Profiler: added OpenSHMEM and UCX tracing#
ROCm Systems Profiler (rocprofiler-systems) now supports comprehensive OpenSHMEM and UCX tracing, providing deeper visibility into inter-node communication patterns and helping developers identify and diagnose communication inefficiencies in large-scale AI and HPC workloads. For more information, see how to profile OpenSHMEM and UCX using ROCm Systems Profiler.
ROCm Systems Profiler now supports attaching to running processes#
ROCm Systems Profiler (rocprofiler-systems) now supports attaching to and
profiling an already running process using the new rocprof-sys-attach
utility. This capability enables profiling of long-running applications,
services, or externally launched jobs without requiring a restart, making it
easier to capture performance data for specific runtime phases. Attached
profiling provides detailed insights while the application continues to run,
supporting dynamic and flexible performance analysis workflows. For more
information, see Attaching to a running
process.
ROCprofiler-SDK and rocprofv3: expanded Ryzen AI profiling support#
ROCprofiler-SDK and rocprofv3 now enable profiling on Ryzen AI Max 395,
390, and 385, Ryzen AI 7 350, 340, and 330, and Ryzen AI 7 400, extending
performance analysis capabilities to the latest Ryzen AI platforms on Linux.
ROCprofiler-SDK: rocprofiler_force_configure() enables late-start profiling#
The rocprofiler_force_configure() API now automatically detects
already-initialized HSA and HIP runtimes, enabling late-start profiling without
requiring application restarts. This enhancement supports profiling for
applications that dynamically load tools at runtime, use plugin architectures
where the ROCprofiler-SDK is loaded after GPU initialization, or need to attach
to already running GPU workloads.
ROCprofiler-SDK now exposes process attach and detach as a public API#
The rocattach.so library enables attaching to and detaching from running
processes using ptrace-based control and lifecycle synchronization. The
tool_attach and tool_detach entry points allow any rocprofiler-SDK tool
library to integrate into the attach and detach workflow. This functionality is
now exposed as a public API, allowing ROCprofiler-SDK users to incorporate
custom tool libraries into the attach and detach workflow without
re-implementing this logic.
Compatibility notices#
In terms of package compatibility, ROCm 7.12.0 diverges from the existing ROCm 7.0 production stream and future stable releases in that stream:
Compute-focused: ROCm 7.12.0 primarily supports compute workloads. Future releases will support mixed workloads (compute and graphics).
Graphics applications that rely on the ROCm stack are not fully supported with this release. For users running graphics applications alongside ROCm 7.12.0, use the inbox Mesa user mode driver. Do not manually install the Mesa user mode driver.
No upgrade path from existing production releases, including ROCm 7.2.1 and earlier, or from upcoming stable releases in that stream. See the explanatory note.
Not intended for production workloads: users running production environments should continue using the ROCm 7.0 stream. See the explanatory note.
Not fully featured: this release is a stepping stone toward fully open software development.
Limited hardware support: preview releases are only supported on some AMD Instinct GPUs, Radeon GPUs, and Ryzen APUs. See Supported hardware and operating systems.
Software components: additional components are planned to be introduced in future preview releases as part of the ROCm Core SDK. Other libraries and tools not included in the future Core SDK will either be:
Released as standalone project-specific packages, or
Grouped into domain-specific toolkits.
Hardware support#
The following table lists supported AMD Instinct GPUs, Radeon GPUs, and Ryzen APUs. Each supported device is listed with its corresponding GPU microarchitecture and LLVM target.
| Device series | Device | LLVM target | Architecture |
|---|---|---|---|
| AMD Instinct MI350 Series | | gfx950 | CDNA 4 |
| AMD Instinct MI300 Series | | gfx942 | CDNA 3 |
| AMD Instinct MI200 Series | | gfx90a | CDNA 2 |
| AMD Instinct MI100 Series | Instinct MI100 | gfx908 | CDNA |
| AMD device series | Device | LLVM target | Architecture |
|---|---|---|---|
| Radeon AI PRO R9000 Series | | gfx1201 | RDNA 4 |
| Radeon PRO W7000 Series | | gfx1100 | RDNA 3 |
| Radeon PRO W7000 Series | | gfx1101 | RDNA 3 |
| Radeon PRO V Series | | | |
| AMD device series | Device | LLVM target | Architecture |
|---|---|---|---|
| Radeon RX 9000 Series | Radeon RX 9070 GRE | gfx1201 | RDNA 4 |
| Radeon RX 9000 Series | | gfx1200 | RDNA 4 |
| Radeon RX 7000 Series | | gfx1100 | RDNA 3 |
| Radeon RX 7000 Series | Radeon RX 7700 XE | gfx1101 | RDNA 3 |
| Radeon RX 7000 Series | | gfx1102 | RDNA 3 |
| AMD device series | Device | LLVM target | Architecture |
|---|---|---|---|
| Ryzen AI Max PRO 300 Series | | gfx1151 | RDNA 3.5 |
| Ryzen AI Max 300 Series | | gfx1151 | RDNA 3.5 |
| Ryzen AI PRO 400 Series | | gfx1150 | RDNA 3.5 |
| Ryzen AI 300 Series | | gfx1150 | RDNA 3.5 |
| Ryzen 200 Series | | gfx1103 | RDNA 3 |
Note
This preview release supports a limited number of GPUs and APUs. Hardware support will be expanded in future releases, following a six-week release cadence.
Operating system support#
ROCm supports the following Linux distributions and Microsoft Windows versions. If you’re running ROCm on Linux, ensure your system uses a supported kernel version. Future preview releases will expand operating system support coverage.
Important
The following table is a general overview of supported OSes. Actual support might vary by GPU. Use the Compatibility matrix to verify support for your specific setup before installation.
| Linux distribution | Supported versions | Linux kernel version |
|---|---|---|
| Ubuntu | 24.04.3 | GA 6.8 |
| Ubuntu | 22.04.5 | GA 5.15 |
| Debian | 13 | 6.12 |
| Debian | 12 | 6.1.0 |
| Red Hat Enterprise Linux (RHEL) | 10.1 | 6.12.0-124 |
| Red Hat Enterprise Linux (RHEL) | 10.0 | 6.12.0-55 |
| Red Hat Enterprise Linux (RHEL) | 9.7 | 5.14.0-611 |
| Red Hat Enterprise Linux (RHEL) | 9.6 | 5.14.0-570 |
| Red Hat Enterprise Linux (RHEL) | 9.4 | 5.14.0-427 |
| Red Hat Enterprise Linux (RHEL) | 8.10 | 4.18.0-553 |
| Oracle Linux | 10 | UEK 8.1 |
| Oracle Linux | 9 | UEK 8 |
| Oracle Linux | 8 | UEK 7 |
| Rocky Linux | 9 | 5.14.0-570 |
| SUSE Linux Enterprise Server (SLES) | 16.0 | 6.12 |
| SUSE Linux Enterprise Server (SLES) | 15.7 | 6.4.0-150700.51 |
| Operating system | Supported versions | Linux kernel version |
|---|---|---|
| Ubuntu | 24.04.3 | GA 6.8 |
| Ubuntu | 22.04.5 | GA 5.15 |
| Red Hat Enterprise Linux (RHEL) | 10.1 | 6.12.0-124 |
| Red Hat Enterprise Linux (RHEL) | 9.7 | 5.14.0-611 |
| Windows | 11 25H2 | — |
| Operating system | Supported versions | Linux kernel version |
|---|---|---|
| Ubuntu | 24.04.3 | GA 6.8 |
| Ubuntu | 22.04.5 | GA 5.15 |
| Red Hat Enterprise Linux (RHEL) | 10.1 | 6.12.0-124 |
| Red Hat Enterprise Linux (RHEL) | 9.7 | 5.14.0-611 |
| Windows | 11 25H2 | — |
| Operating system | Supported versions | Linux kernel version |
|---|---|---|
| Ubuntu | 24.04.3 | HWE 6.14 |
| Windows | 11 25H2 | — |
Kernel driver and firmware bundle support#
ROCm requires a coordinated stack of compatible firmware, driver, and user space components. Maintaining version alignment between these layers ensures correct GPU operation and performance, especially for AMD data center products. While AMD publishes the AMD GPU driver and ROCm user space components, your server OEM or infrastructure provider distributes the firmware packages. AMD supplies those firmware images (PLDM bundles), which the OEM integrates and distributes.
| AMD device | Firmware | Linux driver |
|---|---|---|
| Instinct MI355X | PLDM bundle 01.25.17.07, 01.25.16.03 | AMD GPU Driver (amdgpu) |
| Instinct MI350X | PLDM bundle 01.25.17.07, 01.25.16.03 | AMD GPU Driver (amdgpu) |
| Instinct MI325X | PLDM bundle 01.25.04.02 | AMD GPU Driver (amdgpu) |
| Instinct MI300X | PLDM bundle 01.25.03.12 | AMD GPU Driver (amdgpu) |
| Instinct MI300A | BKC 26.1 | AMD GPU Driver (amdgpu) |
| Instinct MI250X | IFWI 75 (or later) | AMD GPU Driver (amdgpu) |
| Instinct MI250 | Maintenance update 5 with IFWI 75 (or later) | AMD GPU Driver (amdgpu) |
| Instinct MI210 | Maintenance update 5 with IFWI 75 (or later) | AMD GPU Driver (amdgpu) |
| Instinct MI100 | VBIOS D3430401-037 | AMD GPU Driver (amdgpu) |
| AMD device | Linux driver | Windows driver |
|---|---|---|
| Radeon AI PRO R9700 | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon AI PRO R9600D | AMD GPU Driver (amdgpu) | — |
| Radeon PRO W7900 Dual Slot | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO W7900 | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO W7800 48GB | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO W7800 | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO W7700 | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon PRO V710 | AMD GPU Driver (amdgpu) | — |
| AMD device | Linux driver | Windows driver |
|---|---|---|
| Radeon RX 9070 XT | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9070 GRE | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9070 | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9060 XT LP | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9060 XT | AMD GPU Driver (amdgpu) | — |
| Radeon RX 9060 | AMD GPU Driver (amdgpu) | — |
| Radeon RX 7900 XTX | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7900 XT | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7900 GRE | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7800 XT | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7700 XT | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7700 XE | AMD GPU Driver (amdgpu) | AMD Software: Adrenalin Edition |
| Radeon RX 7700 | AMD GPU Driver (amdgpu) | — |
| Radeon RX 7600 | AMD GPU Driver (amdgpu) | — |
| AMD device | Linux driver | Windows driver |
|---|---|---|
| Ryzen AI Max+ PRO 395 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max PRO 390 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max PRO 385 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max PRO 380 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max+ 395 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max 390 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI Max 385 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 HX 375 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 HX 370 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 365 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 HX PRO 475 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 HX PRO 470 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 9 PRO 465 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 7 PRO 450 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 5 PRO 440 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen AI 5 PRO 435 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 9 270 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 7 260 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 7 250 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 5 240 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 5 230 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 5 220 | Inbox kernel driver | AMD Software: Adrenalin Edition |
| Ryzen 3 210 | Inbox kernel driver | AMD Software: Adrenalin Edition |
GPU virtualization support#
AMD Instinct data center GPUs support virtualization in the following configurations. Supported SR-IOV configurations require the AMD GPU Virtualization Driver (GIM) 8.7.1K – see the AMD Instinct Virtualization Driver documentation for more information.
| AMD GPU | Hypervisor | Virtualization technology | Virtualization driver | Host OS | Guest OS |
|---|---|---|---|---|---|
| Instinct MI355X | KVM | Passthrough | — | Ubuntu 24.04 | Ubuntu 24.04 |
| Instinct MI355X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | Ubuntu 24.04 |
| Instinct MI355X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | RHEL 10.0 |
| Instinct MI355X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | RHEL 9.6 |
| Instinct MI350X | KVM | Passthrough | — | Ubuntu 24.04 | Ubuntu 24.04 |
| Instinct MI350X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | Ubuntu 24.04 |
| Instinct MI350X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 24.04 | RHEL 9.6 |
| Instinct MI325X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 22.04 | Ubuntu 22.04 |
| Instinct MI300X | KVM | Passthrough | — | Ubuntu 22.04 | Ubuntu 22.04 |
| Instinct MI300X | KVM | SR-IOV | GIM 8.7.1K | Ubuntu 22.04 | Ubuntu 22.04 |
AI ecosystem support#
ROCm 7.12.0 provides optimized support for popular deep learning frameworks and AI inference engines. The following table lists supported frameworks and libraries, their compatible operating systems, and validated versions.
| Framework | Supported versions | Supported OS | Supported Python versions |
|---|---|---|---|
| PyTorch | 2.10.0, 2.9.1, 2.8.0 | Linux | 3.13, 3.12, 3.11 |
| PyTorch | 2.10.0, 2.9.1 | Windows | 3.13, 3.12, 3.11 |
| JAX | 0.8.2, 0.8.0 | Linux | 3.14, 3.13, 3.12, 3.11 |
| vLLM | 0.16.0 | Linux | 3.12 |
ROCm Core SDK components#
The following table lists tools and libraries included in the ROCm 7.12.0 release. Expect future releases to expand the list of components.
Important
The following table is a general overview of ROCm Core SDK components. Actual support for these libraries and tools might vary by GPU and OS. Use the Compatibility matrix to verify support for your specific setup.
| Component group | Component name | Support |
|---|---|---|
| Math and compute libraries | Composable Kernel | Linux, Windows |
| Math and compute libraries | hipBLAS | Linux, Windows |
| Math and compute libraries | hipBLASLt | Linux, Windows |
| Math and compute libraries | hipCUB | Linux, Windows |
| Math and compute libraries | hipFFT | Linux, Windows |
| Math and compute libraries | hipRAND | Linux, Windows |
| Math and compute libraries | hipSOLVER | Linux, Windows |
| Math and compute libraries | hipSPARSE | Linux, Windows |
| Math and compute libraries | MIOpen | Linux, Windows |
| Math and compute libraries | rocBLAS | Linux, Windows |
| Math and compute libraries | rocFFT | Linux, Windows |
| Math and compute libraries | rocRAND | Linux, Windows |
| Math and compute libraries | rocSOLVER | Linux, Windows |
| Math and compute libraries | rocSPARSE | Linux, Windows |
| Math and compute libraries | rocPRIM | Linux, Windows |
| Math and compute libraries | rocThrust | Linux, Windows |
| Math and compute libraries | rocWMMA | Linux, Windows |
| Math and compute libraries | hipSPARSELt | Linux only (Instinct MI350, MI300 Series, Ryzen APUs) |
| Communication libraries | RCCL | Linux only |
| Communication libraries | rocSHMEM | Linux only (Instinct, Radeon PRO, Radeon) |
| Support libraries | ROCm CMake | Linux, Windows |
| Runtimes and compilers | HIP | Linux, Windows |
| Runtimes and compilers | HIPIFY | Linux, Windows |
| Runtimes and compilers | LLVM | Linux, Windows |
| Runtimes and compilers | SPIRV-LLVM-Translator | Linux, Windows |
| Runtimes and compilers | ROCr Runtime | Linux only |
| Profiling and debugging tools | ROCm Compute Profiler (rocprofiler-compute) | Linux only (Instinct) |
| Profiling and debugging tools | ROCm Systems Profiler (rocprofiler-systems) | Linux only (Instinct) |
| Profiling and debugging tools | ROCprofiler-SDK | Linux |
| Profiling and debugging tools | ROCdbgapi | Linux only (Instinct, Radeon PRO, Radeon) |
| Profiling and debugging tools | ROCm Debugger (ROCgdb) | Linux only (Instinct, Radeon PRO, Radeon) |
| Profiling and debugging tools | ROCr Debug Agent | Linux only (Instinct, Radeon PRO, Radeon) |
| Control and monitoring tools | AMD SMI | Linux only (Instinct, Radeon PRO, Radeon) |
| Control and monitoring tools | hipinfo | Windows |
| Control and monitoring tools | rocminfo | Linux only |
Known issues#
The following are known issues identified in ROCm 7.12.0.
JAX GPU initialization might fail without AMD_COMGR_NAMESPACE set#
When running JAX with ROCm, symbol collisions can occur between the ROCm compiler infrastructure and other libraries. These collisions may prevent proper GPU initialization for JAX and can lead to crashes or cause JAX to silently fall back to CPU execution.
If AMD_COMGR_NAMESPACE=1 is not set:
JAX might fail to initialize the GPU
JAX workloads might unexpectedly run on the CPU instead of the GPU
Processes might crash during initialization
Set the environment variable AMD_COMGR_NAMESPACE=1 to isolate the ROCm
compiler infrastructure’s symbol namespace and avoid these collisions.
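For Python-launched workloads, the workaround can also be applied in-process, as long as it happens before JAX is imported. A minimal sketch:

```python
# Minimal sketch of the workaround: set AMD_COMGR_NAMESPACE=1 in the
# process environment before JAX (and the ROCm compiler support library
# it loads) is imported.
import os

os.environ["AMD_COMGR_NAMESPACE"] = "1"

# import jax  # import JAX only after the variable is set
print(os.environ["AMD_COMGR_NAMESPACE"])
```

Setting the variable in the shell (`export AMD_COMGR_NAMESPACE=1`) before launching Python is equivalent and avoids any import-ordering concerns.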
vLLM server fails to launch with ROCm 7.12 Docker image due to path failure#
A path resolution failure in the vLLM Docker environment prevents the loader from locating required ROCm SDK shared libraries. As a result, library lookups are redirected to an invalid or unexpected location, causing the vLLM server startup to fail.
As a workaround, before starting the vLLM server inside the ROCm 7.12 vLLM
Docker container, set LD_LIBRARY_PATH to include the ROCm SDK core library
path; for example:
export LD_LIBRARY_PATH=/opt/python/lib/python3.12/site-packages/_rocm_sdk_core/lib:$LD_LIBRARY_PATH
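If you launch the server from a Python wrapper rather than a shell, the same workaround can be expressed by prepending the path to the environment handed to the child process. The launch command below is hypothetical; the library path is the one from the workaround above (adjust it for your container's Python version):

```python
# Python rendering of the LD_LIBRARY_PATH workaround: prepend the ROCm SDK
# core library path to the environment passed to the vLLM server process.
import os
import subprocess  # used for the (commented-out) hypothetical launch

rocm_lib = "/opt/python/lib/python3.12/site-packages/_rocm_sdk_core/lib"
env = dict(os.environ)
env["LD_LIBRARY_PATH"] = rocm_lib + os.pathsep + env.get("LD_LIBRARY_PATH", "")

# subprocess.run(["vllm", "serve", "<model>"], env=env)  # hypothetical launch
print(env["LD_LIBRARY_PATH"].split(os.pathsep)[0])
```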
vLLM server fails to launch for models with tensor parallelism set to 8#
Launching the vLLM server might fail for models configured with tensor
parallelism (--tensor-parallel-size 8 or tp=8), resulting in
a custom_all_reduce_hip.cuh: invalid device pointer error. This issue will be
fixed in a future release.
PyTorch DDP Gloo backend test might fail on AMD GPUs#
On AMD GPUs, the PyTorch Distributed Data Parallel (DDP) test
test_ddp_apply_optim_in_backward_grad_as_bucket_view_false fails when using
the Gloo backend. This issue affects correctness of distributed training flows
that rely on this code path in PyTorch 2.8 when configured with Gloo.
As a workaround, use the NCCL backend instead of Gloo for multi-GPU distributed training using PyTorch 2.8. For example:
torch.distributed.init_process_group(backend="nccl", ...)
This issue will be fixed in a future release.
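A small hedged helper can make the backend choice explicit in training scripts; the function below is our own illustration (not a PyTorch API), keeping Gloo only for CPU-only runs:

```python
# Hedged sketch of the workaround: prefer the NCCL backend (backed by RCCL
# on ROCm) for multi-GPU jobs; fall back to Gloo only for CPU-only runs.
def pick_backend(gpu_available: bool) -> str:
    """Return the torch.distributed backend name to use."""
    return "nccl" if gpu_available else "gloo"


# torch.distributed.init_process_group(backend=pick_backend(True), ...)
print(pick_backend(True))  # nccl
```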
HIP kernel launch limit might be hit for some models#
With PyTorch 2.10, some models can hit the HIP kernel launch limit of 2³² kernel launches within a single process. When this limit is reached, further kernel launches fail. One known affected model is:
This issue manifests as a HIP kernel launch error during model execution. This issue will be fixed in a future release.
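A back-of-the-envelope calculation shows how a long training run can reach the 2³² limit; the launch rate below is an illustrative assumption, not a measured value:

```python
# Back-of-the-envelope check of the 2**32 kernel-launch limit: at an
# assumed (illustrative) 100,000 launches per second, the per-process
# counter is exhausted in roughly half a day.
LAUNCH_LIMIT = 2**32           # 4,294,967,296 launches per process
launches_per_second = 100_000  # assumed rate; workload-dependent

seconds_to_limit = LAUNCH_LIMIT / launches_per_second
hours_to_limit = seconds_to_limit / 3600
print(f"{LAUNCH_LIMIT} launches at {launches_per_second}/s "
      f"~ {hours_to_limit:.1f} hours")
```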
Performance regression in specific MAD PyTorch models between 2.9 and 2.10#
On ROCm PyTorch 2.10, some MAD-based ImageNet training and wrapper models show a performance regression compared to ROCm PyTorch 2.9. The currently known affected models include:
pyt_torchimagenet_inceptionv3_training
pyt_torchimagenet_resnet50_training
pt2_resnet152_pywrapper
These workloads might run slower on 2.10 than on 2.9 under similar conditions. This issue will be fixed in a future release.
ROCm 7.12 validation with PyTorch unit tests is limited#
For ROCm 7.12, the validation coverage using the PyTorch unit test suite is limited. Only a subset of the full PyTorch unit tests has been executed and validated on this release.
Some torchaudio transforms cannot be exported with torch.jit.script#
The following torchaudio transforms fail to export with torch.jit.script due
to missing TorchScript annotations and type compatibility issues:
FrequencyMasking
TimeMasking
DifferentiableFIR
RNNTLoss
Users cannot export torchaudio models containing these transforms to TorchScript, blocking deployment of optimized audio processing pipelines. This issue will be fixed in a future release.
As a workaround, build torchaudio from the rocm/audio branch, which includes
the fix, or use eager mode execution instead of TorchScript.
PyTorch TestAutograd.test_multi_grad_all_hooks fails on Windows#
On Windows, the PyTorch sub-test TestAutograd.test_multi_grad_all_hooks fails
during runtime compilation of a temporary C++ extension due to MSVC linker
errors. This issue will be fixed in a future release.
TransferBench plugin fails to build for gfx1103#
Building rocm_bandwidth_test with --offload-arch=gfx1103 fails when compiling
the TransferBench plugin. As a result, TransferBench-based builds might not
complete successfully on Ryzen 200 Series (gfx1103) APUs. This issue will be
fixed in a future release.
HIP unit tests trigger a TDR event on Windows with gfx1103#
On Windows with Ryzen 200 Series (gfx1103) APUs, the HIP unit test
Unit_hipStreamValue_Wait_Blocking - uint32_t triggers a Timeout Detection and
Recovery (TDR) event. This causes the GPU driver to reset during test execution.
This issue will be fixed in a future release.
amd-smi reset -r help text is not updated#
The amd-smi reset --reload-driver (-r) command has been deprecated, but the
help text is not updated to reflect the current CLI options.
You can use modprobe instead of amd-smi reset -r to unload and reload the
AMD GPU driver:
modprobe -r amdgpu && modprobe amdgpu
rocWMMA header produces unknown type errors in HIP RTC#
Including rocwmma/rocwmma.hpp in HIP RTC (runtime compilation) contexts
produces compiler errors such as unknown type name '__bf16' and
unknown type name '__fp8_e4m3_fnuz'. This prevents using rocWMMA in HIP RTC
workflows.
As a workaround, add typedef definitions for the missing types before including the rocWMMA header. For example:
#if defined(__HIPCC_RTC__)
typedef _BitInt(16) __bf16;
typedef _BitInt(8) __fp8_e4m3_fnuz;
typedef _BitInt(8) __fp8_e5m2_fnuz;
#endif
#include <rocwmma/rocwmma.hpp>
hipCUB DeviceMerge large-size stress test fails with OOM on gfx1150#
On gfx1150 APUs, the hipCUB DeviceMerge large-size stress
test (MergeLargeSizeIterators) might fail with an out-of-memory (OOM) error
when running ROCm 7.12.0. All standard DeviceMerge test cases pass; only the
large-size stress configuration is affected. This issue will be fixed in a
future release.
ROCm Debug Agent tests fail with “wave not found in queue” on gfx1150#
On Gorgan Point (gfx1150) APUs, ROCm Debug Agent tests may fail with a fatal
wave not found in queue error. This occurs during debug API queue and
wavefront tracking, causing rocm-dbgapi to terminate while processing shader
debug events. This issue will be fixed in a future release.
Multi-ROCm installation fails on RPM-based distros#
On RPM-based Linux distributions (including RHEL and SLES), installing
ROCm 7.12 alongside an existing ROCm 7.11 installation using the
amdrocm7.<...>-gfx<...> meta-packages can fail due to RPM file conflicts. This
prevents side‑by‑side installation of ROCm 7.11 and 7.12 using the standard
repositories and package names. This issue will be fixed in a future release.
Resolved issues#
The following notable issues have been fixed in ROCm 7.12.0.
ROCm debugging tools binaries now fully available#
Previously, ROCm debugging tools – ROCdbgapi, ROCgdb, and ROCr Debug Agent –
were not available after installing using your Linux distribution’s package
manager or using pip.
This issue has been resolved.
Multi-node RCCL tests could crash or hang on MI355X with AINIC NICs#
Previously, multi-node RCCL tests (such as alltoall_perf, allgather_perf,
allreduce_perf, and reduce_scatter_perf) could crash intermittently or hang
on Instinct MI355X GPUs when using AINIC NICs and the AINIC RoCE path
(RCCL_AINIC_ROCE=1).
This issue has been resolved.
hipify-clang emitted spurious errors with CUDA 12.x#
Previously, when using hipify-clang with CUDA 12.x, the following messages
could appear during hipification:
error: must pass in an explicit nvptx64 gpu architecture to 'ptxas'
error: must pass in an explicit nvptx64 gpu architecture to 'nvlink'
These were emitted by the CUDA detection phase, which unnecessarily invoked the
CUDA device toolchain. The .hip files were still generated correctly and
could be used normally.
This issue has been resolved.
Apex encountered crashes and segfaults with the TheRock build system#
Previously, Apex would encounter crashes, missing module errors, and segfaults related to the HIP runtime during testing with the TheRock build system.
This issue has been resolved.
MIOpen unit tests failed to find rocrand headers during runtime kernel compilation#
Previously, MIOpen unit tests could fail to find the rocrand_xorwow.h header
during runtime compilation of certain kernels (for example,
MIOpenSoftmaxAttn.cpp), resulting in HIPRTC_ERROR_COMPILATION. This was
caused by missing runtime include-path configuration in TheRock artifacts: ROCm
could be installed in arbitrary locations, and the rocrand headers were not
reliably discoverable by HIPRTC/COMGR at runtime.
This issue has been resolved.
Upcoming changes#
Future preview releases will expand support for:
Additional ROCm Core SDK components
Domain-specific expansion toolkits (data science, life science, finance, simulation, and other HPC domains)
Extended AMD hardware support