ROCm 7.1.1 release notes

2025-11-26

Applies to Linux

The release notes provide a summary of notable changes since the previous ROCm release.

Release highlights
Supported hardware, operating system, and virtualization changes
User space, driver, and firmware dependent changes
ROCm components versioning
Detailed component changes
ROCm known issues
ROCm resolved issues
ROCm upcoming changes

Note

If you’re using AMD Radeon GPUs or Ryzen APUs in a workstation setting with a display connected, see the Use ROCm on Radeon and Ryzen documentation to verify compatibility and system requirements.

Release highlights#

The following are notable new features and improvements in ROCm 7.1.1. For changes to individual components, see Detailed component changes.

Supported hardware, operating system, and virtualization changes#

ROCm 7.1.1 adds support for the following operating systems and kernel versions:

RHEL 10.1 (kernel: 6.12.0-124)
RHEL 9.7 (kernel: 5.14.0-611)

ROCm 7.1.1 extends the Debian 13 support to AMD Instinct MI355X and MI350X GPUs.

For more information about:

AMD hardware, see Supported GPUs (Linux).
Operating systems, see Supported operating systems and ROCm installation for Linux.

Virtualization support#

ROCm 7.1.1 adds Ubuntu 24.04 as a Guest OS in KVM SR-IOV for AMD Instinct MI300X GPUs. For more information, see Virtualization Support.

User space, driver, and firmware dependent changes#

The software for AMD Data Center GPU products requires maintaining a hardware and software stack with interdependencies among the GPU and baseboard firmware, AMD GPU drivers, and the ROCm user space software.

ROCm Version	GPU	PLDM Bundle (Firmware)	AMD GPU Driver (amdgpu)	AMD GPU Virtualization Driver (GIM)
ROCm 7.1.1	MI355X	01.25.16.03 01.25.15.04	30.20.1 30.20.0 30.10.2 30.10.1 30.10	8.6.0.K
	MI350X	01.25.16.03 01.25.15.04	30.20.1 30.20.0 30.10.2 30.10.1 30.10
	MI325X^[1]	01.25.04.02	30.20.1 30.20.0^[1] 30.10.2 30.10.1 30.10 6.4.z where z (0–3) 6.3.y where y (1–3)
	MI300X	01.25.03.12	30.20.1 30.20.0 30.10.2 30.10.1 30.10 6.4.z where z (0–3) 6.3.y where y (1–3)	8.6.0.K
	MI300A	BKC 26		Not Applicable
	MI250X	IFWI 47 (or later)
	MI250	MU5 w/ IFWI 75 (or later)
	MI210	MU5 w/ IFWI 75 (or later)		8.6.0.K
	MI100	VBIOS D3430401-037		Not Applicable

[1]: For AMD Instinct MI325X KVM SR-IOV users, don't use AMD GPU Driver (amdgpu) 30.20.0.

AMD Instinct MI355X and MI350X metrics and telemetry enhancements#

AMD SMI now supports per-partition metrics and monitoring on AMD Instinct MI355X and MI350X GPUs, including reporting for thermal throttle limits and thermal alert thresholds. This support requires PLDM bundle 01.25.16.03 or later. For AMD SMI on bare metal, metrics per GPU partition are available through the library API: amdsmi_get_gpu_partition_metrics_info(). See the AMD SMI changelog for details.

AMD Instinct MI355X GPU resiliency improvement#

Multimedia Engine Reset is now supported by the AMD GPU Driver (amdgpu) 30.20.1 for AMD Instinct MI355X GPUs. This finer-grained GPU resiliency enables recovery from faults related to VCN or JPEG without requiring a full GPU reset, thereby improving system stability and fault tolerance. Note that VCN queue reset functionality requires PLDM bundle 01.25.16.03 (or later) firmware.

AMD Instinct MI325X SR-IOV Mode 1 reset issue fixed#

An issue affecting AMD Instinct MI325X GPUs in SR-IOV Mode 1 has been resolved in AMD GPU Driver (amdgpu) version 30.20.1. This fix enables seamless usage of KVM virtualization with SR-IOV configurations and allows users to proceed with ROCm and AMD GPU Driver updates without encountering reset-related failures.

GEMM kernel selection improvement#

GEMM kernel selection efficiency has been improved using Origami. This results in improved out-of-the-box performance of GEMM functions for hipBLASLt and rocBLAS, as well as a reduced need for tuning. This improvement reduces selection time, increases selection accuracy, and adds Origami libraries for all GEMM problem types on AMD Instinct MI350X GPUs.

Performance improvement in CK/AITER fused-attn#

Padding is now supported in native CK/AITER fused-attn mode, reducing the overall runtime. Previously, the Transformer Engine (TE) had to remove padding before processing and reapply it afterward as a workaround, which added runtime overhead. With this update, TE can now pass padded input directly to CK/AITER and receive padded output, eliminating the need for that workaround.

AI model support update#

ROCm 7.1.1 updates the support for the following AI models:

Hugging Face Transformers is now supported on gfx1201.
Microsoft Phi-4-multimodal-instruct is now supported on gfx1201.
Qwen QwQ-32B is now supported on gfx1201.
Google Gemma 3 27B is now supported on gfx1100.

ROCm Data Science updates#

ROCm Data Science Toolkit (ROCm-DS) is a comprehensive open-source software collection designed to accelerate data science and machine learning workloads on AMD GPUs. In November 2025, ROCm-DS transitioned from early access (EA) to general availability (GA).

This GA release marks a significant milestone for ROCm-DS as hipDF and hipMM transition to production status. Additionally, it introduces two new production components: hipRAFT and hipVS. For more information, see AMD ROCm-DS documentation.

Deep learning and AI framework updates#

ROCm provides a comprehensive ecosystem for deep learning development. For more information, see Deep learning frameworks for ROCm and the Compatibility matrix for the complete list of deep learning and AI framework versions tested for compatibility with ROCm. As of November 2025, AMD ROCm has officially updated support for the following deep learning and AI frameworks:

PyTorch#

ROCm 7.1.1 enables support for PyTorch 2.9. For more information, see PyTorch compatibility.

Deep Graph Library (DGL)#

Deep Graph Library (DGL) is an easy-to-use, high-performance, and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning that if a deep graph model is a component in an end-to-end application, the rest of the logic is implemented using PyTorch. It’s supported on ROCm 7.0.0, ROCm 6.4.3, and ROCm 6.4.0. For more information, see DGL compatibility.

llama.cpp#

llama.cpp is an open-source framework for Large Language Model (LLM) inference that runs on both central processing units (CPUs) and graphics processing units (GPUs). It is written in plain C/C++, providing a simple, dependency-free setup. It’s supported on ROCm 7.0.0 and ROCm 6.4.x. For more information, see llama.cpp compatibility.

ROCm Offline Installer Creator updates#

The ROCm Offline Installer Creator 7.1.1 includes the following features and improvements:

Added support for RHEL 9.7 and 10.1.
Added support for creating an offline installer for SLES 15.7, where the kernel version of the target OS differs from the host OS creating the installer.

See ROCm Offline Installer Creator for more information.

ROCm Runfile Installer updates#

The ROCm Runfile Installer 7.1.1 includes the following features and improvements:

Added support for RHEL 9.7 and 10.1.
Fixed an issue where, after dependency installation, some dependencies were still marked as uninstalled.
Fixed an issue where the AMDGPU driver install would fail when multiple kernels were installed.
Performance improvements for the RHEL/Oracle Linux dependency install.

For more information, see ROCm Runfile Installer.

Expansion of the ROCm examples repository#

The ROCm examples repository has been expanded with examples for additional ROCm components, and usage examples are now available for performance analysis tools.

The complete source code for the HIP Graph Tutorial is also available as part of the ROCm examples.

ROCm documentation updates#

ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.

The HIP documentation has been enhanced with new GPU programming pattern tutorials. These tutorials address common GPU challenges, including memory coherence, race conditions, and data transfer overhead. They provide practical, performance-oriented examples for real-world applications in machine learning, scientific computing, and image processing. The following tutorials have been added:
- Two-dimensional kernels: Efficient matrix and image processing with optimized thread mapping and memory access.
- Stencil operations: Implementing spatially dependent computations for image filtering and physics simulations.
- Atomic operations: Managing concurrent memory access safely for tasks such as histogram generation.
- Multi-kernel programming: Coordinating multiple GPU kernels for complex iterative algorithms such as graph traversal.
- CPU-GPU cooperative computing: Balancing workloads between CPU and GPU for hybrid algorithms such as K-means clustering.

Tutorials for AI developers have been expanded with the following two new pretraining tutorials:
- Pretraining with TorchTitan
- Training a model with Primus
For more information about the changes, see the Changelog for the AI Developer Hub.

ROCm environment variables are used to configure and optimize the development and runtime experience. These variables define key settings such as installation paths, platform selection, and runtime behavior for applications running on AMD GPUs. The new ROCm environment variables topic summarizes HIP and ROCR-Runtime environment variables, and provides links to environment variable topics for other ROCm components.

ROCm components#

The following table lists the versions of ROCm components for ROCm 7.1.1, including any version changes from 7.1.0 to 7.1.1. Click the component’s updated version to go to a list of its changes.

Category	Group	Name	Version
Libraries	Machine learning and computer vision	Composable Kernel	1.1.0 ⇒ 1.1.0
		MIGraphX	2.14.0 ⇒ 2.14.0
		MIOpen	3.5.1
		MIVisionX	3.4.0
		rocAL	2.4.0
		rocDecode	1.4.0
		rocJPEG	1.2.0
		rocPyDecode	0.7.0
		RPP	2.1.0
	Communication	RCCL	2.27.7 ⇒ 2.27.7
		rocSHMEM	3.0.0 ⇒ 3.1.0
	Math	hipBLAS	3.1.0
		hipBLASLt	1.1.0
		hipFFT	1.0.21
		hipfort	0.7.1
		hipRAND	3.1.0
		hipSOLVER	3.1.0
		hipSPARSE	4.1.0
		hipSPARSELt	0.2.5
		rocALUTION	4.0.1
		rocBLAS	5.1.0 ⇒ 5.1.1
		rocFFT	1.0.35
		rocRAND	4.1.0
		rocSOLVER	3.31.0
		rocSPARSE	4.1.0
		rocWMMA	2.0.0 ⇒ 2.1.0
		Tensile	4.44.0
	Primitives	hipCUB	4.1.0
		hipTensor	2.0.0
		rocPRIM	4.1.0
		rocThrust	4.1.0
Tools	System management	AMD SMI	26.1.0 ⇒ 26.2.0
		ROCm Data Center Tool	1.2.0
		rocminfo	1.0.0
		ROCm SMI	7.8.0
		ROCm Validation Suite	1.2.0 ⇒ 1.3.0
	Performance	ROCm Bandwidth Test	2.6.0 ⇒ 2.6.0
		ROCm Compute Profiler	3.3.0 ⇒ 3.3.1
		ROCm Systems Profiler	1.2.0 ⇒ 1.2.1
		ROCProfiler	2.0.0
		ROCprofiler-SDK	1.0.0
		ROCTracer	4.1.0
	Development	HIPIFY	20.0.0
		ROCdbgapi	0.77.4
		ROCm CMake	0.14.0
		ROCm Debugger (ROCgdb)	16.3
		ROCr Debug Agent	2.1.0
Compilers		HIPCC	1.1.1
		llvm-project	20.0.0
Runtimes		HIP	7.1.0 ⇒ 7.1.1
		ROCr Runtime	1.18.0

Detailed component changes#

The following sections describe key changes to ROCm components.

Note

For a historical overview of ROCm component updates, see the ROCm consolidated changelog.

AMD SMI (26.2.0)#

Added#

Caching for repeated ASIC information calls.
- The cache added to amdsmi_get_gpu_asic_info improves performance by avoiding redundant hardware queries.
- The cache stores ASIC info for each GPU device with a configurable duration, defaulting to 10 seconds. Use the AMDSMI_ASIC_INFO_CACHE_MS environment variable for cache duration configuration for amdsmi_get_gpu_asic_info API calls.
Support for GPU partition metrics.
- Provides support for xcp_metrics v1.0 and extends support for v1.1 (dynamic metrics).
- Added amdsmi_get_gpu_partition_metrics_info, which provides per XCP (partition) metrics.
Support for displaying newer VRAM memory types in amd-smi static --vram.
- The amdsmi_get_gpu_vram_info() API now supports detecting DDR5, LPDDR4, LPDDR5, and HBM3E memory types.
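
The caching behavior described for amdsmi_get_gpu_asic_info can be pictured as a small time-to-live cache keyed by device. The following is a minimal sketch of that general technique, not AMD SMI's implementation. The names `AsicInfo`, `TtlCache`, and `cache_duration_from_env` are hypothetical, and the sketch assumes the variable holds a non-negative integer in milliseconds (suggested by the `_MS` suffix). Only the variable name AMDSMI_ASIC_INFO_CACHE_MS and the 10-second default come from these notes.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the ASIC data a real query would return.
struct AsicInfo {
    std::string market_name;
};

// Read the cache duration from the environment, falling back to 10 seconds.
// Assumption: the variable holds a non-negative integer in milliseconds.
inline std::chrono::milliseconds cache_duration_from_env(
    const char* name = "AMDSMI_ASIC_INFO_CACHE_MS") {
    const std::chrono::milliseconds fallback{10000};
    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') return fallback;
    char* end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    if (*end != '\0' || value < 0) return fallback;  // ignore malformed input
    return std::chrono::milliseconds{value};
}

class TtlCache {
public:
    using Clock = std::chrono::steady_clock;
    using Query = std::function<AsicInfo(int)>;

    TtlCache(Query query, std::chrono::milliseconds ttl)
        : query_(std::move(query)), ttl_(ttl) {
        assert(query_ != nullptr);  // a query callback is required
    }

    // Return the cached entry for a device if it is still fresh; otherwise
    // run the (expensive) query and remember the result with its timestamp.
    AsicInfo get(int device, Clock::time_point now = Clock::now()) {
        auto it = entries_.find(device);
        if (it != entries_.end() && now - it->second.fetched < ttl_) {
            return it->second.info;
        }
        AsicInfo info = query_(device);
        entries_[device] = Entry{info, now};
        return info;
    }

private:
    struct Entry {
        AsicInfo info;
        Clock::time_point fetched;
    };
    Query query_;
    std::chrono::milliseconds ttl_;
    std::unordered_map<int, Entry> entries_;
};
```

In this sketch, a caller would construct `TtlCache` with `cache_duration_from_env()` and a callback that performs the actual hardware query. Repeated `get` calls for the same device within the configured duration then reuse the stored result.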

Changed#

Updated amd-smi static --numa socket affinity data structure. It now displays CPU affinity information in both hexadecimal bitmask format and expanded CPU core ranges, replacing the previous simplified socket enumeration approach.
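
To illustrate how a hexadecimal affinity bitmask relates to expanded CPU core ranges, here is a minimal sketch of the conversion. It assumes that bit i of the mask corresponds to CPU core i and that the mask fits in 64 bits. The function name `cpu_ranges_from_mask` and the `0-3,8-11` range formatting are hypothetical and are not taken from the amd-smi output.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Expand a CPU affinity bitmask into a comma-separated list of core ranges.
// Assumption: bit i set means CPU core i belongs to the affinity set.
inline std::string cpu_ranges_from_mask(std::uint64_t mask) {
    std::string out;
    int bit = 0;
    while (bit < 64) {
        // Skip cores that are not part of the set.
        if (((mask >> bit) & 1u) == 0) {
            ++bit;
            continue;
        }
        // Found the start of a run; advance to the end of the run.
        const int first = bit;
        while (bit < 64 && ((mask >> bit) & 1u) != 0) ++bit;
        const int last = bit - 1;
        assert(first <= last);
        if (!out.empty()) out += ',';
        out += std::to_string(first);
        if (last != first) out += '-' + std::to_string(last);
    }
    return out;
}
```

For example, under these assumptions the mask `0x0F0F` expands to `0-3,8-11`.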

Resolved issues#

Fixed incorrect topology weight calculations.
- Out-of-bound writes caused corruption in the weights field.
Fixed amd-smi event not respecting the Linux timeout command.
Fixed an issue where amdsmi_get_power_info returned AMDSMI_STATUS_API_FAILED.
- VMs were incorrectly reporting AMDSMI_STATUS_API_FAILED when unable to get the power cap within the amdsmi_get_power_info.
- The API now returns N/A or UINT_MAX for values that can’t be retrieved, instead of failing.
Fixed output for amd-smi xgmi -l --json.

Note

See the full AMD SMI changelog for details, examples, and in-depth descriptions.

Composable Kernel (1.1.0)#

Upcoming changes#

Composable Kernel will adopt C++20 features in an upcoming ROCm release, updating the minimum compiler requirement to C++20. Ensure that your development environment meets this requirement to facilitate a seamless transition.

HIP (7.1.1)#

Added#

Support for the flag hipHostRegisterIoMemory in hipHostRegister, used to register I/O memory with HIP runtime so the GPU can access it.

Resolved issues#

Incorrect Compute Unit (CU) mask in logging. HIP runtime now correctly sets the field width for the output print operation. When logging is enabled via the environment variable AMD_LOG_LEVEL, the runtime logs the accurate CU mask.
A segmentation fault occurred when the dynamic queue management mechanism was enabled. HIP runtime now ensures GPU queues aren’t NULL during marker submission, preventing crashes and improving robustness.
An error occurred during HIP tear-down after a device reset in certain applications because stale memory objects were accessed. HIP runtime now properly releases memory associated with host calls, ensuring reliable device resets.
A race condition occurred in certain graph-related applications when pending asynchronous signal handlers referenced device memory that had already been released, leading to memory corruption. HIP runtime now uses a reference counting strategy to manage access to device objects in asynchronous event handlers, ensuring safe and reliable memory usage.
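
The reference-counting approach mentioned in the last item can be illustrated in plain C++ with `std::shared_ptr`. This is a sketch of the general technique only, not the HIP runtime's code. `DeviceObject` and `PendingHandlers` are hypothetical names, and the handlers run synchronously here to keep the example small.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a runtime object that owns device memory.
struct DeviceObject {
    int payload = 42;
};

// Queue of deferred callbacks, standing in for asynchronous signal handlers.
class PendingHandlers {
public:
    // The handler captures a shared_ptr copy, so it holds its own reference
    // and the object cannot be destroyed while the handler is still pending.
    void enqueue(std::shared_ptr<DeviceObject> obj, std::vector<int>& results) {
        assert(obj != nullptr);
        handlers_.push_back(
            [obj = std::move(obj), &results] { results.push_back(obj->payload); });
    }

    // Run and drop every handler; dropping releases the captured references.
    void run_all() {
        for (auto& handler : handlers_) handler();
        handlers_.clear();
    }

private:
    std::vector<std::function<void()>> handlers_;
};
```

With this pattern, the owner can release its own reference before the handlers run. The object stays alive until the last pending handler has been executed and dropped.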

MIGraphX (2.14.0)#

Resolved issues#

Fixed an error that resulted when running make check on systems running on a gfx1201 GPU.

RCCL (2.27.7)#

Resolved issues#

Fixed a single-node data corruption issue in MSCCL on the AMD Instinct MI350X and MI355X GPUs for the LL protocol. This previously affected about two percent of the runs for single-node AllReduce with inputs smaller than 512 KiB.

rocBLAS (5.1.1)#

Changed#

By default, rocBLAS will not use stream order allocation for its internal workspace. To enable this behavior, set the ROCBLAS_STREAM_ORDER_ALLOC environment variable.

ROCm Bandwidth Test (2.6.0)#

Resolved issues#

Test failure with error message Cannot make canonical path.
Healthcheck test failure with segmentation fault on gfx942.
Segmentation fault observed in schmoo and one2all when executed on sgpu setup.

Known issues#

rocm-bandwidth-test folder fails to be removed after driver uninstallation:
- After running amdgpu-uninstall, the rocm-bandwidth-test folder and package are still present.
- Workaround: Remove the package manually using:
```
sudo apt-get remove -y rocm-bandwidth-test
```

ROCm Compute Profiler (3.3.1)#

Added#

Support for PC sampling of multi-kernel applications.
- PC Sampling output instructions are displayed with the name of the kernel to which the individual instruction belongs.
- Single kernel selection is supported so that the PC samples of the selected kernel can be displayed.

Changed#

Roofline analysis now runs on GPU 0 by default instead of all GPUs.

Optimized#

Improved roofline benchmarking by updating the flops_benchmark calculation.
Improved standalone roofline plots in profile mode (PDF output) and analyze mode (CLI and GUI visual plots):
- Fixed the peak MFMA/VALU lines being cut off.
- Cleaned up the overlapping roofline numeric values by moving them into the side legend.
- Added AI points chart with respective values, cache level, and compute/memory bound status.
- Added full kernel names to the symbol chart.

Resolved issues#

Resolved existing issues to improve stability.

ROCm Systems Profiler (1.2.1)#

Resolved issues#

Fixed an issue of OpenMP Tools (OMPT) events, GPU performance counters, VA-API, MPI, and host events failing to be collected in the rocpd output.

ROCm Validation Suite (1.3.0)#

Added#

Support for different test levels with -r option for AMD Instinct MI3XXx GPUs.
Set compute type for DGEMM operations on AMD Instinct MI350X and MI355X GPUs.

rocSHMEM (3.1.0)#

Added#

Allowed IPC, RO, and GDA backends to be selected at runtime.
GDA conduit for different NIC vendors:
- Broadcom BNXT_RE (Thor 2)
- Mellanox MLX5 (IB and RoCE ConnectX-7)
New APIs:
- rocshmem_get_device_ctx

Changed#

The following APIs have been deprecated:
- rocshmem_wg_init
- rocshmem_wg_finalize
- rocshmem_wg_init_thread
rocshmem_ptr can now return non-null pointer to a shared memory region when the IPC transport is available to reach that region. Previously, it would return a null pointer.
ROCSHMEM_RO_DISABLE_IPC is renamed to ROCSHMEM_DISABLE_MIXED_IPC.
- This environment variable wasn’t documented in earlier releases. It’s now documented.

Removed#

rocSHMEM no longer requires rocPRIM and rocThrust as dependencies.
Removed MPI compile-time dependency.

Known issues#

Only a subset of rocSHMEM APIs are implemented for the GDA conduit.

rocWMMA (2.1.0)#

Added#

More unit tests to increase the code coverage.

Changed#

Increased compile timeout and improved visualization in math-ci.

Removed#

Absolute paths from the RPATH of sample and test binary files.

Resolved issues#

Fixed issues caused by HIP changes:
- Removed the .data member from HIP_vector_type.
- Broadcast constructor now only writes to the first vector element.
Fixed a bug related to int32_t usage in hipRTC_gemm for gfx942, caused by breaking changes in HIP.
Replaced #pragma unroll with static for to fix a bug caused by the upgraded compiler which no longer supports using #pragma unroll with template parameter indices.
Corrected test predicates for BLK and VW cooperative kernels.
Modified compute_utils.sh in build-infra to ensure rocWMMA is built with gfx1151 target for ROCm 7.0 and beyond.

ROCm known issues#

ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.

RCCL performance degradation on AMD Instinct MI300X GPU with AMD Pollara AI NIC#

If you’re using RCCL on AMD Instinct MI300X GPUs with AMD Pollara AI NIC, you might observe performance degradation for specific collectives and message sizes. The affected collectives are Scatter, AllToAll, and AllToAllv. It’s recommended to avoid using RCCL packaged with ROCm 7.1.1. As a workaround, use the RCCL develop branch, which contains the fix and will be included in a future ROCm release. See GitHub issue #5717.

Segmentation fault in training models using TensorFlow 2.20.0 Docker images#

Training models tf2_tfm_resnet50_fp16_train and tf2_tfm_resnet50_fp32_train might fail with a segmentation fault when run on the TensorFlow 2.20.0 Docker image with ROCm 7.1.1. As a workaround, use TensorFlow 2.19.x Docker image for training the models in ROCm 7.1.1. This issue will be fixed in a future ROCm release. See GitHub issue #5718.

AMD SMI CLI triggers repeated kernel errors on GPUs with partitioning support#

Running the amd-smi CLI on GPUs with partitioning support, such as the AMD Instinct MI300 series, might produce repeated kernel error messages in the system logs. This occurs when amd-smi attempts to open the GPU partition device nodes /dev/dri/renderD* during the permission checks. On GPUs with partitioning support, unconfigured partition devices are intentionally invalid until configured. As a result, the AMD GPU Driver (amdgpu) logs errors in dmesg, such as:

amdgpu 0000:15:00.0: amdgpu: renderD153 partition 1 not valid!

These repeated kernel logs can clutter the system logs and may cause unnecessary concern about GPU health. However, this is a non-functional issue and does not affect AMD SMI functionality or GPU performance. This issue will be fixed in a future ROCm release. See GitHub issue #5720.

Excessive bad page logs in AMD GPU Driver (amdgpu)#

Due to partial data corruption in the Electrically Erasable Programmable Read-Only Memory (EEPROM) and limited error handling in the AMD GPU Driver (amdgpu), excessive log output might occur when querying the reliability, availability, and serviceability (RAS) bad pages. This issue will be fixed in a future AMD GPU Driver (amdgpu) and ROCm release. See GitHub issue #5719.

Incorrect results in gemm_ex operations for rocBLAS and hipBLAS#

Some gemm_ex operations with 8-bit input data types (int8, float8, bfloat8) for specific matrix dimensions (K = 1 and number of workgroups > 1) might yield incorrect results. The issue results from incorrect tailloop code that fails to consider workgroup index when calculating valid element size. The issue will be fixed in a future ROCm release. See GitHub issue #5722.

hipBLASLt performance variation for a particular FP8 GEMM operation on AMD Instinct MI325X GPUs#

If you’re using hipBLASLt on AMD Instinct MI325X GPUs for large FP8 GEMM operations (such as 9728x8192x65536), you might observe a noticeable performance variation. The issue is currently under investigation and will be fixed in a future ROCm release. See GitHub issue #5734.

ROCm resolved issues#

The following are previously known issues resolved in this release. For resolved issues related to individual components, review the Detailed component changes.

Issue uninstalling ROCm Bandwidth Test using the amdgpu-install script#

The issue where ROCm Bandwidth Test could not be cleanly uninstalled using the amdgpu-install script due to a missing rocm-core dependency has been resolved. See GitHub issue #5611.

RCCL profiler plugin failure with AllToAll operations#

The issue where the RCCL profiler plugin librccl-profiler.so failed with a segmentation fault during AllToAll collective operations due to improperly assigned point-to-point task function pointers has been resolved. This issue resulted in invalid memory access and prevented the profiling of AllToAll performance. Other operations, such as AllReduce, were unaffected. See GitHub issue #5653.

Reduced precision in gemm_ex operations for rocBLAS and hipBLAS#

An issue causing certain gemm_ex operations with half or f32_r data types to return 16-bit precision instead of the expected 32-bit precision when matrix dimensions were m=1 or n=1 has been resolved. The issue resulted from an optimization that enabled _ex APIs to use lower precision multiples. It limited the high-precision matrix operations performed in PyTorch with rocBLAS and hipBLAS. See GitHub issue #5640.

ROCm upcoming changes#

The following changes to the ROCm software stack are anticipated for future releases.

ROCm SMI deprecation#

ROCm SMI will be phased out in an upcoming ROCm release and will enter maintenance mode. After this transition, only critical bug fixes will be addressed and no further feature development will take place.

It’s strongly recommended to transition your projects to AMD SMI, the successor to ROCm SMI. AMD SMI includes all the features of the ROCm SMI and will continue to receive regular updates, new functionality, and ongoing support. For more information on AMD SMI, see the AMD SMI documentation.

ROCTracer, ROCProfiler, rocprof, and rocprofv2 deprecation#

ROCTracer, ROCProfiler, rocprof, and rocprofv2 are deprecated and only critical defect fixes will be addressed for older versions of the profiling tools and libraries. It’s strongly recommended to upgrade to the latest version of the ROCprofiler-SDK library and the rocprofv3 tool to ensure continued support and access to new features.

It’s anticipated that ROCTracer, ROCProfiler, rocprof, and rocprofv2 will reach end-of-life in a future release, aligned with Q1 of 2026.

AMDGPU wavefront size compiler macro deprecation#

Access to the wavefront size as a compile-time constant via the __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros is deprecated and will be disabled in a future release. As of ROCm 7.0.0, warpSize is only available as a non-constexpr variable. You’re encouraged to update your code if needed to ensure future compatibility.

The __AMDGCN_WAVEFRONT_SIZE__ macro and __AMDGCN_WAVEFRONT_SIZE alias will be removed in an upcoming release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.
warpSize is only available as a non-constexpr variable. Where required, the wavefront size should be queried via the warpSize variable in device code, or via hipGetDeviceProperties in host code. Neither of these will result in a compile-time constant. For more information, see warpSize.
For cases where compile-time evaluation of the wavefront size cannot be avoided, uses of __AMDGCN_WAVEFRONT_SIZE, __AMDGCN_WAVEFRONT_SIZE__, or warpSize can be replaced with a user-defined macro or constexpr variable with the wavefront size(s) for the target hardware. For example:

   #if defined(__GFX9__)
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 64
   #else
   #define MY_MACRO_FOR_WAVEFRONT_SIZE 32
   #endif
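
The same replacement can be written with a constexpr variable instead of a macro. The following is a minimal sketch. `kWavefrontSize` and `wavefronts_for` are hypothetical names, and the `__GFX9__` condition is carried over from the macro example. When the file is compiled for the host, or for a non-GFX9 target, the value falls back to 32.

```cpp
#include <cstddef>

// User-defined compile-time wavefront size, mirroring the macro example:
// 64 when compiling for a GFX9 target, 32 otherwise (including host passes).
#if defined(__GFX9__)
constexpr std::size_t kWavefrontSize = 64;
#else
constexpr std::size_t kWavefrontSize = 32;
#endif

// Compile-time helper: number of wavefronts needed to cover `threads` lanes.
constexpr std::size_t wavefronts_for(std::size_t threads) {
    return (threads + kWavefrontSize - 1) / kWavefrontSize;
}

// Because the value is a constant expression, it can be checked with
// static_assert, which the non-constexpr warpSize variable cannot.
static_assert(kWavefrontSize == 32 || kWavefrontSize == 64,
              "unexpected wavefront size");
static_assert(wavefronts_for(kWavefrontSize) == 1, "one full wavefront");
static_assert(wavefronts_for(kWavefrontSize + 1) == 2, "rounds up");
```

A constant defined this way only reflects the target being compiled. Code that must handle several wavefront sizes in one binary still needs the runtime query described above.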

Changes to ROCm Object Tooling#

ROCm Object Tooling tools roc-obj-ls, roc-obj-extract, and roc-obj were deprecated in ROCm 6.4, and will be removed in a future release. Functionality has been added to the llvm-objdump --offloading tool option to extract all clang-offload-bundles into individual code objects found within the objects or executables passed as input. The llvm-objdump --offloading tool option also supports the --arch-name option, and only extracts code objects found with the specified target architecture. See llvm-objdump for more information.
