ROCm 6.2.1 release notes#
2024-09-20
The release notes provide a summary of notable changes since the previous ROCm release.
The Compatibility matrix provides the full list of supported hardware, operating systems, ecosystems, third-party components, and ROCm components for each ROCm release.
Release notes for previous ROCm releases are available in earlier versions of the documentation. See the ROCm documentation release history.
Release highlights#
The following are notable new features and improvements in ROCm 6.2.1. For changes to individual components, see Detailed component changes.
rocAL major version change#
The new version of rocAL introduces many new features, but does not modify any of the existing public API functions. However, the version number was incremented from 1.3 to 2.0. Applications linked to version 1.3 must be recompiled to link against version 2.0.
See the rocAL detailed changes for more information.
New support for FBGEMM (Facebook General Matrix Multiplication)#
As of ROCm 6.2.1, ROCm supports Facebook General Matrix Multiplication (FBGEMM) and the related FBGEMM_GPU library.
FBGEMM is a low-precision, high-performance CPU kernel library for convolution and matrix multiplication. It is used for server-side inference and as a back end for PyTorch quantized operators. FBGEMM_GPU includes a collection of PyTorch GPU operator libraries for training and inference. For more information, see the ROCm Model acceleration libraries guide and PyTorch’s FBGEMM GitHub repository.
ROCm Offline Installer Creator changes#
The ROCm Offline Installer Creator 6.2.1 introduces several new features and improvements including:
Logging support for create and install logs
More stringent checks for Linux versions and distributions
Updated prerequisite repositories
Fixed CTest issues
ROCm documentation changes#
The HIP documentation has been updated with several new topics aimed at improving usability and providing more detailed information.
The Programming Model Reference and Understanding the Programming Model topics in HIP have been consolidated into one topic, HIP programming model (conceptual).
The HIP virtual memory management and HIP virtual memory management API topics have been added.
Note
To contribute to ROCm documentation, see the ROCm documentation contribution guidelines.
Operating system and hardware support changes#
ROCm 6.2.1 adds support for Ubuntu 24.04.1 (kernel: 6.8 [GA]).
See the Compatibility matrix for the full list of supported operating systems and hardware architectures.
ROCm components#
The following table lists the versions of ROCm components for ROCm 6.2.1, including any version changes from 6.2.0 to 6.2.1.
Click the component’s updated version to go to a detailed list of its changes. Each component’s source code is available on GitHub.
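Category | Group | Name | Version |
---|---|---|---|
Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0 |
Libraries | Machine learning and computer vision | MIGraphX | 2.10 |
Libraries | Machine learning and computer vision | MIOpen | 3.2.0 |
Libraries | Machine learning and computer vision | MIVisionX | 3.0.0 |
Libraries | Machine learning and computer vision | rocAL | 1.0.0 ⇒ 2.0.0 |
Libraries | Machine learning and computer vision | rocDecode | 0.6.0 |
Libraries | Machine learning and computer vision | rocPyDecode | 0.1.0 |
Libraries | Machine learning and computer vision | RPP | 1.8.0 |
Libraries | Communication | RCCL | 2.20.5 ⇒ 2.20.5 |
Libraries | Math | hipBLAS | 2.2.0 |
Libraries | Math | hipBLASLt | 0.8.0 |
Libraries | Math | hipFFT | 1.0.15 |
Libraries | Math | hipfort | 0.4.0 |
Libraries | Math | hipRAND | 2.11.0 |
Libraries | Math | hipSOLVER | 2.2.0 |
Libraries | Math | hipSPARSE | 3.1.1 |
Libraries | Math | hipSPARSELt | 0.2.1 |
Libraries | Math | rocALUTION | 3.2.0 |
Libraries | Math | rocBLAS | 4.1.2 ⇒ 4.2.1 |
Libraries | Math | rocFFT | 1.0.28 ⇒ 1.0.29 |
Libraries | Math | rocRAND | 3.1.0 |
Libraries | Math | rocSOLVER | 3.26.0 |
Libraries | Math | rocSPARSE | 3.2.0 |
Libraries | Math | rocWMMA | 1.5.0 |
Libraries | Math | Tensile | 4.41.0 |
Libraries | Primitives | hipCUB | 3.2.0 |
Libraries | Primitives | hipTensor | 1.3.0 |
Libraries | Primitives | rocPRIM | 3.2.0 ⇒ 3.2.1 |
Libraries | Primitives | rocThrust | 3.1.0 |
Tools | System management | AMD SMI | 24.6.2 ⇒ 24.6.3 |
Tools | System management | rocminfo | 1.0.0 |
Tools | System management | ROCm Data Center Tool | 0.3.0 |
Tools | System management | ROCm SMI | 7.3.0 ⇒ 7.3.0 |
Tools | System management | ROCm Validation Suite | 1.0.0 |
Tools | Performance | Omniperf | 2.0.1 ⇒ 2.0.1 |
Tools | Performance | Omnitrace | 1.11.2 ⇒ 1.11.2 |
Tools | Performance | ROCm Bandwidth Test | 1.4.0 |
Tools | Performance | ROCProfiler | 2.0.0 |
Tools | Performance | ROCprofiler-SDK | 0.4.0 |
Tools | Performance | ROCTracer | 4.1.0 |
Tools | Development | HIPIFY | 18.0.0 ⇒ 18.0.0 |
Tools | Development | ROCdbgapi | 0.76.0 |
Tools | Development | ROCm CMake | 0.13.0 |
Tools | Development | ROCm Debugger (ROCgdb) | 14.2 |
Tools | Development | ROCr Debug Agent | 2.0.3 |
Compilers |  | HIPCC | 1.1.1 |
Compilers |  | llvm-project | 18.0.0 |
Runtimes |  | HIP | 6.2 ⇒ 6.2.1 |
Runtimes |  | ROCr Runtime | 1.14.0 |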
Detailed component changes#
The following sections describe key changes to ROCm components.
AMD SMI (24.6.3)#
Changes#
Added amd-smi static --ras on Guest VMs. Guest VMs can view enabled/disabled RAS features on Host cards.
Removals#
Removed amd-smi metric --ecc and amd-smi metric --ecc-blocks on Guest VMs. Guest VMs do not support getting current ECC counts from the Host cards.
Resolved issues#
Fixed TypeError in amd-smi process -G.
Updated CLI error strings to handle empty and invalid GPU/CPU inputs.
Fixed Guest VM showing passthrough options.
Fixed firmware formatting where leading 0s were missing.
HIP (6.2.1)#
Resolved issues#
Soft hang when using AMD_SERIALIZE_KERNEL
Memory leak in hipIpcCloseMemHandle
HIPIFY (18.0.0)#
Changes#
Added CUDA 12.5.1 support.
Added cuDNN 9.2.1 support.
Added LLVM 18.1.8 support.
Added hipBLAS 64-bit APIs support.
Added support for math constants (math_constants.h).
Omniperf (2.0.1)#
Changes#
Enabled rocprofv1 for MI300 hardware.
Added dependency checks on application launch.
Updated Omniperf packaging.
Rolled back Grafana version in Dockerfile for Angular plugin compatibility.
Added GPU model distinction for MI300 systems.
Refactored and updated documentation.
Resolved issues#
Fixed an issue with analysis output.
Fixed issues with profiling multi-process and multi-GPU applications.
Optimizations#
Reduced running time of Omniperf when profiling.
Improved console logging.
Omnitrace (1.11.2)#
Known issues#
Perfetto can no longer open Omnitrace proto files. Loading Perfetto trace output .proto files in the latest version of ui.perfetto.dev can result in a dialog with the message, “Oops, something went wrong! Please file a bug.” The information in the dialog will refer to an “Unknown field type.” The workaround is to open the files with the previous version of the Perfetto UI found at https://ui.perfetto.dev/v46.0-35b3d9845/#!/.
See issue #3767 on GitHub.
RCCL (2.20.5)#
Known issues#
On systems running Linux kernel 6.8.0, such as Ubuntu 24.04, Direct Memory Access (DMA) transfers between the GPU and NIC are disabled, which impacts multi-node RCCL performance. This issue was reproduced with RCCL 2.20.5 (ROCm 6.2.0 and 6.2.1) on systems with Broadcom Thor-2 NICs and affects other systems with RoCE networks using Linux 6.8.0 or newer. Older RCCL versions are also impacted.
This issue will be addressed in a future ROCm release.
See issue #3772 on GitHub.
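As a rough aid for triage, the following sketch checks whether a kernel release string falls in the range named above (6.8.0 or newer). It is only a first filter: the notes tie the issue to RoCE networks as well, which this check does not inspect. The function names are hypothetical and the release-string format (for example, 6.8.0-45-generic) is an assumption based on common Linux distributions.

```python
import re


def parse_kernel_release(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component (for example '6.11') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def kernel_in_affected_range(release: str) -> bool:
    """Return True if the kernel is 6.8.0 or newer, the range named in the known issue."""
    return parse_kernel_release(release) >= (6, 8, 0)
```

On a running system, the release string can be taken from Python's standard library, for example kernel_in_affected_range(platform.release()) after import platform.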
rocAL (2.0.0)#
Changes#
The new version of rocAL introduces many new features, but does not modify any of the existing public API functions. However, the version number was incremented from 1.3 to 2.0. Applications linked to version 1.3 must be recompiled to link against version 2.0.
Added development and test packages.
Added C++ rocAL audio unit test and Python script to run and compare the outputs.
Added Python support for audio decoders.
Added PyTorch iterator for audio.
Added Python audio unit test and support to verify outputs.
Added rocDecode for HW decode.
Added support for:
Audio loader and decoder, which uses libsndfile library to decode wav files
Audio augmentation - PreEmphasis filter, Spectrogram, ToDecibels, Resample, NonSilentRegionDetection, MelFilterBank
Generic augmentation - Slice, Normalize
Reading from file lists in file reader
Downmixing audio channels during decoding
TensorTensorAdd and TensorScalarMultiply operations
Uniform and Normal distribution nodes
Image to tensor updates
ROCm install - use case graphics removed
Known issues#
Dependencies are not installed with the rocAL package installer. Dependencies must be installed with the prerequisite setup script provided. See the rocAL README on GitHub for details.
rocBLAS (4.2.1)#
Removals#
Removed Device_Memory_Allocation.pdf link in documentation.
Resolved issues#
Fixed error/warning message during rocblas_set_stream() call.
rocFFT (1.0.29)#
Optimizations#
Implemented 1D kernels for factorizable sizes less than 1024.
ROCm SMI (7.3.0)#
Optimizations#
Improved handling of UnicodeEncodeErrors with non UTF-8 locales. Non UTF-8 locales were causing crashes on UTF-8 special characters.
Resolved issues#
Fixed an issue where the Compute Partition tests segfaulted when AMDGPU was loaded with optional parameters.
Known issues#
When setting CPX as a partition mode, there is a DRM node limit of 64. This is a known limitation when multiple drivers are using the DRM nodes. The ls /sys/class/drm command can be used to see the number of DRM nodes, and the following steps can be used to remove unnecessary drivers:
Unload AMDGPU: sudo rmmod amdgpu.
Remove any unnecessary drivers using rmmod. For example, to remove an AST driver, run sudo rmmod ast.
Reload AMDGPU using modprobe: sudo modprobe amdgpu.
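To see how close a system is to the limit before changing partition modes, the directory listing can be counted programmatically. The sketch below is a minimal illustration, not part of ROCm SMI. It assumes that device nodes appear under /sys/class/drm as entries named cardN or renderDN, and that connector entries such as card0-DP-1 should be ignored. The notes do not say whether the limit of 64 applies in total or per node type, so the hypothetical count_drm_nodes helper reports both kinds separately.

```python
import os
import re

DRM_NODE_LIMIT = 64  # limit stated in the known issue

# Assumption: device nodes are named cardN or renderDN; connector entries
# such as card0-DP-1 and files such as "version" are not counted.
_NODE_NAME = re.compile(r"^(card|renderD)\d+$")


def count_drm_nodes(sysfs_dir: str = "/sys/class/drm") -> dict[str, int]:
    """Count DRM device nodes under sysfs_dir, grouped by node type."""
    counts = {"card": 0, "renderD": 0}
    for name in os.listdir(sysfs_dir):
        match = _NODE_NAME.match(name)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Comparing each count against DRM_NODE_LIMIT gives a quick indication of whether unloading unused drivers, as described in the steps above, is worth trying.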
rocPRIM (3.2.1)#
Optimizations#
Improved performance of block_reduce_warp_reduce when warp size equals block size.
ROCm known issues#
ROCm known issues are tracked on GitHub. Known issues related to individual components are listed in the Detailed component changes section.
Instinct MI300X GPU recovery failure on uncorrectable errors#
For the AMD Instinct MI300X accelerator, GPU recovery resets triggered by uncorrectable errors (UE) might not complete successfully, which can result in the system being left in an undefined state. A system reboot is needed to recover from this state. Additionally, error logging might fail in these situations, hindering diagnostics.
This issue is under investigation and will be resolved in a future ROCm release.
See issue #3766 on GitHub.
ROCm upcoming changes#
The following changes to the ROCm software stack are anticipated for future releases.
rocm-llvm-alt#
The rocm-llvm-alt package will be removed in an upcoming release. Users relying on the functionality provided by the closed-source compiler should transition to the open-source compiler. Once the rocm-llvm-alt package is removed, any compilation requesting functionality provided by the closed-source compiler will result in a Clang warning: “[AMD] proprietary optimization compiler has been removed”.
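Once the package is removed, one way to find builds that still request the closed-source compiler is to search captured build output for the warning text quoted above. The following is a minimal sketch under that assumption. The find_removal_warnings helper is hypothetical, and the surrounding log format is not specified by these notes, so the function matches only on the quoted message.

```python
# Warning text quoted in the release notes.
REMOVAL_WARNING = "[AMD] proprietary optimization compiler has been removed"


def find_removal_warnings(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for log lines that contain the warning text."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if REMOVAL_WARNING in line
    ]
```

Passing the text of a saved build log to find_removal_warnings returns the line numbers to inspect; an empty list means the warning text was not found in that log.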
rccl-rdma-sharp-plugins#
The RCCL plugin package, rccl-rdma-sharp-plugins, will be removed in an upcoming ROCm release.