ROCm 6.2.1 release notes#

Applies to Linux and Windows

2024-09-20


The release notes provide a summary of notable changes since the previous ROCm release.

The Compatibility matrix provides the full list of supported hardware, operating systems, ecosystems, third-party components, and ROCm components for each ROCm release.

Release notes for previous ROCm releases are available in earlier versions of the documentation. See the ROCm documentation release history.

Release highlights#

The following are notable new features and improvements in ROCm 6.2.1. For changes to individual components, see Detailed component changes.

rocAL major version change#

The new version of rocAL introduces many new features, but does not modify any of the existing public API functions. However, the version number was incremented from 1.3 to 2.0. Applications linked to version 1.3 must be recompiled to link against version 2.0.

See the rocAL detailed changes for more information.
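The rebuild rule above follows the usual convention that a major-version change breaks binary compatibility. The following Python sketch encodes that rule; the `needs_rebuild` and `parse_version` helpers are hypothetical and are not part of rocAL or any ROCm tool.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version string such as "2.0.0" into integers."""
    return tuple(int(part) for part in text.split("."))


def needs_rebuild(linked: str, installed: str) -> bool:
    """Hypothetical check: a rebuild is required when the major versions differ."""
    return parse_version(linked)[0] != parse_version(installed)[0]


# An application linked against rocAL 1.3 must be recompiled for 2.0.
print(needs_rebuild("1.3", "2.0"))    # True
print(needs_rebuild("2.0", "2.0.0"))  # False
```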

New support for FBGEMM (Facebook General Matrix Multiplication)#

As of ROCm 6.2.1, ROCm supports Facebook General Matrix Multiplication (FBGEMM) and the related FBGEMM_GPU library.

FBGEMM is a low-precision, high-performance CPU kernel library for convolution and matrix multiplication. It is used for server-side inference and as a back end for PyTorch quantized operators. FBGEMM_GPU includes a collection of PyTorch GPU operator libraries for training and inference. For more information, see the ROCm Model acceleration libraries guide and PyTorch’s FBGEMM GitHub repository.
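The core idea behind low-precision GEMM libraries such as FBGEMM is to quantize operands to 8-bit integers, accumulate products in wider integers, and rescale the result to floating point. The following pure-Python sketch illustrates that idea using a simplified symmetric per-tensor quantization scheme; it is a conceptual illustration under that assumption, not FBGEMM's API or its actual kernels.

```python
def quantize(values: list[float], scale: float) -> list[int]:
    """Symmetric int8 quantization: round(x / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def quantized_matmul(a, b, scale_a: float, scale_b: float):
    """Multiply float matrices a (m x k) and b (k x n) using int8 operands.

    Products are accumulated in Python ints (standing in for int32
    accumulators) and rescaled to float once per output element.
    """
    qa = [quantize(row, scale_a) for row in a]
    # Quantize b column by column so each inner product is a simple zip.
    qb_cols = [quantize([row[j] for row in b], scale_b) for j in range(len(b[0]))]
    return [
        [sum(x * y for x, y in zip(row, col)) * scale_a * scale_b for col in qb_cols]
        for row in qa
    ]


a = [[0.5, -1.0], [0.25, 2.0]]
b = [[1.0, 0.0], [0.5, -0.5]]
scale_a = max(abs(v) for row in a for v in row) / 127
scale_b = max(abs(v) for row in b for v in row) / 127
# Approximately the exact product [[0.0, 0.5], [1.25, -1.0]].
print(quantized_matmul(a, b, scale_a, scale_b))
```

The small error relative to the exact product is the quantization error that low-precision kernels trade for throughput.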

ROCm Offline Installer Creator changes#

The ROCm Offline Installer Creator 6.2.1 introduces several new features and improvements, including:

  • Logging support for create and install logs

  • More stringent checks for Linux versions and distributions

  • Updated prerequisite repositories

  • Fixed CTest issues

ROCm documentation changes#

The HIP documentation has been updated with several new topics aimed at improving usability and providing more detailed information.

Note

To contribute to ROCm documentation, see the ROCm documentation contribution guidelines.

Operating system and hardware support changes#

ROCm 6.2.1 adds support for Ubuntu 24.04.1 (kernel: 6.8 [GA]).

See the Compatibility matrix for the full list of supported operating systems and hardware architectures.

ROCm components#

The following table lists the versions of ROCm components for ROCm 6.2.1, including any version changes from 6.2.0 to 6.2.1.

Click the component’s updated version to go to a detailed list of its changes. Click the GitHub icon next to a component to go to its source code on GitHub.

Category     Group                                 Name                    Version
Libraries    Machine learning and computer vision  Composable Kernel       1.1.0
                                                   MIGraphX                2.10
                                                   MIOpen                  3.2.0
                                                   MIVisionX               3.0.0
                                                   rocAL                   1.0.0 ⇒ 2.0.0
                                                   rocDecode               0.6.0
                                                   rocPyDecode             0.1.0
                                                   RPP                     1.8.0
             Communication                         RCCL                    2.20.5 ⇒ 2.20.5
             Math                                  hipBLAS                 2.2.0
                                                   hipBLASLt               0.8.0
                                                   hipFFT                  1.0.15
                                                   hipfort                 0.4.0
                                                   hipRAND                 2.11.0
                                                   hipSOLVER               2.2.0
                                                   hipSPARSE               3.1.1
                                                   hipSPARSELt             0.2.1
                                                   rocALUTION              3.2.0
                                                   rocBLAS                 4.1.2 ⇒ 4.2.1
                                                   rocFFT                  1.0.28 ⇒ 1.0.29
                                                   rocRAND                 3.1.0
                                                   rocSOLVER               3.26.0
                                                   rocSPARSE               3.2.0
                                                   rocWMMA                 1.5.0
                                                   Tensile                 4.41.0
             Primitives                            hipCUB                  3.2.0
                                                   hipTensor               1.3.0
                                                   rocPRIM                 3.2.0 ⇒ 3.2.1
                                                   rocThrust               3.1.0
Tools        System management                     AMD SMI                 24.6.2 ⇒ 24.6.3
                                                   rocminfo                1.0.0
                                                   ROCm Data Center Tool   0.3.0
                                                   ROCm SMI                7.3.0 ⇒ 7.3.0
                                                   ROCm Validation Suite   1.0.0
             Performance                           Omniperf                2.0.1 ⇒ 2.0.1
                                                   Omnitrace               1.11.2 ⇒ 1.11.2
                                                   ROCm Bandwidth Test     1.4.0
                                                   ROCProfiler             2.0.0
                                                   ROCprofiler-SDK         0.4.0
                                                   ROCTracer               4.1.0
             Development                           HIPIFY                  18.0.0 ⇒ 18.0.0
                                                   ROCdbgapi               0.76.0
                                                   ROCm CMake              0.13.0
                                                   ROCm Debugger (ROCgdb)  14.2
                                                   ROCr Debug Agent        2.0.3
Compilers                                          HIPCC                   1.1.1
                                                   llvm-project            18.0.0
Runtimes                                           HIP                     6.2 ⇒ 6.2.1
                                                   ROCr Runtime            1.14.0

Detailed component changes#

The following sections describe key changes to ROCm components.

AMD SMI (24.6.3)#

Changes#

  • Added support for amd-smi static --ras on guest VMs. Guest VMs can now view which RAS features are enabled or disabled on host cards.

Removals#

  • Removed amd-smi metric --ecc and amd-smi metric --ecc-blocks on guest VMs. Guest VMs do not support getting current ECC counts from the host cards.

Resolved issues#

  • Fixed TypeError in amd-smi process -G.

  • Updated CLI error strings to handle empty and invalid GPU/CPU inputs.

  • Fixed guest VMs showing passthrough options.

  • Fixed firmware formatting where leading 0s were missing.

HIP (6.2.1)#

Resolved issues#

  • Fixed a soft hang when using AMD_SERIALIZE_KERNEL.

  • Fixed a memory leak in hipIpcCloseMemHandle.

HIPIFY (18.0.0)#

Changes#

  • Added CUDA 12.5.1 support.

  • Added cuDNN 9.2.1 support.

  • Added LLVM 18.1.8 support.

  • Added support for hipBLAS 64-bit APIs.

  • Added support for math constants (math_constants.h).

Omniperf (2.0.1)#

Changes#

  • Enabled rocprofv1 for MI300 hardware.

  • Added dependency checks on application launch.

  • Updated Omniperf packaging.

  • Rolled back Grafana version in Dockerfile for Angular plugin compatibility.

  • Added GPU model distinction for MI300 systems.

  • Refactored and updated documentation.

Resolved issues#

  • Fixed an issue with analysis output.

  • Fixed issues with profiling multi-process and multi-GPU applications.

Optimizations#

  • Reduced running time of Omniperf when profiling.

  • Improved console logging.

Omnitrace (1.11.2)#

Known issues#

Perfetto can no longer open Omnitrace proto files. Loading Perfetto trace output .proto files in the latest version of ui.perfetto.dev can result in a dialog with the message, “Oops, something went wrong! Please file a bug.” The information in the dialog refers to an “Unknown field type.” The workaround is to open the files with the previous version of the Perfetto UI found at https://ui.perfetto.dev/v46.0-35b3d9845/#!/.

See issue #3767 on GitHub.

RCCL (2.20.5)#

Known issues#

On systems running Linux kernel 6.8.0, such as Ubuntu 24.04, Direct Memory Access (DMA) transfers between the GPU and NIC are disabled, which impacts multi-node RCCL performance. This issue was reproduced with RCCL 2.20.5 (ROCm 6.2.0 and 6.2.1) on systems with Broadcom Thor-2 NICs and affects other systems with RoCE networks using Linux 6.8.0 or newer. Older RCCL versions are also impacted.

This issue will be addressed in a future ROCm release.

See issue #3772 on GitHub.
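Whether a host falls in the affected kernel range can be checked from a script. The following Python sketch compares the running kernel release against 6.8.0, the threshold named in the known issue above. The helper names are hypothetical, and a match only indicates that the kernel is in the affected range, not that DMA has been confirmed disabled on that system.

```python
import platform
import re


def kernel_version(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string such as "6.8.0-45-generic"."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def affected_by_rccl_dma_issue(release: str) -> bool:
    """Hypothetical check: the known issue applies to Linux 6.8.0 or newer."""
    return kernel_version(release) >= (6, 8, 0)


if __name__ == "__main__" and platform.system() == "Linux":
    print(affected_by_rccl_dma_issue(platform.release()))
```

Parsing into integer tuples, rather than comparing strings, keeps releases such as 6.10 correctly ordered after 6.8.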

rocAL (2.0.0)#

Changes#

  • The new version of rocAL introduces many new features, but does not modify any of the existing public API functions. However, the version number was incremented from 1.3 to 2.0. Applications linked to version 1.3 must be recompiled to link against version 2.0.

  • Added development and test packages.

  • Added C++ rocAL audio unit test and Python script to run and compare the outputs.

  • Added Python support for audio decoders.

  • Added PyTorch iterator for audio.

  • Added Python audio unit test and support to verify outputs.

  • Added rocDecode for hardware decoding.

  • Added support for:

    • Audio loader and decoder, which uses the libsndfile library to decode WAV files

    • Audio augmentation - PreEmphasis filter, Spectrogram, ToDecibels, Resample, NonSilentRegionDetection, MelFilterBank

    • Generic augmentation - Slice, Normalize

    • Reading from file lists in file reader

    • Downmixing audio channels during decoding

    • TensorTensorAdd and TensorScalarMultiply operations

    • Uniform and Normal distribution nodes

  • Image to tensor updates

  • ROCm install - use case graphics removed
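Of the audio augmentations listed above, the PreEmphasis filter is simple enough to show directly: it emphasizes high frequencies by subtracting a scaled copy of the previous sample, y[n] = x[n] - coeff * x[n - 1]. The following Python sketch implements that standard first-order formula. The `pre_emphasis` function, its default coefficient of 0.97, and its border handling are assumptions for illustration and do not reflect rocAL's API or defaults.

```python
def pre_emphasis(samples: list[float], coeff: float = 0.97) -> list[float]:
    """Apply y[n] = x[n] - coeff * x[n - 1].

    The first sample passes through unchanged (x[-1] is treated as 0);
    the coefficient and border handling are illustrative assumptions.
    """
    out = []
    previous = 0.0
    for x in samples:
        out.append(x - coeff * previous)
        previous = x
    return out


print(pre_emphasis([1.0, 1.0, 1.0, 0.0], coeff=0.5))  # [1.0, 0.5, 0.5, -0.5]
```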

Known issues#

  • Dependencies are not installed with the rocAL package installer. Dependencies must be installed with the prerequisite setup script provided. See the rocAL README on GitHub for details.

rocBLAS (4.2.1)#

Removals#

  • Removed the Device_Memory_Allocation.pdf link from the documentation.

Resolved issues#

  • Fixed error/warning message during rocblas_set_stream() call.

rocFFT (1.0.29)#

Optimizations#

  • Implemented 1D kernels for factorizable sizes less than 1024.
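To illustrate what "factorizable" means for a transform length, the following Python sketch tests whether a 1D length factors entirely into a small set of radices. The radix set {2, 3, 5, 7} is an assumption chosen for illustration; it is not taken from rocFFT, whose supported kernel factors may differ.

```python
def is_factorizable(length: int, radices: tuple[int, ...] = (2, 3, 5, 7)) -> bool:
    """Return True if length is a product of the given radices only.

    The default radix set is an illustrative assumption, not rocFFT's list.
    """
    if length < 1:
        return False
    for radix in radices:
        while length % radix == 0:
            length //= radix
    return length == 1


# Lengths below 1024 that qualify under the assumed radix set.
candidates = [n for n in range(2, 1024) if is_factorizable(n)]
print(candidates[:10])  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
```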

ROCm SMI (7.3.0)#

Optimizations#

  • Improved handling of UnicodeEncodeError exceptions with non-UTF-8 locales. Previously, non-UTF-8 locales caused crashes on UTF-8 special characters.
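The crash mode described above can be reproduced in plain Python: writing a non-ASCII character to a stream whose encoding cannot represent it raises UnicodeEncodeError. The following sketch shows one common mitigation, encoding with errors="replace" so unencodable characters become "?" instead of raising. The `safe_write` helper and the sample text are hypothetical and do not represent ROCm SMI's actual fix.

```python
import io


def safe_write(stream, text: str) -> None:
    """Write text, replacing characters the stream's encoding cannot represent."""
    encoding = getattr(stream, "encoding", None) or "utf-8"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # Substitute "?" for unencodable characters instead of crashing.
        text = text.encode(encoding, errors="replace").decode(encoding)
    stream.write(text)


# Simulate output under a non-UTF-8 (ASCII) locale; newline="\n" disables
# platform-specific newline translation so the bytes are predictable.
buffer = io.BytesIO()
ascii_stream = io.TextIOWrapper(buffer, encoding="ascii", newline="\n")
safe_write(ascii_stream, "Temperature: 45 °C\n")
ascii_stream.flush()
print(buffer.getvalue())  # b'Temperature: 45 ?C\n'
```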

Resolved issues#

  • Fixed an issue where the Compute Partition tests segfaulted when AMDGPU was loaded with optional parameters.

Known issues#

  • When setting CPX as a partition mode, there is a DRM node limit of 64. This is a known limitation when multiple drivers are using the DRM nodes. The ls /sys/class/drm command can be used to see the number of DRM nodes, and the following steps can be used to remove unnecessary drivers:

    1. Unload AMDGPU: sudo rmmod amdgpu.

    2. Remove any unnecessary drivers using rmmod. For example, to remove an AST driver, run sudo rmmod ast.

    3. Reload AMDGPU using modprobe: sudo modprobe amdgpu.
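The DRM node count mentioned in this known issue can also be checked from a script before comparing it against the 64-node limit. The following Python sketch counts primary (cardN) and render (renderDN) entries in /sys/class/drm. The `count_drm_nodes` helper is hypothetical, and the naming pattern is an assumption based on the standard Linux DRM sysfs layout. The script only reads sysfs and does not unload or modify any driver.

```python
from pathlib import Path


def count_drm_nodes(drm_dir: str = "/sys/class/drm") -> dict[str, int]:
    """Count DRM device nodes listed under drm_dir.

    Assumes the usual naming: primary nodes are cardN and render nodes are
    renderDN; connector entries such as card0-DP-1 and files such as
    "version" are skipped.
    """
    counts = {"card": 0, "render": 0}
    for entry in Path(drm_dir).iterdir():
        name = entry.name
        if name.startswith("renderD") and name[len("renderD"):].isdigit():
            counts["render"] += 1
        elif name.startswith("card") and name[len("card"):].isdigit():
            counts["card"] += 1
    return counts


if __name__ == "__main__" and Path("/sys/class/drm").is_dir():
    print(count_drm_nodes())
```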

rocPRIM (3.2.1)#

Optimizations#

  • Improved performance of block_reduce_warp_reduce when warp size equals block size.
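Conceptually, a block reduction built on warp reductions runs in two stages: each warp reduces its own values, then the per-warp partial results are combined. When the block contains exactly one warp, the first stage already yields the block result. The following Python simulation illustrates that shortcut; it is a sequential model that assumes an associative, commutative operator, not rocPRIM's implementation.

```python
from operator import add


def warp_reduce(values, op=add):
    """Tree reduction across the lanes of one warp (simulated sequentially)."""
    vals = list(values)
    n = len(vals)
    while n > 1:
        half = (n + 1) // 2
        # Lane i combines its value with lane i + half; an odd leftover carries over.
        for lane in range(n - half):
            vals[lane] = op(vals[lane], vals[lane + half])
        n = half
    return vals[0]


def block_reduce(values, warp_size, op=add):
    """Two-stage block reduction built on warp_reduce."""
    warps = [values[i:i + warp_size] for i in range(0, len(values), warp_size)]
    partials = [warp_reduce(w, op) for w in warps]  # stage 1: one result per warp
    if len(partials) == 1:
        # Block size equals warp size: the single warp result is the block
        # result, so the shared-memory exchange and second stage are skipped.
        return partials[0]
    return warp_reduce(partials, op)  # stage 2: combine per-warp partials


print(block_reduce(list(range(64)), warp_size=64))   # 2016
print(block_reduce(list(range(256)), warp_size=64))  # 32640
```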

ROCm known issues#

ROCm known issues are tracked on GitHub. Known issues related to individual components are listed in the Detailed component changes section.

Instinct MI300X GPU recovery failure on uncorrectable errors#

For the AMD Instinct MI300X accelerator, GPU recovery resets triggered by uncorrectable errors (UE) might not complete successfully, which can result in the system being left in an undefined state. A system reboot is needed to recover from this state. Additionally, error logging might fail in these situations, hindering diagnostics.

This issue is under investigation and will be resolved in a future ROCm release.

See issue #3766 on GitHub.

ROCm upcoming changes#

The following changes to the ROCm software stack are anticipated for future releases.

rocm-llvm-alt#

The rocm-llvm-alt package will be removed in an upcoming release. Users relying on the functionality provided by the closed-source compiler should transition to the open-source compiler. Once the rocm-llvm-alt package is removed, any compilation requesting functionality provided by the closed-source compiler will result in a Clang warning: “[AMD] proprietary optimization compiler has been removed”.

rccl-rdma-sharp-plugins#

The RCCL plugin package, rccl-rdma-sharp-plugins, will be removed in an upcoming ROCm release.