Installation Troubleshooting

Applies to Linux

2024-06-20

This topic describes issues that some users encounter when installing ROCm tools or libraries.

Issue #1: Installation Methods

For example, the latest version of ROCm is 6.0.2, but following the installation instructions installs release 6.0.0.

Solution: You may have used the quick-start installation method, which only installs the latest major release. Use one of the other available installation methods instead.

Refer to ROCm Issue #2422 for additional details.
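To confirm which release was actually installed (6.0.0 versus 6.0.2 in the example above), you can compare the installed version with the one you intended. This is a minimal sketch that assumes the default `/opt/rocm` prefix and that the installation records its version string in `/opt/rocm/.info/version`; both are assumptions about a typical installation, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumption: ROCm writes a version string such as
# "6.0.0-91" to /opt/rocm/.info/version (pass another path to override).

# Print the installed ROCm version without its build suffix (e.g. "6.0.0").
rocm_installed_version() {
  local info_file="${1:-/opt/rocm/.info/version}"
  [ -r "$info_file" ] || return 1
  # Keep only the MAJOR.MINOR.PATCH part before the first "-".
  cut -d- -f1 < "$info_file"
}

# Succeed only if the installed version equals the expected version.
rocm_version_is() {
  local expected="$1" info_file="${2:-/opt/rocm/.info/version}"
  local installed
  installed="$(rocm_installed_version "$info_file")" || return 1
  [ "$installed" = "$expected" ]
}
```

For example, `rocm_version_is 6.0.2 || echo "Installed: $(rocm_installed_version)"` reports the mismatch described in this issue.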

Issue #2: Install Prerequisites

When installing, I see the following message: Problem: nothing provides perl-URI-Encode needed to be installed by ...

Solution: Ensure that the installation prerequisites are installed. SUSE requires additional Perl packages, and RHEL requires Extra Packages for Enterprise Linux (EPEL); both are listed in the prerequisites. Install them first, then repeat your installation steps.

Refer to ROCm Issue #1827.
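If the resolver output is long, it can help to list exactly which dependencies were missing before installing them. This is a minimal sketch, assuming the error lines follow the "nothing provides <package> needed ..." form shown above; the helper name is hypothetical and other message formats are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read package-manager output on stdin and print each
# package named in a "nothing provides <pkg> needed ..." line, once.
missing_deps_from_log() {
  sed -n 's/.*nothing provides \([^ ]*\) needed.*/\1/p' | sort -u
}
```

For example, save the failed installation output to a file and run `missing_deps_from_log < install.log` to see which prerequisite packages to install first.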

Issue #3: PATH variable

After successfully installing ROCm, when I run rocminfo (or another ROCm tool) the command is not found.

Solution: You may need to update your PATH environment variable, as described in the Post-installation instructions.

Refer to ROCm Issue #1607.
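One way to make the PATH change idempotent (so re-sourcing a shell startup file does not add duplicate entries) is sketched below. It assumes the ROCm tools live in `/opt/rocm/bin`, the default install prefix; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend a directory to PATH only if it is not
# already one of PATH's entries.
path_prepend_once() {
  local dir="$1"
  case ":${PATH}:" in
    *":${dir}:"*) ;;                       # already present: leave PATH alone
    *) PATH="${dir}${PATH:+:${PATH}}" ;;   # prepend, avoiding a trailing ":"
  esac
  export PATH
}
```

Adding `path_prepend_once /opt/rocm/bin` to `~/.bashrc` and opening a new shell should let `command -v rocminfo` succeed. Prepending rather than appending means the ROCm tools take precedence over any same-named binaries elsewhere on PATH.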

Issue #4: C++ libraries

When compiling HIP programs, I get a linking error for -lstdc++, or fatal error: 'cmath' file not found.

Solution: You can install C++ libraries using your package manager. The following is an Ubuntu example:

sudo apt-get install libstdc++-12-dev

Refer to ROCm Issue #2031.
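The `12` in `libstdc++-12-dev` is a GCC version. The sketch below suggests which package to install on Ubuntu, under the assumption (not stated in this guide) that the HIP compiler uses the highest-numbered GCC directory under `/usr/lib/gcc/x86_64-linux-gnu`, so that version's C++ headers must be present. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Assumption: the compiler picks the newest GCC
# installation under the given root, so its libstdc++ headers are needed.
suggest_libstdcxx_pkg() {
  local gcc_root="${1:-/usr/lib/gcc/x86_64-linux-gnu}"
  local newest
  # Keep purely numeric directory names (e.g. 11, 12) and take the largest.
  newest="$(ls -1 "$gcc_root" 2>/dev/null | grep -E '^[0-9]+$' | sort -n | tail -n 1)"
  [ -n "$newest" ] || return 1
  echo "libstdc++-${newest}-dev"
}
```

For example, `sudo apt-get install "$(suggest_libstdcxx_pkg)"` installs the suggested package.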

Issue #5: Application hangs on Multi-GPU systems

When running on a system with multiple GPUs, the application hangs with GPU use at 100%, but without the expected rise in GPU temperature.

This issue often results in the following message in the application output:

NCCL WARN Missing "iommu=pt" from kernel command line which can lead to system instablity or hang!

Solution: To resolve this issue, add iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. Then run the following command to regenerate the GRUB configuration:

sudo update-grub
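The edit to /etc/default/grub can be scripted. This is a minimal sketch, assuming the value of GRUB_CMDLINE_LINUX_DEFAULT is double-quoted on a single line (as in a stock Ubuntu file) and that GNU sed is available for in-place editing; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append iommu=pt inside the closing quote of
# GRUB_CMDLINE_LINUX_DEFAULT, unless that line already contains it.
# Requires GNU sed (-i edits the file in place).
add_iommu_pt() {
  local grub_file="${1:-/etc/default/grub}"
  sed -i '/^GRUB_CMDLINE_LINUX_DEFAULT=/{/iommu=pt/!s/"$/ iommu=pt"/;}' "$grub_file"
}
```

Run it from a root shell after backing up the file (for example, `cp /etc/default/grub /etc/default/grub.bak`), then run `update-grub`. Calling it twice leaves the file unchanged the second time.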

Reboot the system, and run the following command:

cat /proc/cmdline

The output should now include iommu=pt, for example:

BOOT_IMAGE=/vmlinuz-5.15.0-101-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro iommu=pt
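To check this in a script rather than by eye, you can test for iommu=pt as a whole token on the kernel command line. This is a minimal sketch; the helper name is hypothetical, and by default it reads /proc/cmdline of the running kernel.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if "iommu=pt" is a separate token in the
# given kernel command line (defaults to the running kernel's /proc/cmdline).
has_iommu_pt() {
  local cmdline
  cmdline="${1-$(cat /proc/cmdline)}"
  # Split on spaces and require an exact, whole-line match.
  printf '%s\n' "$cmdline" | tr ' ' '\n' | grep -qx 'iommu=pt'
}
```

For example, `has_iommu_pt || echo "iommu=pt is not active; check /etc/default/grub and reboot"`.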

Refer to RCCL Issue #1129 for more information.

Issue #6: Additional packages for Docker installations

Docker images often come with minimal installations, so some essential packages might be missing. When installing ROCm within a Docker container, you might need to install additional packages for a successful ROCm installation. Use the following commands for your distribution to install the prerequisite packages.

Ubuntu:

apt update
apt install sudo wget

Red Hat Enterprise Linux:

dnf install sudo wget
subscription-manager register --username <username> --password <password>
subscription-manager attach --auto
subscription-manager repos --enable codeready-builder-for-rhel-9-x86_64-rpms

SUSE Linux Enterprise Server:

zypper install sudo wget SUSEConnect
SUSEConnect -r <REGCODE>
SUSEConnect -p sle-module-desktop-application/15.4/x86_64
SUSEConnect -p sle-module-development-tools/15.4/x86_64
SUSEConnect -p PackageHub/15.4/x86_64

After installing these packages and registering using your license for Enterprise Linux (if applicable), install ROCm following the Quick start installation guide in your Docker container.
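If the same setup script runs in containers based on different distributions, it can pick the right package-install command from /etc/os-release. This is a minimal sketch: the helper name is hypothetical, the ID values (`ubuntu`, `rhel`, `sles`) are assumptions matching the apt, dnf, and zypper commands above, and the registration steps still have to be run separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command that installs the prerequisite
# packages for the distribution described by an os-release file.
prereq_install_cmd() {
  local os_release="${1:-/etc/os-release}" id
  [ -r "$os_release" ] || return 1
  # os-release is a shell-compatible KEY=value file; read ID in a subshell
  # so its other variables do not leak into the caller.
  id="$(unset ID; . "$os_release" && printf '%s' "${ID:-}")"
  case "$id" in
    ubuntu) echo "apt update && apt install -y sudo wget" ;;
    rhel)   echo "dnf install -y sudo wget" ;;
    sles)   echo "zypper install -y sudo wget SUSEConnect" ;;
    *)      return 1 ;;   # unknown distribution: let the caller decide
  esac
}
```

Printing the command instead of running it lets you inspect it first; inside the container you could then run `eval "$(prereq_install_cmd)"`.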