Limitations and recommended settings#
This section provides information on software and configuration limitations.
Important!
Radeon™ PRO Series graphics cards are not designed nor recommended for datacenter usage. Use in a datacenter setting may adversely affect manageability, efficiency, reliability, and/or performance. GD-239.
Important!
ROCm is not officially supported on any mobile SKUs.
Multi-GPU configuration#
Windows Subsystem for Linux (WSL)#
WSL recommended settings and limitations.
WSL recommended settings#
Optimizing GPU utilization
WSL overhead is a known bottleneck for GPU utilization. Increasing the batch size of operations loads the GPU more efficiently, reducing the time required for AI workloads. Optimal batch sizes vary by model and hyperparameters.
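As an illustrative sketch (not AMD guidance), batching amortizes the per-call overhead: many small launches are replaced by one larger one. The layer and sizes below are hypothetical:

```python
import torch

# Hypothetical workload: 256 small inputs through one linear layer.
model = torch.nn.Linear(512, 512)
inputs = [torch.randn(1, 512) for _ in range(256)]

# Unbatched: one kernel launch (and one WSL round-trip) per sample.
out_single = [model(x) for x in inputs]

# Batched: a single larger launch keeps the GPU busier per call.
batch = torch.cat(inputs, dim=0)   # shape (256, 512)
out_batched = model(batch)         # shape (256, 512)
```

The results are numerically equivalent; only the number of dispatches changes, which is what matters under WSL overhead.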
WSL limitations#
Because validation of ROCm™ on Radeon™ in WSL configurations is limited at this time, common errors and applicable recommendations are identified below.
Important! The ROCm 6.1.3 release is limited to preview support for WSL configurations.
At this time, only a limited amount of validation has been performed. AMD recommends only proceeding with advanced know-how and at user discretion.
Visit the AI community to share feedback, and Report a bug if you find any issues.
Note
Refer to Microsoft WSL Documentation for latest information on WSL support for mGPU configurations.
ROCm support in WSL environments#
Due to WSL architectural limitations for the native Linux User Kernel Interface (UKI), rocm-smi is not supported.

Issue | Limitation
---|---
UKI does not currently support rocm-smi | Not currently supported.
Running PyTorch in virtual environments#
Running PyTorch in virtual environments requires a manual libhsa-runtime64.so update.
When using the WSL use case and the hsa-runtime-rocr4wsl-amdgpu package (installed with PyTorch wheels), users are required to update to a WSL-compatible runtime library.
Solution:
Enter the following commands:
```shell
# Locate the site-packages directory of the active PyTorch install
location=`pip show torch | grep Location | awk -F ": " '{print $2}'`
cd ${location}/torch/lib/
# Remove the bundled runtime and replace it with the WSL-compatible one
rm libhsa-runtime64.so*
cp /opt/rocm/lib/libhsa-runtime64.so.1.2 libhsa-runtime64.so
```
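To see what the `location=` step above extracts, the pipeline can be exercised against simulated `pip show torch` output (the version and path below are made-up examples; real output has the same `Location:` line format):

```shell
# Simulated `pip show torch` output fed through the same grep/awk pipeline
printf 'Name: torch\nVersion: 2.3.0\nLocation: /home/user/.venv/lib/python3.10/site-packages\n' |
  grep Location | awk -F ": " '{print $2}'
```

This prints only the path after `Location: `, which the snippet above stores in `location`.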
6.1.3 release known issues#
Radeon GPUs do not support large numbers of simultaneous parallel workloads. We do not recommend exceeding two simultaneous compute workloads. This assumes the workloads run alongside a graphics environment (e.g., a Linux desktop).
Intermittent GPU reset errors may occur with the AUTOMATIC1111 web UI when IOMMU is enabled. Refer to the AMD community knowledge base for suggested resolutions.
The ROCm debugger is unstable and not fully supported in this release.
“Automatic suspend state when idle” is not recommended when running AI workloads.
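On GNOME desktops, for example, automatic suspend on AC power can be turned off with `gsettings` (a GNOME-specific setting; other desktop environments expose an equivalent power option):

```shell
# Disable automatic suspend while on AC power (GNOME)
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-type 'nothing'
```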
Connecting monitors to multiple GPUs on an Intel Sapphire Rapids-based system may cause displays to render incorrectly. We recommend connecting all monitors to one GPU. For configurations where monitors must be connected to multiple GPUs, add the following kernel parameters to your grub file: “intel_iommu=on iommu=on”
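One typical way to add these parameters on a GRUB-based distribution (the existing `quiet splash` defaults shown are an example; adjust to your system) is to edit the default command line and regenerate the configuration:

```shell
# In /etc/default/grub, append the IOMMU parameters to the kernel command line:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on iommu=on"
# Then regenerate the GRUB configuration and reboot:
sudo update-grub   # Debian/Ubuntu; other distributions use grub2-mkconfig
```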
WSL specific issues#
Some long-running rocSPARSE kernels may trigger a TDR (Windows Timeout Detection and Recovery) event.
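As a general Windows-level workaround (not AMD guidance; change at your own discretion), the TDR timeout can be lengthened via the documented `TdrDelay` registry value. Run from an elevated Windows command prompt; the 60-second value is an arbitrary example, and a reboot is required:

```shell
reg add "HKLM\System\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 60 /f
```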