Radeon Limitations and recommended settings#
This section provides information on software and configuration limitations.
Note
For ROCm on Instinct known issues, refer to the AMD ROCm Documentation.
For OpenMPI limitations, see ROCm UCX OpenMPI on GitHub.
7.0.2 release known issues#
Note
ROCm 7.0.2 is a preview release, meaning that stability and performance are not yet optimized. Furthermore, only PyTorch is currently available on Windows; the rest of the ROCm stack is supported only on Linux.
AMD is aware of these issues and is actively working to resolve them in future releases.
Linux#
Known issues#
Intermittent script failure may be observed while running text-to-image inference workloads with PyTorch.
Intermittent system crash may be observed while running Llama 3 inference workloads with PyTorch and vLLM on multiple (4x) AMD Radeon™ AI PRO R9700 graphics products.
Intermittent script failure may be observed while running Mixtral-8x7b-instruct-v0.1 inference workloads with PyTorch and vLLM on Radeon™ PRO W7900 series graphics products.
Lower than expected performance may be observed while running inference workloads with vLLM on AMD Radeon™ graphics products.
Intermittent script failure (out of memory) may be observed while running FP8 LLM inference workloads on Radeon™ RX 9060 series graphics products.
Multi-GPU configuration#
AMD has identified common errors that may occur when running ROCm™ on Radeon™ multi-GPU configurations at this time, along with applicable recommendations.
See mGPU known issues and limitations for the complete list.
Windows#
Note
The following Windows known issues and limitations are applicable to the 6.4.4 preview release.
Note
If you encounter errors related to missing .dll libraries, install Visual C++ 2015-2022 Redistributables.
Known issues#
If you encounter an error indicating that an Application Control Policy has blocked a DLL from loading, check that Smart App Control is OFF. Note that re-enabling Smart App Control requires reinstalling Windows; a future release will remove this requirement.
Intermittent application crash or driver timeout may be observed while running inference workloads with PyTorch on Windows while also running other applications (such as games or web browsers).
Failure to launch may be observed after installation while running ComfyUI with Smart App Control enabled.
Limitations#
No backward pass support (essential for ML training).
Only Python 3.12 is supported.
On Windows, only PyTorch is supported, not the entire ROCm stack.
On Windows, install the latest version of transformers via pip. Some older versions of transformers (<4.55.5) might not be supported.
On Windows, only LLM batch sizes of 1 are officially supported.
On Windows, the torch.distributed module is currently not supported. This may impact applications such as A1111, and some functions from the diffusers and accelerate modules may be affected; a runtime guard along the lines of the sketch below can detect this.
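Since several of these limitations only surface at runtime, a minimal startup-check sketch (illustrative only, not an official API; the version thresholds simply mirror the list above) might look like this:

```python
# Illustrative startup checks mirroring the Windows limitations above (not an official API).
import sys

import torch
import torch.distributed as dist
import transformers

# Only Python 3.12 is supported on Windows.
assert sys.version_info[:2] == (3, 12), "ROCm PyTorch on Windows requires Python 3.12"

# torch.distributed is not supported on Windows; guard any code path that needs it.
if not dist.is_available():
    print("torch.distributed unavailable; falling back to single-process execution")

# Older transformers releases (<4.55.5) might not be supported; report the installed version.
print("transformers version:", transformers.__version__)
```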
WSL#
Known issues#
Intermittent script failure may be observed while running Llama 3 inference workloads with vLLM in WSL2. End users experiencing this issue should follow the vLLM setup instructions here.
Intermittent script failure or driver timeout may be observed while running Stable Diffusion 3 inference workloads with JAX.
Lower than expected performance may be observed while running inference workloads with JAX in WSL2.
Intermittent script failure may be observed while running Resnet50, BERT, or InceptionV3 training workloads with ONNX runtime.
Output error message (resource leak) may be observed while running Llama 3.2 workloads with vLLM.
Output error message (VaMgr) may be observed while running PyTorch workloads in WSL2.
Intermittent script failure or driver timeout may be observed while running Stable Diffusion inference workloads with TensorFlow.
Intermittent application crash may be observed while running Stable Diffusion workloads with ComfyUI and MIGraphX on Radeon™ RX 9060 series graphics products.
Intermittent script failure may occur while running Stable Diffusion 2 workloads with PyTorch and MIGraphX.
Intermittent script failure may occur while running LLM workloads with PyTorch on Radeon™ PRO W7700 graphics products.
Lower than expected performance (compared to native Linux) may be observed while running inference workloads (e.g., Llama2, BERT) in WSL2.
Important!
Radeon™ PRO Series graphics cards are not designed nor recommended for datacenter usage. Use in a datacenter setting may adversely affect manageability, efficiency, reliability, and/or performance. GD-239.
Important!
ROCm is not officially supported on any mobile SKUs.
WSL recommended settings#
Optimizing GPU utilization
WSL overhead is a known bottleneck for GPU utilization. Increasing the batch size of operations loads the GPU more efficiently, reducing the time required for AI workloads. Optimal batch sizes vary by model and macro-parameters.
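As a rough illustration, the following sketch (hypothetical model, input shape, and batch size; it assumes torchvision is installed) submits a batch of inputs in a single forward pass rather than looping over individual samples:

```python
# Minimal sketch: batch inference requests to amortize WSL dispatch overhead.
# Model, input shape, and batch size are placeholders; tune them per workload and VRAM.
import torch
import torchvision.models as models

device = "cuda"  # ROCm builds expose Radeon GPUs through the CUDA/HIP device API
model = models.resnet50().eval().to(device)

batch = torch.randn(32, 3, 224, 224, device=device)  # 32 images per call instead of 1

with torch.no_grad():
    out = model(batch)
print(out.shape)  # torch.Size([32, 1000])
```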
ROCm support in WSL environments#
Due to WSL architectural limitations for the native Linux User Kernel Interface (UKI), rocm-smi is not supported.
| Issue | Limitations |
|---|---|
| UKI does not currently support rocm-smi | No current support for rocm-smi; not currently supported in WSL. |
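Because rocm-smi is unavailable under WSL, basic device information can still be queried from within PyTorch. A minimal sketch (assuming the ROCm PyTorch wheel is installed):

```python
# Minimal sketch: query basic GPU information from PyTorch when rocm-smi is unavailable in WSL.
import torch

if torch.cuda.is_available():  # HIP devices are exposed through the CUDA device API
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(i, props.name, f"{props.total_memory / 1024**3:.1f} GiB total")
        print("   allocated:", torch.cuda.memory_allocated(i), "bytes")
else:
    print("No ROCm-capable GPU visible to PyTorch")
```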
Running PyTorch in virtual environments
Running PyTorch in virtual environments requires a manual libhsa-runtime64.so update.
When using the WSL use case and the hsa-runtime-rocr4wsl-amdgpu package (installed with PyTorch wheels), users are required to update to a WSL-compatible runtime library.
Solution:
Enter the following commands:
location=$(pip show torch | grep Location | awk -F ": " '{print $2}')  # path to the site-packages directory containing torch
cd ${location}/torch/lib/
rm libhsa-runtime64.so*  # remove the runtime library bundled with the PyTorch wheel
cp /opt/rocm/lib/libhsa-runtime64.so.1.2 libhsa-runtime64.so  # use the WSL-compatible ROCm runtime instead
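After replacing the library, a quick check such as the following (a suggested verification step, not part of the official instructions) confirms that the runtime loads and the GPU is visible:

```python
# Suggested verification after the library swap (not part of the official instructions).
import torch

print(torch.version.hip)          # ROCm/HIP version string if the ROCm wheel loaded correctly
print(torch.cuda.is_available())  # True when the Radeon GPU is visible through WSL
```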