This section provides information on software and configuration limitations.

Multi-GPU configuration

ROCm has so far received only limited validation on Radeon multi-GPU configurations. This section lists the common errors identified to date and the applicable recommendations.

Important! The ROCm 5.7 release provides only preview support for multi-GPU configurations.

At this time, only a limited amount of validation has been performed. AMD recommends proceeding only if you have advanced know-how, and at your own discretion.

Visit the AI community to share feedback, and report a bug if you find any issues.

Errors due to GPU and PCIe configuration

When two AMD Radeon 7900XTX GPUs are used, the following HIP error is observed while running PyTorch micro-benchmarking if either GPU is connected to a non-CPU PCIe slot (that is, a PCIe slot provided by the chipset rather than by the CPU):

RuntimeError: HIP error: the operation cannot be performed in the present state
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
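
The error text above suggests passing HIP_LAUNCH_BLOCKING=1 when debugging. The following is a minimal sketch of one way to do that from Python, assuming the benchmark is launched as a separate command. The helper names `debug_env` and `rerun_with_blocking` are hypothetical and are not part of ROCm or PyTorch. Setting the variable does not resolve the PCIe-related error. It only makes kernel launches synchronous so that the reported stack trace is more likely to point at the failing call.

```python
import os
import subprocess
import sys


def debug_env(base_env=None):
    """Return a copy of the environment with HIP_LAUNCH_BLOCKING=1 set.

    Hypothetical helper: base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Variable name taken from the error message itself.
    env["HIP_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_with_blocking(cmd):
    """Run cmd (a list of arguments) under the debug environment.

    Returns the exit code of the child process instead of raising on failure.
    """
    return subprocess.run(cmd, env=debug_env(), check=False).returncode


# Example call (the script name is a placeholder, not a real benchmark file):
# rerun_with_blocking([sys.executable, "your_benchmark.py"])
```

The variable is set in the child's environment rather than inside an already running process so that it is in place before the HIP runtime initializes.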