System setup for AI workloads on ROCm#

2025-09-30


Applies to Linux and Windows

Before you begin training or inference on AMD Instinct™ GPUs, complete the following system setup and validation steps to ensure optimal performance.

Prerequisite system validation#

First, confirm that your system meets all software and hardware prerequisites. See Prerequisite system validation before running AI workloads.
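Before moving on, a quick sanity check can confirm that the ROCm stack detects your GPUs. The commands below are a minimal sketch using the standard ROCm CLI tools; exact output varies by ROCm release and GPU model.

```shell
# List detected AMD GPUs along with driver and utilization information.
rocm-smi

# Show the GPU architecture targets reported by the runtime
# (MI300X-class GPUs report gfx942, for example).
rocminfo | grep gfx
```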

Docker images for AMD Instinct GPUs#

AMD provides prebuilt Docker images for AMD Instinct™ MI300X and MI325X GPUs. These images include ROCm-enabled deep learning frameworks and essential software components. They support single-node and multi-node configurations and are ready for training and inference workloads out of the box.
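As a minimal sketch, the commands below pull one of these images on a Linux host and start a container with GPU access. The image name and tag are illustrative assumptions; check AMD's Docker Hub listings or the linked documentation for the image that matches your workload, and adjust the resource flags to your system.

```shell
# Pull an example ROCm training image (name and tag are illustrative).
docker pull rocm/pytorch-training:latest

# --device flags expose the GPU kernel driver (/dev/kfd) and render nodes (/dev/dri);
# --group-add video grants GPU device-group access inside the container;
# --ipc=host and a large --shm-size help multi-process data loaders;
# --security-opt seccomp=unconfined permits memory mappings that ROCm relies on.
docker run -it --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  --shm-size 16G \
  --security-opt seccomp=unconfined \
  rocm/pytorch-training:latest
```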

Multi-node training#

For instructions on enabling multi-node training, see Multi-node setup for AI workloads.

System optimization and validation#

Before running workloads, verify that the system is configured correctly and operating at peak efficiency. Recommended steps include:

  • Disabling NUMA auto-balancing (see the sketch after this list)

  • Running system benchmarks to validate hardware performance
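The following sketch shows one common way to carry out both items on a Linux host. The sysctl setting applies until the next reboot, and the benchmark command assumes the optional ROCm Bandwidth Test utility is installed; treat both as examples rather than the definitive procedure.

```shell
# Disable NUMA auto-balancing for the current boot; persist the setting via
# /etc/sysctl.conf or a kernel boot parameter if desired.
sudo sysctl -w kernel.numa_balancing=0

# Verify: the value should now read 0.
cat /proc/sys/kernel/numa_balancing

# Example hardware benchmark: measure host-device and device-device bandwidth
# (requires the rocm-bandwidth-test utility from the ROCm distribution).
rocm-bandwidth-test
```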

For details on running system health checks, see System health benchmarks for AI workloads.