xDiT diffusion inference 26.4#

Caution

This documentation does not reflect the latest version of the xDiT diffusion inference performance documentation. See xDiT diffusion inference version history for the latest version.

The rocm/pytorch-xdit Docker image offers a prebuilt, optimized environment based on xDiT for benchmarking diffusion model video and image generation on gfx942 and gfx950 series (AMD Instinct™ MI300X, MI325X, MI350X, and MI355X) GPUs. The image runs ROCm 7.12.0 (preview) based on TheRock and includes the following components:

Software components - xdit:v26.4

| Software component | Version |
|--------------------|---------|
| TheRock            | 9b611c6 |
| rocm-libraries     | 7567d83 |
| rocm-systems       | 93bc019 |
| torch              | ff65f5b |
| torchaudio         | e3c6ee2 |
| torchvision        | b919bd0 |
| triton             | a272dfa |
| accelerate         | 46ba481 |
| aiter              | a169e14 |
| diffusers          | a80b192 |
| distvae            | bf7531e |
| xfuser             | 45c44e7 |
| yunchang           | 631bdfd |

Follow this guide to pull the required image, spin up a container, download the model, and run a benchmark. For preview and development releases, see amdsiloai/pytorch-xdit.

What’s new#

  • Qwen-Image-2512 support

  • Z-Image support

  • Parallel VAE decode support for Wan models

  • Batch inference and data parallel support

Supported models#

The supported models and their variants can be benchmarked for inference performance. Some instructions, commands, and recommendations in this documentation vary by model and variant, so confirm your model before proceeding.

System validation#

Before running AI workloads, it’s important to validate that your AMD hardware is configured correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you can skip this step. Otherwise, complete the procedures in the System validation and optimization guide to properly configure your system settings before starting.

To test for optimal performance, consult the recommended System health benchmarks. This suite of tests will help you verify and fine-tune your system’s configuration.
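As a quick spot check on a Linux host, you can read the NUMA auto-balancing setting mentioned above (the full validation guide covers this and other settings):

```shell
# 0 = NUMA auto-balancing disabled (recommended for these workloads), 1 = enabled.
cat /proc/sys/kernel/numa_balancing

# To disable it for the current boot (requires root):
# echo 0 | sudo tee /proc/sys/kernel/numa_balancing
```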

Pull the Docker image#

For this tutorial, it’s recommended to use the latest rocm/pytorch-xdit:v26.4 Docker image. Pull the image using the following command:

docker pull rocm/pytorch-xdit:v26.4

Validate and benchmark#

Once the image has been downloaded, follow these steps to run benchmarks and generate outputs.
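For example, the container can be started with the device and shared-memory flags commonly used for ROCm containers; the container name and host volume path here are illustrative choices, not requirements of the image:

```shell
docker run -it --rm \
  --network host \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  --ipc host \
  --shm-size 64G \
  --security-opt seccomp=unconfined \
  -v "$HOME/models:/data" \
  --name xdit-bench \
  rocm/pytorch-xdit:v26.4
```

`--device /dev/kfd` and `--device /dev/dri` expose the AMD GPUs to the container, and a large `--shm-size` avoids shared-memory errors during multi-GPU inference.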

Choose your setup method#

You can either use an existing Hugging Face cache or download the model fresh inside the container.
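A sketch of both options, assuming the default cache location of the Hugging Face CLI; the cache path and model ID (a Wan variant) are illustrative:

```shell
# Option 1: reuse an existing host-side cache by mounting it when you
# launch the container, for example:
#   docker run ... -v "$HOME/.cache/huggingface:/root/.cache/huggingface" ...

# Option 2: download the model fresh inside the container
# (the model ID below is an example; substitute your chosen model).
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B-Diffusers
```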

Run inference#
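The shape of a typical multi-GPU run, based on the upstream xfuser example scripts; the script name, model path, and parallelism flags below are assumptions for illustration and will differ by model:

```shell
# Launch one process per GPU; flags follow the upstream xfuser examples
# (script and model paths are placeholders).
torchrun --nproc_per_node=8 examples/wan_example.py \
  --model /path/to/model \
  --ulysses_degree 8 \
  --height 720 --width 1280 \
  --num_inference_steps 50 \
  --prompt "a cat walking on grass"
```

The product of the parallel degrees (here, `--ulysses_degree 8`) should match the number of processes given to `torchrun`.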

Previous versions#

See xDiT diffusion inference version history to find documentation for previous releases of xDiT diffusion inference performance testing.