xDiT diffusion inference#

The rocm/pytorch-xdit Docker image offers a prebuilt, optimized environment based on xDiT for benchmarking video and image generation with diffusion models on gfx942 and gfx950 series GPUs (AMD Instinct™ MI300X, MI325X, MI350X, and MI355X). The image runs ROCm 7.13.0 based on TheRock and includes the following components:

Software components - xdit:v26.5

| Software component | Version  |
|--------------------|----------|
| TheRock            | cbff3d1  |
| rocm-libraries     | a668483b |
| rocm-systems       | c76140fa |
| torch              | ff65f5b  |
| torchaudio         | e3c6ee2  |
| torchvision        | b919bd0  |
| triton             | a272dfa  |
| accelerate         | 46ba481  |
| aiter              | bc5ea32c |
| diffusers          | 447e571a |
| distvae            | 5a0fcbb  |
| xfuser             | 051db68f |
| yunchang           | 631bdfd  |

Follow this guide to pull the required image, spin up a container, download the model, and run a benchmark. For preview and development releases, see amdsiloai/pytorch-xdit.

What’s new#

  • Hunyuan Video 1.5 sparse attention (SSTA) support

  • FP8 MLA support for MI355X

  • Block-wise sparsity support for AMD Triton FAv3 Sage attention

Supported models#

The following models are supported for inference performance benchmarking. Some instructions, commands, and recommendations in this documentation vary by model.


System validation#

Before running AI workloads, it’s important to validate that your AMD hardware is configured correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you can skip this step. Otherwise, complete the procedures in the System validation and optimization guide to properly configure your system settings before starting.

To test for optimal performance, consult the recommended System health benchmarks. This suite of tests will help you verify and fine-tune your system’s configuration.
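As a quick spot check before working through the full guide, you can inspect the NUMA auto-balancing setting mentioned above. This is a minimal sketch assuming a Linux host where `/proc/sys/kernel/numa_balancing` is exposed; ROCm system tuning guidance generally recommends it be disabled (value `0`) for AI workloads:

```shell
# Spot-check NUMA auto-balancing; 0 means disabled (recommended for benchmarking).
if [ "$(cat /proc/sys/kernel/numa_balancing 2>/dev/null)" = "1" ]; then
    echo "NUMA auto-balancing is enabled; disable it before benchmarking:"
    echo "  sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'"
else
    echo "NUMA auto-balancing is disabled (or not exposed on this host)."
fi
```

See the System validation and optimization guide for the full set of recommended checks.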

Pull the Docker image#

For this tutorial, it’s recommended to use the latest rocm/pytorch-xdit:v26.5 Docker image. Pull the image using the following command:

docker pull rocm/pytorch-xdit:v26.5

Validate and benchmark#

Once the image has been downloaded, follow these steps to run benchmarks and generate outputs.

Choose your setup method#

You can either use an existing Hugging Face cache or download the model fresh inside the container.
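If you reuse an existing host Hugging Face cache, a container launch might look like the sketch below. The device, group, and IPC flags are the usual ones for ROCm containers; the container-side cache path (`/root/.cache/huggingface`) and shared-memory size are assumptions, so adjust them to your environment:

```shell
# Launch the container with GPU access and the host Hugging Face cache mounted.
docker run -it --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --ipc=host \
  --shm-size 16G \
  --security-opt seccomp=unconfined \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  rocm/pytorch-xdit:v26.5
```

If you download the model fresh inside the container instead, omit the `-v` mount and fetch the weights after the container starts.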

Run inference#
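Inside the container, a multi-GPU run generally follows xDiT's upstream example pattern sketched below. The script path, model path, and parallelism degrees here are illustrative assumptions, not the exact commands shipped in this image; check the example scripts bundled in the container for the actual entry points and flags:

```shell
# Illustrative xDiT-style launch across 8 GPUs (script path and flags follow
# xDiT's upstream examples and may differ in this image). Note that
# pipefusion_parallel_degree * ulysses_degree matches nproc_per_node.
torchrun --nproc_per_node=8 \
  examples/hunyuandit_example.py \
  --model /path/to/HunyuanDiT-v1.2-Diffusers \
  --pipefusion_parallel_degree 2 \
  --ulysses_degree 4 \
  --num_inference_steps 50 \
  --prompt "A photo of a small dog"
```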

Previous versions#

See xDiT diffusion inference version history to find documentation for previous releases of xDiT diffusion inference performance testing.