xDiT diffusion inference#

2026-02-11

27 min read time

Applies to Linux

The rocm/pytorch-xdit Docker image offers a prebuilt, optimized environment based on xDiT for benchmarking video and image generation with diffusion models on gfx942 and gfx950 series (AMD Instinct™ MI300X, MI325X, MI350X, and MI355X) GPUs. The image runs ROCm 7.11.0 (preview), built with TheRock, and includes the following components:

Software components - xdit:v26.1 (component: version)

  • TheRock: 1728a81

  • rccl: d23d18f

  • composable_kernel: ab0101c

  • rocm-libraries: a2f7c35

  • rocm-systems: 659737c

  • torch: 91be249

  • torchvision: b919bd0

  • triton: a272dfa

  • accelerate: b521400f

  • aiter: de14bec0

  • diffusers: 6708f5

  • xfuser: 0a3d7a

  • yunchang: 2c9b712

Follow this guide to pull the required image, spin up a container, download the model, and run a benchmark. For preview and development releases, see amdsiloai/pytorch-xdit.

What’s new#

  • HunyuanVideo 1.5 support

  • Z-Image Turbo support

  • Wan model sharding

Supported models#

The following models are supported for inference performance benchmarking. Some instructions, commands, and recommendations in this documentation might vary by model; select one to get started.

Model families and their variants:

  • Hunyuan Video: Hunyuan Video, Hunyuan Video 1.5

  • Wan-AI: Wan2.1, Wan2.2

  • FLUX: FLUX.1, FLUX.1 Kontext, FLUX.2

  • StableDiffusion: stable-diffusion-3.5-large

  • Z-Image: Z-Image Turbo

Note

To learn more about a specific model (Hunyuan Video, Hunyuan Video 1.5, Wan2.1, Wan2.2, FLUX.1, FLUX.1 Kontext, FLUX.2, stable-diffusion-3.5-large, or Z-Image Turbo), see its model card on Hugging Face or visit its GitHub page. Note that some models require access authorization before use via an external license agreement through a third party.
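For gated models, accept the license on the model card and authenticate with Hugging Face before downloading. A minimal sketch, assuming you have generated a Hugging Face access token with read access to the gated repository:

# Authenticate once on the machine (or container) that performs the download;
# replace the placeholder with your personal access token
huggingface-cli login --token <your_hf_token>

# Alternatively, export the token for the current shell session
export HF_TOKEN=<your_hf_token>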

System validation#

Before running AI workloads, it’s important to validate that your AMD hardware is configured correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you can skip this step. Otherwise, complete the procedures in the System validation and optimization guide to properly configure your system settings before starting.

To test for optimal performance, consult the recommended System health benchmarks. This suite of tests will help you verify and fine-tune your system’s configuration.
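As a quick sanity check before launching a container, you can confirm that NUMA auto-balancing is disabled and that all GPUs are visible. This is a minimal sketch; refer to the System validation and optimization guide for the complete procedure.

# NUMA auto-balancing should typically report 0 (disabled) for optimal performance
cat /proc/sys/kernel/numa_balancing

# Verify that all expected AMD Instinct GPUs are detected and healthy
rocm-smi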

Pull the Docker image#

For this tutorial, it’s recommended to use the latest rocm/pytorch-xdit Docker image, currently rocm/pytorch-xdit:v26.1. Pull the image using the following command:

docker pull rocm/pytorch-xdit:v26.1
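To confirm the image is available locally before starting a container, you can list it:

docker images rocm/pytorch-xdit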

Validate and benchmark#

Once the image has been downloaded, you can follow these steps to run benchmarks and generate outputs.

The setup and benchmark commands below are shown for each supported model in turn: Hunyuan Video, Hunyuan Video 1.5, Wan2.1, Wan2.2, FLUX.1, FLUX.1 Kontext, FLUX.2, stable-diffusion-3.5-large, and Z-Image Turbo. Use the commands that match your model; see Supported models for the full list.

Choose your setup method#

You can either use an existing Hugging Face cache or download the model fresh inside the container.

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download tencent/HunyuanVideo  --revision refs/pr/18 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download tencent/HunyuanVideo  --revision refs/pr/18 
    

    Warning

Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume, as shown in the example below.
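For example, you can keep downloads across container runs by mounting a named Docker volume at the cache path instead of using the container filesystem. This is a minimal sketch; the volume name xdit_models is an arbitrary example, and the optional debugging and tuning flags from the launch command above are omitted for brevity.

# Create a named volume once; it survives container removal
docker volume create xdit_models

# Launch the container with the volume mounted at the Hugging Face cache path
docker run \
    -it --rm \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --ipc=host \
    --network host \
    --shm-size 128G \
    --name pytorch-xdit \
    -e HF_HOME=/app/huggingface_models \
    -v xdit_models:/app/huggingface_models \
    rocm/pytorch-xdit:v26.1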

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v 
    

    Warning

    Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume.

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P-Diffusers 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download Wan-AI/Wan2.1-I2V-14B-720P-Diffusers 
    

    Warning

    Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume.

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download Wan-AI/Wan2.2-I2V-A14B-Diffusers 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download Wan-AI/Wan2.2-I2V-A14B-Diffusers 
    

    Warning

    Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume.

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download black-forest-labs/FLUX.1-dev 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download black-forest-labs/FLUX.1-dev 
    

    Warning

    Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume.

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download black-forest-labs/FLUX.1-Kontext-dev 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download black-forest-labs/FLUX.1-Kontext-dev 
    

    Warning

    Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume.

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download black-forest-labs/FLUX.2-dev 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download black-forest-labs/FLUX.2-dev 
    

    Warning

    Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume.

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download stabilityai/stable-diffusion-3.5-large 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download stabilityai/stable-diffusion-3.5-large 
    

    Warning

    Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume.

If you already have models downloaded on your host system, you can mount your existing cache.

  1. Set your Hugging Face cache location.

    export HF_HOME=/your/hf_cache/location
    
  2. Download the model (if not already cached).

    huggingface-cli download Tongyi-MAI/Z-Image-Turbo 
    
  3. Launch the container with mounted cache.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        -e HF_HOME=/app/huggingface_models \
        -v $HF_HOME:/app/huggingface_models \
        rocm/pytorch-xdit:v26.1
    

If you prefer to keep the container self-contained, or you don’t have an existing cache, download the model inside the container instead.

  1. Launch the container.

    docker run \
        -it --rm \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --user root \
        --device=/dev/kfd \
        --device=/dev/dri \
        --group-add video \
        --ipc=host \
        --network host \
        --privileged \
        --shm-size 128G \
        --name pytorch-xdit \
        -e HSA_NO_SCRATCH_RECLAIM=1 \
        -e OMP_NUM_THREADS=16 \
        -e CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        rocm/pytorch-xdit:v26.1
    
  2. Inside the container, set the Hugging Face cache location and download the model.

    export HF_HOME=/app/huggingface_models
    huggingface-cli download Tongyi-MAI/Z-Image-Turbo 
    

    Warning

    Models will be downloaded to the container’s filesystem and will be lost when the container is removed unless you persist the data with a volume.

Run inference#

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the Hunyuan Video model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_hunyuanvideo \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_hunyuanvideo. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_hunyuanvideo_throughput.csv and pyt_xdit_hunyuanvideo_serving.csv.
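To take a quick look at the collected metrics after the run completes, you can print the CSV reports as aligned tables (the exact column names depend on the MAD report format):

# Inspect the throughput and serving reports produced by MAD
column -s, -t < pyt_xdit_hunyuanvideo_throughput.csv
column -s, -t < pyt_xdit_hunyuanvideo_serving.csv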

To run the Hunyuan Video benchmark manually, use the following commands inside the container you launched earlier:

cd /app/Hunyuanvideo
mkdir results
torchrun --nproc_per_node=8 run.py \
    --model tencent/HunyuanVideo \
    --prompt "In the large cage, two puppies were wagging their tails at each other." \
    --batch_size 1 \
    --height 720 --width 1280 \
    --seed 1168860793 \
    --num_frames 129 \
    --num_inference_steps 50 \
    --warmup_steps 1 \
    --n_repeats 1 \
    --sleep_dur 10 \
    --ulysses_degree 8 \
    --enable_tiling --enable_slicing \
    --guidance_scale 6.0 \
    --use_torch_compile \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated video will be stored under the results directory.
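In the examples in this guide, --ulysses_degree always matches the number of processes launched by torchrun (8 here, 2 for Z-Image Turbo). As a hedged sketch, you could benchmark on four GPUs, memory permitting, by reducing both values together while keeping the remaining arguments unchanged:

# Run the same benchmark on four GPUs (assumes sequence parallelism alone,
# so --ulysses_degree must equal --nproc_per_node)
export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --nproc_per_node=4 run.py \
    --model tencent/HunyuanVideo \
    --prompt "In the large cage, two puppies were wagging their tails at each other." \
    --batch_size 1 \
    --height 720 --width 1280 \
    --seed 1168860793 \
    --num_frames 129 \
    --num_inference_steps 50 \
    --warmup_steps 1 \
    --n_repeats 1 \
    --sleep_dur 10 \
    --ulysses_degree 4 \
    --enable_tiling --enable_slicing \
    --guidance_scale 6.0 \
    --use_torch_compile \
    --attention_backend aiter \
    --benchmark_output_directory results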

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the Hunyuan Video 1.5 model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_hunyuanvideo_1_5 \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_hunyuanvideo_1_5. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_hunyuanvideo_1_5_throughput.csv and pyt_xdit_hunyuanvideo_1_5_serving.csv.

To run the Hunyuan Video 1.5 benchmark manually, use the following commands inside the container you launched earlier:

cd /app/Hunyuanvideo_1_5
mkdir results
torchrun --nproc_per_node=8 /app/Hunyuanvideo_1_5/run.py \
    --model hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v \
    --prompt "In the large cage, two puppies were wagging their tails at each other." \
    --task t2v \
    --height 720 --width 1280 \
    --seed 1168860793 \
    --num_frames 129 \
    --num_inference_steps 50 \
    --num_repetitions 1 \
    --ulysses_degree 8 \
    --enable_tiling --enable_slicing \
    --use_torch_compile \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated video will be stored under the results directory.

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the Wan2.1 model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_wan_2_1 \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_wan_2_1. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_wan_2_1_throughput.csv and pyt_xdit_wan_2_1_serving.csv.

To run the Wan2.1 benchmark manually, use the following commands inside the container you launched earlier:

cd /app/Wan
mkdir results
torchrun --nproc_per_node=8 /app/Wan/run.py \
    --model Wan-AI/Wan2.1-I2V-14B-720P-Diffusers \
    --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
    --task i2v \
    --height 720 \
    --width 1280 \
    --img_file_path /app/Wan/i2v_input.JPG \
    --num_frames 81 \
    --ulysses_degree 8 \
    --seed 42 \
    --num_repetitions 1 \
    --num_inference_steps 40 \
    --use_torch_compile \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated video will be stored under the results directory.

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the Wan2.2 model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_wan_2_2 \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_wan_2_2. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_wan_2_2_throughput.csv and pyt_xdit_wan_2_2_serving.csv.

To run the Wan2.2 benchmark manually, use the following commands inside the container you launched earlier:

cd /app/Wan
mkdir results
torchrun --nproc_per_node=8 /app/Wan/run.py \
    --model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
    --prompt "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside." \
    --task i2v \
    --height 720 \
    --width 1280 \
    --img_file_path /app/Wan/i2v_input.JPG \
    --num_frames 81 \
    --ulysses_degree 8 \
    --seed 42 \
    --num_repetitions 1 \
    --num_inference_steps 40 \
    --use_torch_compile \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated video will be stored under the results directory.

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the FLUX.1 model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_flux \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_flux. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_flux_throughput.csv and pyt_xdit_flux_serving.csv.

To run the FLUX.1 benchmark manually, use the following commands inside the container you launched earlier:

cd /app/Flux
mkdir results
torchrun --nproc_per_node=8 /app/Flux/run.py \
    --model black-forest-labs/FLUX.1-dev \
    --seed 42 \
    --prompt "A small cat" \
    --height 1024 \
    --width 1024 \
    --num_inference_steps 25 \
    --max_sequence_length 256 \
    --warmup_steps 5 \
    --no_use_resolution_binning \
    --ulysses_degree 8 \
    --use_torch_compile \
    --guidance_scale 0.0 \
    --num_repetitions 50 \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated images will be stored under the results directory.

You can also use run_usp.py, which implements USP (Unified Sequence Parallelism) without modifying the default diffusers pipeline.
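A minimal sketch of an equivalent run_usp.py invocation for FLUX.1 follows. It assumes run_usp.py accepts the same arguments as the run.py command above, matching the pattern used by the FLUX.1 Kontext and FLUX.2 examples later in this guide; a FLUX.1-specific --model_type value may also be required, so verify the accepted flags with --help inside the container.

# Hedged example: run FLUX.1 through the USP script instead of run.py
cd /app/Flux
mkdir -p results
torchrun --nproc_per_node=8 /app/Flux/run_usp.py \
    --model black-forest-labs/FLUX.1-dev \
    --seed 42 \
    --prompt "A small cat" \
    --height 1024 \
    --width 1024 \
    --num_inference_steps 25 \
    --max_sequence_length 256 \
    --warmup_steps 5 \
    --no_use_resolution_binning \
    --ulysses_degree 8 \
    --use_torch_compile \
    --guidance_scale 0.0 \
    --num_repetitions 50 \
    --attention_backend aiter \
    --benchmark_output_directory results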

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the FLUX.1 Kontext model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_flux_kontext \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_flux_kontext. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_flux_kontext_throughput.csv and pyt_xdit_flux_kontext_serving.csv.

To run the FLUX.1 Kontext benchmark manually, use the following commands inside the container you launched earlier:

cd /app/Flux
mkdir results
torchrun --nproc_per_node=8 /app/Flux/run_usp.py \
    --model black-forest-labs/FLUX.1-Kontext-dev \
    --seed 42 \
    --prompt "Add a cool hat to the cat" \
    --height 1024 \
    --width 1024 \
    --num_inference_steps 30 \
    --max_sequence_length 512 \
    --warmup_steps 5 \
    --no_use_resolution_binning \
    --ulysses_degree 8 \
    --use_torch_compile \
    --img_file_path /app/Flux/cat.png \
    --model_type flux_kontext \
    --guidance_scale 2.5 \
    --num_repetitions 25 \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated images will be stored under the results directory.

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the FLUX.2 model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_flux_2 \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_flux_2. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_flux_2_throughput.csv and pyt_xdit_flux_2_serving.csv.

To run the FLUX.2 benchmark manually, use the following commands inside the container you launched earlier:

cd /app/Flux
mkdir results
torchrun --nproc_per_node=8 /app/Flux/run_usp.py \
    --model black-forest-labs/FLUX.2-dev \
    --seed 42 \
    --prompt "Add a cool hat to the cat" \
    --height 1024 \
    --width 1024 \
    --num_inference_steps 50 \
    --max_sequence_length 512 \
    --warmup_steps 5 \
    --no_use_resolution_binning \
    --ulysses_degree 8 \
    --use_torch_compile \
    --img_file_paths /app/Flux/cat.png \
    --model_type flux2 \
    --guidance_scale 4.0 \
    --num_repetitions 25 \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated images will be stored under the results directory.

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the stable-diffusion-3.5-large model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_sd_3_5 \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_sd_3_5. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_sd_3_5_throughput.csv and pyt_xdit_sd_3_5_serving.csv.

To run the stable-diffusion-3.5-large benchmark manually, use the following commands inside the container you launched earlier:

cd /app/StableDiffusion3.5
mkdir results
torchrun --nproc_per_node=8 /app/StableDiffusion3.5/run.py \
    --model stabilityai/stable-diffusion-3.5-large \
    --prompt "A capybara holding a sign that reads Hello World" \
    --num_repetitions 50 \
    --num_inference_steps 28 \
    --pipefusion_parallel_degree 4 \
    --use_cfg_parallel \
    --use_torch_compile \
    --dtype torch.float16 \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated images will be stored under the results directory.

  1. Clone the ROCm Model Automation and Dashboarding (ROCm/MAD) repository to a local directory and install the required packages on the host machine.

    git clone https://github.com/ROCm/MAD
    cd MAD
    pip install -r requirements.txt
    
  2. On the host machine, use this command to run the performance benchmark test on the Z-Image Turbo model using one node.

    export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
    madengine run \
        --tags pyt_xdit_z_image_turbo \
        --keep-model-dir \
        --live-output
    

MAD launches a Docker container with the name container_ci-pyt_xdit_z_image_turbo. The throughput and serving reports of the model are collected in the following paths: pyt_xdit_z_image_turbo_throughput.csv and pyt_xdit_z_image_turbo_serving.csv.

To run the Z-Image Turbo benchmark manually, use the following commands inside the container you launched earlier:

cd /app/Z-Image
mkdir results
torchrun --nproc_per_node=2 /app/Z-Image/run.py \
    --model Tongyi-MAI/Z-Image-Turbo \
    --seed 42 \
    --prompt "A crowded beach" \
    --height 1088 \
    --width 1920 \
    --num_inference_steps 9 \
    --ulysses_degree 2 \
    --use_torch_compile \
    --guidance_scale 0.0 \
    --num_repetitions 50 \
    --attention_backend aiter \
    --benchmark_output_directory results

The generated images will be stored under the results directory.
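Because the containers in this guide are started with --rm, copy any benchmark outputs you want to keep to the host before exiting. A minimal sketch from the host, assuming the container is still running under the name pytorch-xdit and the Z-Image outputs are in /app/Z-Image/results:

# Copy the results directory out of the running container to the host
docker cp pytorch-xdit:/app/Z-Image/results ./z_image_results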