Running ComfyUI generative workflows from Python on AMD Instinct GPUs

Authors: Vicky Tsang, Dipto Deb
Knowledge level: Intermediate

This tutorial demonstrates how to drive ComfyUI generative-AI workflows entirely from Python (without using a browser) on AMD Instinct™ data-center GPUs running the ROCm™ software stack. Instead of working with the graphical interface, you will simply submit your workflow to the ComfyUI server as a JSON document through its HTTP API, wait for the task to finish, and then retrieve the results. This headless, scriptable approach is what you need when working with a remote server with no display, and it is the basis for automating generative pipelines in scripts, CI, or batch jobs on HPC and cloud infrastructure.

This tutorial runs two practical workflows on the same input photos: the first one turns the photos into 3D meshes with the Hunyuan3D v2.1 model, and the second one turns them into short videos with the Wan2.2 5B model. With a Jupyter notebook, you submit each task from Python and see the output directly in the notebook.

This tutorial assumes no prior experience with ComfyUI. If some of the terms in this tutorial are new to you, don’t worry—the Key concepts section below will explain the terms in plain language.

Companion tutorial: This notebook is a follow-up to Text-to-video generation with ComfyUI and an AMD Radeon GPU. If you are brand new to ComfyUI, the Radeon tutorial is a good starting point: it familiarizes you with ComfyUI by walking you through its interactive browser UI (see the comparison table below).

Both tutorials use ComfyUI on AMD GPUs, but they target different audiences and workflows:

	Radeon tutorial	This tutorial
Hardware	Consumer AMD Radeon GPU (e.g., RX 7900 XTX)	Data-center AMD Instinct GPUs (MI300X / MI325X / MI350X / MI355X)
How you drive ComfyUI	Interactively, in the browser UI	Headless, from Python over the HTTP API (no browser)
Workflows shown	A single text-to-video workflow	Two workflows: image-to-3D and image-to-video
Best for	Learning ComfyUI hands-on at your desk	Automating generative workflows on remote servers and in pipelines

In short: the Radeon tutorial teaches you how to use ComfyUI interactively; this one shows you how to operate it as a service on server-class hardware.

Companion blog post: For performance data on several ComfyUI workflows running on AMD Instinct GPUs, see the AMD ROCm blog post Accelerating ComfyUI Workflows on AMD Instinct MI355X GPUs with ROCm.

What you will learn#

Core skills: Driving ComfyUI programmatically on AMD Instinct

How to launch the ComfyUI server and confirm a ROCm + PyTorch runtime.
How ComfyUI’s HTTP API works, and how to use a few small helpers to submit a workflow, poll for completion, and retrieve outputs.
How to run two real diffusion workflows from Python and display their outputs:
- Image-to-3D using Hunyuan3D v2.1: Turning a photo into a 3D mesh (.glb).
- Image-to-video using Wan2.2 5B: Turning a photo into a short MP4 camera orbit.

Optional skill: Measuring performance

How to wrap a workflow in a small harness and measure throughput on your own hardware.

Tutorial at a glance#

This tutorial consists of three parts:

Part 1: Set up ComfyUI. Launch the ComfyUI server inside a ROCm container and load the API helpers.
Part 2: Generate with two diffusion workflows. Run the image-to-3D and image-to-video workflows on three photos and visualize the results.
Part 3: Measure throughput (optional). Benchmark a workflow on your own GPU.

Key concepts#

This section introduces the main technical concepts involved in this tutorial.

Diffusion models#

Diffusion models are the generative-AI models behind most modern image, video, and 3D generation. They learn to start from random noise and iteratively “denoise” it into a coherent result, guided by your prompt or input image. Each iteration is called a step.
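To make the idea of a step concrete, here is a toy sketch of the denoising loop. It is not a real diffusion sampler: a trained model would predict the clean signal from the noisy one, whereas this illustration assumes a perfect prediction by using the known target directly. The function name `toy_denoise` and the three-value target are made up for the example.

```python
import random


def toy_denoise(target, steps, seed=0):
    """Toy illustration of iterative denoising (not a real diffusion sampler)."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per element of the target.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    trajectory = [list(x)]
    for step in range(steps):
        remaining = steps - step
        # A trained model would *predict* the clean signal from x; this toy
        # cheats and uses the known target as a perfect prediction, then
        # moves a fraction 1/remaining of the way toward it.
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
        trajectory.append(list(x))
    return trajectory


target = [0.2, 0.8, 0.5]
trajectory = toy_denoise(target, steps=4)
# Largest per-element distance from the target after each step.
errors = [max(abs(a - b) for a, b in zip(state, target)) for state in trajectory]
print([round(e, 3) for e in errors])  # the error shrinks with every step
```

More steps mean more, smaller corrections. In a real workflow that trade-off between quality and run time is what the sampler's step count controls.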

ComfyUI and workflows#

ComfyUI is an open-source, node-based interface for diffusion models. You build a pipeline by wiring together nodes—“load a model”, “encode a prompt”, “sample”, “decode”, “save”—into a graph called a workflow. ComfyUI can export any workflow to a JSON document (known as the API format). This tutorial submits the workflows directly in the API format, so there’s no need to use a visual editor for them.
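As a rough picture of what the API format looks like, the sketch below builds a hypothetical two-node graph by hand. Each key is a node ID, each value names the node class (`class_type`) and its `inputs`, and an input that is wired to another node is written as `[source_node_id, output_index]`. The file name and prefix are placeholders, and the `list_links` helper is only for illustration. The workflows used in this tutorial are larger but follow the same shape.

```python
import json

# Hypothetical two-node graph in ComfyUI's API format: each key is a node ID,
# each value names the node class and its inputs.
workflow = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "example.png"}},
    "2": {
        "class_type": "SaveImage",
        "inputs": {
            # A link is [source_node_id, output_index]: output 0 of node "1".
            "images": ["1", 0],
            "filename_prefix": "demo/copy",
        },
    },
}


def list_links(wf):
    """Return (source_id, output_index, target_id, input_name) for every link."""
    links = []
    for node_id, node in wf.items():
        for name, value in node["inputs"].items():
            # Literal inputs are plain values; links are two-element lists
            # whose first element is the ID of another node in the graph.
            if isinstance(value, list) and len(value) == 2 and str(value[0]) in wf:
                links.append((str(value[0]), value[1], node_id, name))
    return links


print(json.dumps(workflow, indent=2))
print(list_links(workflow))
```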

Driving ComfyUI headlessly (the HTTP API)#

ComfyUI runs as a long-lived HTTP server. Besides the graphical browser UI, it exposes an HTTP API: you POST a workflow to /prompt, poll /history/<id> until it finishes, and download generated files from /view. “Headless” here means using only this API, with no display attached—the standard way to run on a remote data-center node.
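The three calls can be sketched with nothing but the Python standard library. This is a minimal illustration, not the helper module used later in this tutorial: the function names are made up, and the request and response shapes (`prompt` and `client_id` in the POST body, `prompt_id` in the reply, and the `filename`, `subfolder`, and `type` query parameters of /view) follow ComfyUI's published API examples and are worth checking against your server version.

```python
import json
import urllib.parse
import urllib.request
import uuid


def build_prompt_request(server_url, workflow, client_id=None):
    """Build the POST /prompt request that queues an API-format workflow."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        f"{server_url}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def history_url(server_url, prompt_id):
    """URL to poll until the run shows up in the server's history."""
    return f"{server_url}/history/{prompt_id}"


def view_url(server_url, filename, subfolder="", folder_type="output"):
    """URL that downloads one generated file."""
    query = urllib.parse.urlencode(
        {"filename": filename, "subfolder": subfolder, "type": folder_type}
    )
    return f"{server_url}/view?{query}"


def submit(server_url, workflow):
    """Send the request and return the prompt_id (needs a running server)."""
    with urllib.request.urlopen(build_prompt_request(server_url, workflow)) as resp:
        return json.load(resp)["prompt_id"]
```

Only `submit` touches the network. The other functions just build requests and URLs, so you can inspect them without a server running.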

The two workflows#

Image-to-3D using Hunyuan3D v2.1: From one photo it produces a 3D mesh saved as a .glb file.
Image-to-video using Wan2.2 5B: From one photo (plus a text prompt) it synthesizes a short video by generating every frame.

ROCm and AMD Instinct GPUs#

ROCm™ is AMD’s open software platform for GPU computing. AMD Instinct™ GPUs (MI300X, MI325X, MI350X, MI355X) are AMD’s data-center accelerators for AI. Everything in this tutorial runs inside a ROCm Docker container, which keeps the environment reproducible.

Prerequisites#

This tutorial was developed and tested using the following setup.

Hardware#

An AMD Instinct GPU: This tutorial was tested on an AMD Instinct MI300X and MI355X. Ensure you are using an AMD Instinct GPU or compatible hardware with ROCm support, and that your system meets the official requirements. Your GPU also needs to have enough VRAM to load the model checkpoints listed below.

Software#

The table below lists the exact versions of the software this tutorial was tested against. Other versions may work but have not been validated.

Component	Version (tested)	Reference
ROCm	7.2.0	Quick-start install guide
Docker	24.0.7	Docker install guide
`rocm/comfyui` image	`comfyui-0.18.2.amd0_rocm7.2.0_ubuntu24.04`	ComfyUI-on-ROCm image (pulled in Environment setup)
PyTorch	2.10 (ROCm 7.2 build)	Ships inside the image

You do not need to build anything manually: the container image used in this tutorial ships ComfyUI and a ROCm-enabled PyTorch. The model checkpoints (~27 GB total) are not bundled. Part 2 downloads them from Hugging Face automatically, but only if you opt in by setting DOWNLOAD_MODELS = True in the configuration cell in Part 1. Make sure you have enough free disk space before enabling the option.
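One way to check the free space from Python is `shutil.disk_usage`. The sketch below is only a suggestion: the helper name and the 5 GB safety margin are assumptions, and you should point `path` at the file system that will hold ComfyUI/models/.

```python
import shutil


def has_free_space(path, required_gb, margin_gb=5.0):
    """Return (ok, free_gb) for the file system that contains `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    # margin_gb is an arbitrary safety buffer, not a documented requirement.
    return free_gb >= required_gb + margin_gb, free_gb


# ~27 GB is the total checkpoint size quoted in this tutorial.
ok, free_gb = has_free_space(".", required_gb=27)
print(f"free: {free_gb:.1f} GB; enough for the checkpoints: {ok}")
```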

Tip: Before you start, confirm that Docker is installed and can run without sudo by running docker run hello-world in a terminal.

Environment setup#

This section sets up the notebook environment inside a ComfyUI-on-ROCm container.

Note: This notebook is available at the AI Developer Hub GitHub repository.

All workflows in this notebook run inside the container, where the COMFYUI_PATH environment variable points at the ComfyUI source tree (typically /workload/ComfyUI).

1. Pull the image and launch the container#

Pull the pre-built ComfyUI-on-ROCm image and start a container with the AMD GPUs exposed. Two ports are forwarded to the host (8888 for JupyterLab and 8188 for the ComfyUI server) and the directory that contains this notebook is mounted.

docker pull rocm/comfyui:comfyui-0.18.2.amd0_rocm7.2.0_ubuntu24.04

# Run from the directory where you cloned the AI Developer Hub GitHub repository.
docker run -it --rm \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host \
  --shm-size=16g \
  -p 8888:8888 \
  -p 8188:8188 \
  -e COMFYUI_PATH=/workload/ComfyUI \
  -v "$PWD:/work" \
  rocm/comfyui:comfyui-0.18.2.amd0_rocm7.2.0_ubuntu24.04

This opens a shell inside the container, with this notebook available under /work. Run the remaining steps there.

2. Open this notebook in JupyterLab#

The image does not start JupyterLab automatically. From the container shell, install JupyterLab and start it bound to all interfaces so Docker’s port forwarding can reach it:

pip install jupyterlab
jupyter lab --no-browser --ip=0.0.0.0 --port=8888 --allow-root --notebook-dir=/work

On startup, JupyterLab prints a URL containing an access token, in the form http://127.0.0.1:8888/lab?token=<long-token>.

If the GPU is on a remote machine that you access via SSH, open a second terminal on your local system and forward both ports so that your local browser can reach JupyterLab (8888) and the ComfyUI server with its browser UI (8188):

ssh -L 8888:localhost:8888 -L 8188:localhost:8188 <user>@<gpu-host>

Paste the printed http://127.0.0.1:8888/lab?token=<long-token> URL into your local browser, then open this notebook from docs/notebooks/inference/t2v_comfyui_api_mode_instinct.ipynb. To stop the server when you are done, press Ctrl+C twice in the container shell.

3. Confirm the GPUs are visible#

ROCm includes amd-smi, a command-line tool that lists the AMD GPUs the container can see. Run the cell below to confirm that your AMD Instinct GPUs are visible. If the command is not found or no GPU appears, recheck the container’s --device flags and your ROCm installation before continuing.

# List the AMD GPUs visible inside the container. You should see your Instinct GPU(s).
# For ROCm 6.4 and earlier, use `rocm-smi` instead of `amd-smi`.
!amd-smi list || rocm-smi

Part 1: Set up and verify ComfyUI on Instinct#

With the container running and this notebook opened inside it, Part 1 establishes the runtime and the server we’ll talk to for the rest of the tutorial. There are three short steps:

Verify that PyTorch is built with ROCm support and that it sees the GPU, and set up the notebook’s working directories.
Stage the input photos and workflow files used in this tutorial.
Launch the ComfyUI server as a background process and load a small set of HTTP API helpers that will be used in Parts 2 and 3.

Install the notebook’s Python dependencies#

The ComfyUI-on-ROCm image ships ComfyUI and a ROCm-enabled PyTorch. The cell below installs a few additional Python packages this notebook uses for visualization:

trimesh + fast-simplification: For loading and decimating the Hunyuan3D .glb mesh for the turntable preview (Part 2).
imageio + imageio-ffmpeg: For encoding the preview to an MP4 (imageio-ffmpeg bundles its own ffmpeg binary, so no system-level tools are required).
pillow: For image handling in the inline result grid.

None of these packages use the GPU; they only render the outputs for display.

%pip install --quiet trimesh fast-simplification "imageio[ffmpeg]" pillow

Verifying the runtime: PyTorch ROCm and GPU visibility#

The next cell confirms that PyTorch was built with ROCm support, prints the detected device name, and locates $COMFYUI_PATH. Run it before the workflow sections. If the runtime cannot be verified, fix the installation before proceeding.

import os
import sys
import json
import shutil
import time
from copy import deepcopy
from pathlib import Path

import torch

assert torch.cuda.is_available(), (
    "No HIP / CUDA device visible to PyTorch. Check container --device flags and ROCm install."
)
print("python:", sys.version.split()[0])
print("torch: ", torch.__version__, "| HIP:", getattr(torch.version, "hip", None) or "(not a ROCm build)")
print("GPU 0: ", torch.cuda.get_device_name(0))

COMFYUI_PATH = Path(os.environ.get("COMFYUI_PATH", "/workload/ComfyUI"))
assert (COMFYUI_PATH / "main.py").is_file(), (
    f"{COMFYUI_PATH}/main.py not found - set COMFYUI_PATH to your ComfyUI checkout."
)

NOTEBOOK_DIR = Path(os.getcwd()).resolve()  # also where utils_comfyui.py lives
WORKFLOWS_DIR = NOTEBOOK_DIR / "workflows"
ASSETS_DIR = NOTEBOOK_DIR / "assets"
OUTPUTS_DIR = NOTEBOOK_DIR / "outputs"
OUTPUTS_DIR.mkdir(exist_ok=True)

# Set to True to let Part 2 download the model checkpoints (~27 GB total) from
# Hugging Face into ComfyUI/models/. DOWNLOAD_MODELS is False by default so a first read does
# not trigger a large download; flip it once you are ready to run the workflows.
DOWNLOAD_MODELS = False

print("COMFYUI_PATH:", COMFYUI_PATH)
print("Notebook dir:", NOTEBOOK_DIR)
print("DOWNLOAD_MODELS:", DOWNLOAD_MODELS)

Stage the tutorial inputs and workflows#

Part 2 needs two groups of small files that are versioned with this tutorial:

Input photos: Three 768x768 PNGs (apple, banana, pineapple).
Workflow definitions: Two ComfyUI API-format JSON files, one for each of the two workflows.

The photos are sourced from Wikimedia Commons; see the input photographs README for applicable image licenses and attribution.

These files, together with the utils_comfyui.py helper module, are versioned in the ROCm/ComfyUI repository under examples/, not in this notebook repository. The next cell downloads the helper module next to this notebook (if missing) and stages the assets and workflows under the notebook’s assets/ and workflows/ directories. If you are running inside a ComfyUI checkout that already ships these files, it finds them locally; otherwise it downloads them from RAW_BASE_URL.

import urllib.request

# This tutorial's helper module and supporting files (the workflow JSONs and
# input photos) are versioned in the ROCm/ComfyUI repo under examples/, not in
# this notebook repo. RAW_BASE_URL points there; the helper module is fetched next
# to this notebook if it is not already present (a no-op inside a checkout that
# already ships it).
RAW_BASE_URL = "https://raw.githubusercontent.com/ROCm/ComfyUI/amd-integration/examples"

if not (NOTEBOOK_DIR / "utils_comfyui.py").is_file():
    print("Fetching utils_comfyui.py from ROCm/ComfyUI ...")
    urllib.request.urlretrieve(f"{RAW_BASE_URL}/utils_comfyui.py",
                               str(NOTEBOOK_DIR / "utils_comfyui.py"))

from utils_comfyui import stage_files

ASSET_FILES = [
    "amd_comfyui_tutorial_apple_seed.png",
    "amd_comfyui_tutorial_banana_seed.png",
    "amd_comfyui_tutorial_pineapple_seed.png",
]
WORKFLOW_FILES = [
    "hunyuan3d_v2_1_image_to_3d.json",
    "wan2_2_5b_image_to_video.json",
]

print("Input photographs:")
stage_files(ASSET_FILES, ASSETS_DIR,
            [NOTEBOOK_DIR / "assets"], f"{RAW_BASE_URL}/assets")
print("Workflow definitions:")
stage_files(WORKFLOW_FILES, WORKFLOWS_DIR,
            [NOTEBOOK_DIR / "workflows"], f"{RAW_BASE_URL}/workflows")

missing = [f for f in ASSET_FILES if not (ASSETS_DIR / f).is_file()]
missing += [f for f in WORKFLOW_FILES if not (WORKFLOWS_DIR / f).is_file()]
assert not missing, f"Could not stage {missing}. Set RAW_BASE_URL or place the files manually."
print("All inputs and workflows are in place.")
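The local-first, download-as-fallback behaviour described above can be sketched as follows. This is a hypothetical stand-in for a single step of stage_files, not its actual source: the name `stage_file`, its return values, and the search order are assumptions made for illustration.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def stage_file(name, dest_dir, search_dirs, base_url=None):
    """Hypothetical stand-in for one step of stage_files (not its real source)."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / name
    if dest.is_file():
        return "present"  # already staged, nothing to do
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            shutil.copy2(candidate, dest)  # found in a local checkout
            return "copied"
    if base_url is None:
        return "missing"
    # Last resort: fetch the file from the remote base URL.
    urllib.request.urlretrieve(f"{base_url}/{name}", str(dest))
    return "downloaded"


# Offline demo: no base_url is given, so nothing is downloaded.
with tempfile.TemporaryDirectory() as tmp:
    source, dest = Path(tmp) / "checkout", Path(tmp) / "staged"
    source.mkdir()
    (source / "demo.json").write_text("{}")
    results = [
        stage_file("demo.json", dest, [source]),   # found locally and copied
        stage_file("demo.json", dest, [source]),   # already staged
        stage_file("other.json", dest, [source]),  # not available anywhere
    ]
print(results)
```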

Launching the ComfyUI server in the background#

ComfyUI runs as a long-lived HTTP server. The next cell starts the server by running python $COMFYUI_PATH/main.py --listen --port 8188 in the background, captures its PID, and waits for the /system_stats endpoint to return 200 (which signals “ready to accept prompts”). If a server is already running on 127.0.0.1:8188 (for example, you started it manually before opening Jupyter), the cell detects that and this becomes a no-op.

The PID is stored in server_pid. The final cell of the notebook will terminate the process.

from utils_comfyui import start_comfyui_server

SERVER_PORT = int(os.environ.get("COMFYUI_PORT_HOST", "8188"))
SERVER_URL = f"http://127.0.0.1:{SERVER_PORT}"

server_pid = start_comfyui_server(
    COMFYUI_PATH, SERVER_URL, SERVER_PORT, OUTPUTS_DIR / "comfyui_server.log"
)
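The readiness check described above boils down to polling /system_stats until it answers with HTTP 200. The following sketch shows one way to write such a loop. It is not the code inside start_comfyui_server: the function names, timeouts, and the injectable `probe` parameter are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout_s=2.0):
    """Return the HTTP status of a GET, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # nothing is listening yet, or the connection timed out


def wait_until_ready(server_url, timeout_s=300.0, poll_s=1.0, probe=http_status):
    """Poll /system_stats until it returns 200; give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(f"{server_url}/system_stats") == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For example, `wait_until_ready("http://127.0.0.1:8188", timeout_s=5)` returns False within a few seconds when no server is listening. Passing a different `probe` makes the loop easy to exercise without a network.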

The ComfyUI HTTP API and a tiny client#

We drive the server through three endpoints:

POST /prompt submits a workflow (returns a prompt_id).
GET /history/<prompt_id> checks whether a run has finished.
GET /view downloads a generated file.

To keep this notebook readable, the tiny client that wraps these calls is in the companion module utils_comfyui.py, staged next to this notebook by the cell above. Its ComfyUIClient exposes queue_prompt, wait_for_completion, and download_outputs, plus a run_workflow convenience method that chains all three (submit → wait → download).

The full ComfyUI server API is documented in the ComfyUI server docs.

from utils_comfyui import ComfyUIClient

client = ComfyUIClient(SERVER_URL)
print("ComfyUI client ready ->", client.server_url)
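To show what a download step has to work with, the sketch below pulls the generated file descriptors out of a /history/<prompt_id> response. It rests on assumptions about the response layout: an empty object while the run is still pending, and otherwise an `outputs` mapping from node ID to lists of entries with `filename`, `subfolder`, and `type`. The helper name and the sample data (including the `"3d"` key) are illustrative, and ComfyUIClient may handle this differently.

```python
def list_output_files(history, prompt_id):
    """Return (filename, subfolder, type) tuples, or None if the run is not done."""
    entry = history.get(prompt_id)
    if entry is None:
        return None  # assumed: the server returns {} until the run has finished
    files = []
    for node_output in entry.get("outputs", {}).values():
        for items in node_output.values():
            if not isinstance(items, list):
                continue
            for item in items:
                # Keep only entries that describe a file on the server.
                if isinstance(item, dict) and "filename" in item:
                    files.append(
                        (item["filename"], item.get("subfolder", ""), item.get("type", "output"))
                    )
    return files


# Illustrative response for a finished run; the "3d" key name is an example.
sample_history = {
    "abc": {
        "outputs": {
            "10": {
                "3d": [
                    {"filename": "mesh_00001_.glb", "subfolder": "amd_tutorial", "type": "output"}
                ]
            }
        },
        "status": {"completed": True},
    }
}
print(list_output_files(sample_history, "abc"))
print(list_output_files({}, "abc"))
```

Each tuple maps directly onto the `filename`, `subfolder`, and `type` query parameters of a GET /view request.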

Part 2: Generate with two diffusion workflows on real photographs#

Part 2 demonstrates two independent ComfyUI generative workflows, both driven from Python over the HTTP API set up in Part 1 and run on the same set of input photos (apple, banana, pineapple, 768x768 each).

	Workflow	Input	AI output
Example A	Hunyuan3D v2.1 image-to-3D: Flow-based diffusion transformer, `octree_resolution=4096`	One photo	A watertight 3D mesh in glTF (`.glb`) format. The geometry is the AI output.
Example B	Wan2.2 5B image-to-video (text-image-to-video variant): 5B-parameter video diffusion transformer	The same photo plus a text prompt	An MP4 clip (81 frames at 24 fps), each frame of which is generated by the model.

The visualization cell at the end of Part 2 lays out a 3-row grid:

Row 1 (input photo): The source photograph fed to both workflows.
Row 2 (mesh turntable): A deterministic CPU rendering of the Hunyuan3D .glb. This row is not the mesh itself but a fixed-camera orthographic walk-around of the mesh that the AI produced. To inspect the mesh directly, open the .glb in a 3D viewer.
Row 3 (AI camera orbit): The Wan2.2 5B image-to-video output. The clip in this row is itself the AI output.

Photo credits and licenses for the inputs are in the input photographs README.

Example A: Hunyuan3D v2.1 image-to-3D#

Our first workflow turns each photo into a 3D mesh. The cell below stages the three input images into ComfyUI’s input/ directory, makes sure the Hunyuan3D checkpoint (~10 GB) is available, and loads the workflow JSON staged earlier.

Tip: If the model checkpoint is missing and DOWNLOAD_MODELS = True is set in the configuration cell above, the checkpoint is downloaded automatically from Hugging Face into ComfyUI/models/checkpoints/. With the default DOWNLOAD_MODELS = False, the cell reports that the checkpoint is missing and the generation step is skipped. In that case, either flip the flag to True or download the checkpoint manually into that directory before running the workflow.

from utils_comfyui import download_models

# Three real-world subjects. Photo credits/licences: examples/assets/README.md in ROCm/ComfyUI.
INPUTS = [
    ("apple",     ASSETS_DIR / "amd_comfyui_tutorial_apple_seed.png"),
    ("banana",    ASSETS_DIR / "amd_comfyui_tutorial_banana_seed.png"),
    ("pineapple", ASSETS_DIR / "amd_comfyui_tutorial_pineapple_seed.png"),
]

# Stage the photos where ComfyUI's LoadImage node looks for them.
COMFYUI_INPUT_DIR = COMFYUI_PATH / "input"
COMFYUI_INPUT_DIR.mkdir(exist_ok=True)
for _, asset in INPUTS:
    shutil.copy2(asset, COMFYUI_INPUT_DIR / asset.name)

# Checkpoint -> Hugging Face URL. Downloaded only when DOWNLOAD_MODELS is True.
HUNYUAN_CKPT = COMFYUI_PATH / "models" / "checkpoints" / "hunyuan_3d_v2.1.safetensors"
HUNYUAN_MODELS = {
    HUNYUAN_CKPT: "https://huggingface.co/Comfy-Org/hunyuan3D_2.1_repackaged/"
                  "resolve/main/hunyuan_3d_v2.1.safetensors",
}
HUNYUAN_READY = not download_models(HUNYUAN_MODELS, DOWNLOAD_MODELS)

with (WORKFLOWS_DIR / "hunyuan3d_v2_1_image_to_3d.json").open() as f:
    hunyuan_workflow = json.load(f)
print(f"Staged {len(INPUTS)} photos; workflow has {len(hunyuan_workflow)} nodes; checkpoint ready: {HUNYUAN_READY}")

Run the workflow over each input#

The following cell runs the workflow once as a “warm-up” (which loads ~10 GB of weights and compiles the diffusion and variational autoencoder (VAE) kernels), then once per input photo. Each call swaps the LoadImage filename and the SaveGLB prefix and bumps the KSampler seed so every run is fresh rather than returned from cache.

MESHES = {}
HUNYUAN_OUT = OUTPUTS_DIR / "hunyuan3d"


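def customise_hunyuan(workflow, image_name, output_prefix):
    """Return a copy of the workflow patched for one input photo."""
    wf = deepcopy(workflow)  # never mutate the template loaded from JSON
    wf["2"]["inputs"]["image"] = image_name                # LoadImage: input filename
    wf["10"]["inputs"]["filename_prefix"] = output_prefix  # SaveGLB: output prefix
    # KSampler: a fresh time-based seed so the run is not returned from cache.
    wf["7"]["inputs"]["seed"] = int(time.time() * 1_000_000) % (2**63 - 1)
    return wf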


if HUNYUAN_READY:
    print("Warm-up (loads weights, compiles kernels)...")
    client.run_workflow(customise_hunyuan(hunyuan_workflow, INPUTS[0][1].name, "amd_tutorial/_warmup"),
                        HUNYUAN_OUT / "_warmup")
    for name, asset in INPUTS:
        wf = customise_hunyuan(hunyuan_workflow, asset.name, f"amd_tutorial/hunyuan3d_v2_1_{name}")
        saved, _ = client.run_workflow(wf, HUNYUAN_OUT / name)
        glbs = [p for p in saved if p.suffix.lower() == ".glb"]
        MESHES[name] = glbs[0] if glbs else None
        print(f"{name:<12} -> {MESHES[name].relative_to(NOTEBOOK_DIR) if MESHES[name] else '(no .glb)'}")
else:
    print("Skipping run; checkpoint not present.")
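The node IDs used above ("2", "10", "7") are specific to this tutorial's workflow file. If you adapt the pattern to your own workflow, looking nodes up by their `class_type` may be less brittle than hard-coding IDs. The helpers below are a hypothetical sketch along those lines, demonstrated on a made-up miniature workflow; they are not part of utils_comfyui.py.

```python
from copy import deepcopy


def find_nodes(workflow, class_type):
    """Return the IDs of all nodes with the given class_type."""
    return [nid for nid, node in workflow.items() if node.get("class_type") == class_type]


def set_node_input(workflow, class_type, input_name, value):
    """Return a copy of the workflow with one input of a unique node replaced."""
    matches = find_nodes(workflow, class_type)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one {class_type} node, found {len(matches)}")
    patched = deepcopy(workflow)  # leave the template untouched
    patched[matches[0]]["inputs"][input_name] = value
    return patched


# Hypothetical miniature workflow, only to exercise the helpers.
demo = {
    "2": {"class_type": "LoadImage", "inputs": {"image": "old.png"}},
    "7": {"class_type": "KSampler", "inputs": {"seed": 0}},
}
patched = set_node_input(demo, "LoadImage", "image", "new.png")
print(find_nodes(demo, "KSampler"), patched["2"]["inputs"]["image"])
```

Raising an error when a class appears zero or several times is a deliberate choice: silently patching the wrong node is harder to debug than a failed lookup.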

Visualize the AI-generated mesh: deterministic turntable render#

Hunyuan3D’s output is a .glb mesh on disk. To inspect it inline and confirm the model built a real 3D object, not a flat sticker, each mesh is rendered as a short MP4 turntable.

This render is deterministic and CPU-only (it is not an AI step): it loads the mesh, decimates it, and rasterizes 36 rotated frames with simple Lambert shading. This is implemented in utils_comfyui.py as render_mesh_turntable; the cell below just calls it on each mesh produced.

from utils_comfyui import render_mesh_turntable

TURNTABLE_OUT = OUTPUTS_DIR / "mesh_turntables"
TURNTABLE_VIDEOS = {}
for name, _ in INPUTS:
    glb = MESHES.get(name)
    if glb is None or not glb.is_file():
        print(f"{name:<12} (skip) - no mesh available")
        continue
    TURNTABLE_VIDEOS[name] = render_mesh_turntable(glb, TURNTABLE_OUT / f"{name}.mp4")
    print(f"{name:<12} -> {TURNTABLE_VIDEOS[name].relative_to(NOTEBOOK_DIR)}")
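The two ingredients of such a turntable, rotating the mesh a little per frame and shading each face with Lambert's cosine law, fit in a few lines of plain Python. This is a sketch of the idea only, not the code in render_mesh_turntable: it assumes the 36 frames cover one full revolution about the vertical axis (10 degrees per frame) and a light shining along the viewing direction.

```python
import math


def rotate_y(v, angle_rad):
    """Rotate a 3D vector about the vertical (Y) axis."""
    x, y, z = v
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)


def lambert(normal, light_dir):
    """Diffuse intensity max(0, n . l) for unit vectors n and l."""
    dot = sum(n * l for n, l in zip(normal, light_dir))
    return max(0.0, dot)


FRAMES = 36  # assumption: one full turn, so 360 / 36 = 10 degrees per frame
angles = [2 * math.pi * i / FRAMES for i in range(FRAMES)]

normal = (0.0, 0.0, 1.0)  # a face that starts out facing the camera
light = (0.0, 0.0, 1.0)   # assumed light along the viewing direction
intensities = [lambert(rotate_y(normal, a), light) for a in angles]

# Brightness of that face at the start, a quarter turn, and a half turn.
print([round(intensities[i], 3) for i in (0, 9, 18)])
```

A face is brightest when it points at the light and goes dark once it turns away, which is what makes the rotating mesh read as a solid object rather than a flat image.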

Example B: Wan2.2 5B image-to-video#

The second workflow animates each photograph into a short video (81 frames at 24 fps), in which every frame is generated by the diffusion model—a genuine AI camera orbit, not a rasterization of any 3D mesh.

The next cell contains each subject’s own positive prompt, which you can edit before re-running the generation. The prompts deliberately emphasize 3D camera motion (versus planar rotation), because Wan2.2 5B has weak parallax priors on cutout-style photos (for example, fruits on a pure-white background) and will default to spinning the image plane unless the prompt actively asks for a 3D arc. The negative prompt guards against failure modes observed during the prompt engineering process (multiple or peeled fruit, planar rotation, fingers in frame).

# Per-subject positive prompts for Wan2.2 5B I2V, keyed by INPUTS name.
# Edit these and re-run the workflow cell to experiment.
SUBJECT_PROMPTS: dict[str, str] = {
    "apple": (
        "Smooth slow 360 degree camera orbit around a single round red apple "
        "on a clean white background. The camera moves steadily and "
        "continuously around the apple in a wide 3D arc, revealing the back "
        "side of the apple. The apple itself is a single fruit and stays "
        "centred. Photorealistic studio product photography, sharp focus, "
        "soft natural lighting, shallow depth of field. No people, no hands, "
        "no text."
    ),
    "banana": (
        "Smooth slow 360 degree camera orbit around exactly one single yellow "
        "banana on a clean white background. There is only one banana - one "
        "single piece of fruit - lying on its side, stationary, intact, with "
        "the curve clearly visible. The camera arcs around the banana to "
        "reveal the back side of the curve in 3D. Photorealistic studio "
        "product photography, sharp focus, soft natural lighting."
    ),
    "pineapple": (
        "Smooth slow 360 degree camera orbit around a single tall pineapple "
        "with green crown leaves on a clean white background. The camera "
        "moves steadily and continuously around the pineapple in a wide 3D "
        "arc, revealing the back side of the pineapple body and the back of "
        "the crown leaves. The pineapple itself is a single fruit and stays "
        "centred. Photorealistic studio product photography, sharp focus, "
        "soft natural lighting. No people, no hands, no text."
    ),
}

NEGATIVE_PROMPT = (
    "multiple bananas, banana hand, banana cluster, multiple fruits, fruit "
    "cluster, duplicates, two apples, two bananas, two pineapples, sliced, "
    "peeled, cut open, chopped, broken fruit, mashed, deformed fruit, "
    "flat 2D rotation, planar rotation, sliding image, no parallax, "
    "human, person, hands, fingers, body, face, arms, lamp, text, watermark, "
    "logo, low quality, blurry, jpeg artifacts, distorted, deformed, extra "
    "limbs, motion blur, tilted, falling, flying, cartoon"
)

print(f"Defined positive prompts for {len(SUBJECT_PROMPTS)} subjects.")
print(f"Negative prompt: {len(NEGATIVE_PROMPT)} chars.")

With the prompts defined, the next cell makes sure the Wan2.2 checkpoints (~17 GB across the diffusion model, text encoder, and VAE) are available and prepares the per-subject workflow customizer. As in Example A, missing checkpoints are downloaded from Hugging Face only when DOWNLOAD_MODELS = True; otherwise the cell reports them as missing and the video generation step is skipped.

# Three checkpoints -> their Hugging Face URLs (Comfy-Org repackaged split files).
WAN_BASE = "https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged/resolve/main/split_files"
WAN_MODELS = {
    COMFYUI_PATH / "models" / "diffusion_models" / "wan2.2_ti2v_5B_fp16.safetensors":
        f"{WAN_BASE}/diffusion_models/wan2.2_ti2v_5B_fp16.safetensors",
    COMFYUI_PATH / "models" / "text_encoders" / "umt5_xxl_fp8_e4m3fn_scaled.safetensors":
        f"{WAN_BASE}/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    COMFYUI_PATH / "models" / "vae" / "wan2.2_vae.safetensors":
        f"{WAN_BASE}/vae/wan2.2_vae.safetensors",
}
WAN_I2V_READY = not download_models(WAN_MODELS, DOWNLOAD_MODELS)

with (WORKFLOWS_DIR / "wan2_2_5b_image_to_video.json").open() as f:
    wan_workflow = json.load(f)


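def customise_wan(workflow, image_name, output_prefix, positive, negative=NEGATIVE_PROMPT):
    """Return a copy of the workflow patched for one subject."""
    wf = deepcopy(workflow)  # never mutate the template loaded from JSON
    wf["56"]["inputs"]["image"] = image_name               # input photo
    wf["58"]["inputs"]["filename_prefix"] = output_prefix  # output file prefix
    # Fresh time-based seed so the run is not returned from cache.
    wf["3"]["inputs"]["seed"] = int(time.time() * 1_000_000) % (2**63 - 1)
    wf["6"]["inputs"]["text"] = positive                   # positive prompt
    wf["7"]["inputs"]["text"] = negative                   # negative prompt
    return wf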


print(f"Loaded Wan2.2 workflow ({len(wan_workflow)} nodes); checkpoints ready: {WAN_I2V_READY}")

Now run the workflow for each subject:

WAN_OUT = OUTPUTS_DIR / "wan2_2_5b_i2v"
WAN_VIDEOS = {}
if WAN_I2V_READY:
    print("Warm-up (loads weights, compiles kernels)...")
    client.run_workflow(
        customise_wan(wan_workflow, INPUTS[0][1].name, "amd_tutorial/_wan_warmup", SUBJECT_PROMPTS[INPUTS[0][0]]),
        WAN_OUT / "_warmup",
    )
    for name, asset in INPUTS:
        wf = customise_wan(wan_workflow, asset.name, f"amd_tutorial/wan22_5b_i2v_{name}", SUBJECT_PROMPTS[name])
        saved, _ = client.run_workflow(wf, WAN_OUT / name)
        mp4s = [p for p in saved if p.suffix.lower() == ".mp4"]
        WAN_VIDEOS[name] = mp4s[0] if mp4s else None
        print(f"{name:<12} -> {WAN_VIDEOS[name].relative_to(NOTEBOOK_DIR) if WAN_VIDEOS[name] else '(no .mp4)'}")
else:
    print("Skipping Wan2.2 5B I2V; checkpoints not present.")

Side-by-side: input photo, AI mesh turntable, AI camera orbit#

Finally, lay out the results as a grid—three rows, one column per subject:

Row 1: Input photo. The 768×768 PNG fed to both workflows.
Row 2: Mesh turntable. The deterministic CPU render of the Hunyuan3D .glb (the mesh is the AI output; the rotation is scripted).
Row 3: AI camera orbit. The Wan2.2 5B output, where every frame is generated by the diffusion model.

result_grid_html (in utils_comfyui.py) embeds each item as base64, so the rendered notebook is self-contained.

from utils_comfyui import result_grid_html
from IPython.display import HTML, display

display(HTML(result_grid_html(INPUTS, MESHES, TURNTABLE_VIDEOS, WAN_VIDEOS)))
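Embedding a file as base64 means turning its bytes into a data URI that the browser can load without fetching anything. The sketch below shows the general technique; it is not the code of result_grid_html, and the helper names and the `<video>` attributes are assumptions.

```python
import base64
import mimetypes
from pathlib import Path


def data_uri(payload, mime):
    """Encode raw bytes as a data URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def file_data_uri(path):
    """Encode a file on disk, guessing the MIME type from its extension."""
    path = Path(path)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    return data_uri(path.read_bytes(), mime)


def video_tag(path, width=256):
    """HTML snippet that plays an embedded clip in a loop (illustrative markup)."""
    return f'<video src="{file_data_uri(path)}" width="{width}" autoplay loop muted></video>'


print(data_uri(b"hi", "text/plain"))
```

Base64 inflates the payload by roughly a third, so this approach suits short clips like the ones in this tutorial better than long videos.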

Part 3: Measure throughput on your own hardware (optional)#

This part is optional. Set RUN_BENCHMARK = True in the configuration cell below to run it. The default is False, so a first run of the notebook skips the timed passes and stays quick.

The goal of this experiment is to answer a practical question on your hardware: how long does a given workflow take, end to end, from submitting the prompt to saving the output? The harness runs one warm-up pass (to load weights and compile kernels) followed by several timed iterations, then reports the mean and standard deviation of your own measurements. The timing logic lives in benchmark_workflow in utils_comfyui.py.

This notebook ships with no pre-recorded numbers—every figure below is produced by the run you trigger. If you would like to compare your results against AMD’s published reference figures, see the AMD ROCm blog Accelerating ComfyUI Workflows on AMD Instinct MI355X GPUs with ROCm.

By default the harness re-runs the Hunyuan3D workflow from Part 2. To benchmark another workflow (for example, FLUX.1-dev or Wan2.2 14B image-to-video), append an entry to BENCH_SPECS with its workflow JSON and checkpoint paths. The harness picks up the entry automatically and skips it if any referenced file is missing.

# Default OFF: flip to True to time iterations on your hardware.
RUN_BENCHMARK = False
ITERATIONS = 3   # timed passes per workflow, after one warm-up pass

# By default the benchmark runs the Hunyuan3D workflow from Part 2. Append more entries
# (same shape: key / workflow / checkpoints / image_input) to measure others.
BENCH_SPECS = [
    {
        "key": "hunyuan3d_v2_1",
        "workflow": WORKFLOWS_DIR / "hunyuan3d_v2_1_image_to_3d.json",
        "checkpoints": [HUNYUAN_CKPT],
        "image_input": INPUTS[0][1].name,
    },
]

BENCH_OUT = OUTPUTS_DIR / "part3_benchmarks"
print(f"Configured {len(BENCH_SPECS)} workflow(s); RUN_BENCHMARK={RUN_BENCHMARK}, ITERATIONS={ITERATIONS}.")

Run the timed iterations#

The next cell runs each configured workflow whose checkpoints are present: one warm-up pass, then ITERATIONS timed passes. Each pass gets a fresh seed, so nothing is served from cache. Timings are wall-clock seconds measured around the API call with time.perf_counter().

from utils_comfyui import benchmark_workflow

BENCH_RESULTS = {}
if RUN_BENCHMARK:
    for spec in BENCH_SPECS:
        print(f"\n=== {spec['key']} ===")
        result = benchmark_workflow(
            client, spec["workflow"], spec["checkpoints"],
            spec["image_input"], ITERATIONS, BENCH_OUT / spec["key"],
        )
        if result is not None:
            BENCH_RESULTS[spec["key"]] = result
    print(f"\nFinished {len(BENCH_RESULTS)}/{len(BENCH_SPECS)} workflow(s).")
else:
    print("RUN_BENCHMARK=False; skipping Part 3 timed runs.")
    print("Set RUN_BENCHMARK=True in the configuration cell to enable.")
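The warm-up-then-time pattern is independent of ComfyUI and can be sketched around any callable. The function below is an illustration rather than the source of benchmark_workflow: its name is made up, it reuses the result keys read by the results cell (times_s, mean_s, std_s), and it assumes the sample standard deviation, which may differ from what the real helper reports.

```python
import statistics
import time


def time_callable(run_once, iterations, warmup=1):
    """Run `run_once` untimed `warmup` times, then time `iterations` passes."""
    for _ in range(warmup):
        run_once()  # absorbs one-off costs such as weight loading
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    return {
        "times_s": times,
        "mean_s": statistics.mean(times),
        # Sample standard deviation; undefined for a single pass, so report 0.0.
        "std_s": statistics.stdev(times) if len(times) > 1 else 0.0,
    }


# Stand-in workload; in the notebook this would be one client.run_workflow call.
result = time_callable(lambda: time.sleep(0.01), iterations=3)
print(f"mean {result['mean_s']:.3f} s, std {result['std_s']:.3f} s")
```

Keeping the warm-up pass out of the statistics matters because the first run includes weight loading and kernel compilation, which would otherwise dominate the mean.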

Your results#

The next cell generates a table summarizing any iterations you have run on your own GPU. The table stays empty until you set RUN_BENCHMARK = True and re-run the cell above.

print(f"{'workflow':<20} {'iterations':>10} {'mean (s)':>12} {'std (s)':>12}")
print("-" * 56)
for spec in BENCH_SPECS:
    r = BENCH_RESULTS.get(spec["key"])
    if r is None:
        print(f"{spec['key']:<20} {'(skipped)':>10}")
    else:
        print(f"{spec['key']:<20} {len(r['times_s']):>10} {r['mean_s']:>12.2f} {r['std_s']:>12.2f}")

if not BENCH_RESULTS:
    print("\n(no local results - set RUN_BENCHMARK=True and re-run the runner cell)")

Clean up#

After finishing the tutorial, terminate the background ComfyUI server that was launched (a no-op if a preexisting server was reused).

import signal

if server_pid is not None:
    try:
        os.kill(server_pid, signal.SIGTERM)
        print(f"Sent SIGTERM to ComfyUI server pid {server_pid}.")
    except ProcessLookupError:
        print(f"ComfyUI server pid {server_pid} already exited.")
else:
    print("No server was launched by this notebook; nothing to clean up.")

Summary#

In this tutorial you drove ComfyUI generative workflows end-to-end from Python on AMD Instinct GPUs, with no browser:

Set up a ComfyUI-on-ROCm container, verified the PyTorch + ROCm runtime, launched the ComfyUI server in the background, and loaded a small set of HTTP API helpers (queue_prompt, wait_for_completion, download_outputs, run_workflow).
Generated two kinds of output on the same three photos: a 3D mesh per subject with Hunyuan3D v2.1 (image-to-3D), and a short camera-orbit video per subject with Wan2.2 5B (image-to-video), visualized inline as a grid.
Measured the end-to-end time of a workflow on your own hardware with a small timed harness (optional).

From here you could try submitting your own workflow JSON (export any workflow from the ComfyUI UI in API format), swapping in different models, or wiring these helpers into a script or service through the same HTTP API.

A natural next step is scaling beyond a single GPU. While ComfyUI has no native multi-GPU support today, community tooling such as xDiT, which supports sequence parallelism, unified sequence parallelism (USP), and PipeFusion, is a promising starting point.