ROCm XIO examples
2026-04-27
The examples/ directory contains standalone projects that
demonstrate how to use the installed ROCm XIO library. Each
example is a self-contained CMake project that uses
find_package(rocm-xio) to locate the library.
Building examples
All examples follow the same build pattern. First, install ROCm XIO to a temporary prefix, then configure and build the example against that prefix:
# Build and install rocm-xio
cmake -S . -B build
cmake --build build --target all
cmake --install build --prefix /tmp/rocm-xio
# Build an example (e.g. list-endpoints)
cmake -S examples/list-endpoints \
-B /tmp/list-endpoints-build \
-DCMAKE_PREFIX_PATH="/tmp/rocm-xio;/opt/rocm"
cmake --build /tmp/list-endpoints-build
CTest runs these install-and-build steps automatically through the install-integration test fixture (see Run ROCm XIO tests).
Available examples
find-package
Configure-time smoke test. Verifies that
find_package(rocm-xio) resolves the installed package
and prints the discovered version and paths. Contains no
source files.
Requirements: none (configure-time only).
list-endpoints
Minimal example that lists all registered endpoints. Calls
listAvailableEndpoints() from xio.h. This is
CPU-only code compiled as HIP.
Requirements: none (CPU-only).
/tmp/list-endpoints-build/list-endpoints
endpoint-info
Iterates the endpoint registry and prints each endpoint’s
name, description, and type. Also validates that
getEndpointName() and isValidEndpoint() agree with
the registry data. CPU-only.
Requirements: none (CPU-only).
/tmp/endpoint-info-build/endpoint-info
sdma-ep-p2p
GPU-initiated peer-to-peer DMA transfer via the SDMA
endpoint. Uses the public sdma_ep host-side API
(initEndpoint(), createConnection(),
createQueue()) and device-side operations
(putSignal(), waitSignal(), quiet()) to perform
a GPU-to-GPU memory copy driven entirely from shader code.
Requirements:
Two AMD GPUs with XGMI / Infinity Fabric P2P access
Root access (hsakmt requires /dev/kfd)
Supported GPU architecture (see SDMA_EP_GFX_WHITELIST in the Alola test scripts)
sudo /tmp/sdma-ep-p2p-build/sdma-ep-p2p
The example fills a 4 KiB source buffer on GPU 0 with
0xAB, transfers it to GPU 1 via SDMA, and verifies the
destination buffer contents.
sdma-ep-allgather
MPI allgather collective driven entirely from GPU shader
code via SDMA. Each MPI rank owns one GPU and uses the
sdma_ep device-side API to DMA its local input buffer
into the correct slot of every other rank’s output buffer.
IPC memory handles are exchanged via MPI_Allgather so
each GPU can write directly into remote GPU memory.
This example is derived from the original
shader-sdma-coll prototype and serves as a template for
building multi-process, multi-GPU SDMA collectives.
Requirements:
N AMD GPUs with XGMI / Infinity Fabric P2P access
MPI (OpenMPI, MPICH, etc.)
Root access (hsakmt requires /dev/kfd)
Supported GPU architecture (see SDMA_EP_GFX_WHITELIST)
mpirun -np 2 sudo ./sdma-ep-allgather
mpirun -np 4 sudo ./sdma-ep-allgather 8192
The optional argument sets the per-rank chunk size in
integers (default: 1024 = 4 KiB per rank). Each rank fills
its chunk with rank + 1 and verifies that the gathered
output contains the correct values from all ranks.
nvme-ep-info
Queries NVMe controller properties using the nvme_ep
host-side API. Prints LBA size, namespace capacity, maximum
queue ID, and SMART/Health log data. Accepts an optional
device path argument (defaults to /dev/nvme0).
Requirements:
An NVMe device (for example, /dev/nvme0)
Root access (NVMe admin commands require CAP_SYS_ADMIN)
sudo /tmp/nvme-ep-info-build/nvme-ep-info /dev/nvme0
Returns exit code 77 (CTest SKIP convention) when the NVMe device cannot be opened, so it can be registered as a CTest with graceful skip behavior.
Write new examples
To add a new example:
Create a directory under examples/<name>/.
Add a CMakeLists.txt that uses find_package(rocm-xio REQUIRED) and links against rocm-xio::rocm-xio.
Add the source file(s) as .hip (even for CPU-only code, since ROCm XIO headers use HIP types).
Register the example in tests/integration/CMakeLists.txt via xio_add_install_test(). Use the RUN flag only if the example can run in CI without special hardware.
Document the example in this file.