What is AQLprofile?
2025-10-06
The Architected Queuing Language profiling library (AQLprofile) is an open source library that enables advanced GPU profiling and tracing on AMD platforms. It works in conjunction with ROCprofiler-SDK to support profiling methods such as performance counters (PMC) and SQ thread trace (SQTT). AQLprofile provides the foundational mechanisms for constructing AQL packets and managing profiling operations across multiple AMD GPU architecture families. The development of AQLprofile is aligned with ROCprofiler-SDK, ensuring compatibility and feature support for new GPU architectures and profiling requirements.
AQLprofile builds on concepts from the Heterogeneous System Architecture (HSA) and AQL, which define the foundations for GPU command processing and profiling on AMD platforms. For more information, see the HSA specifications published by the HSA Foundation.
Features
Profiling AQL packets for GPU workloads.
Performance counters and SQ thread traces.
Support for GFX9, GFX10XX, GFX11XX, and GFX12XX architecture families.
Verbose tracing and error logging capabilities.
Thread trace binary data generated by AQLprofile can be decoded using rocprof-trace-decoder.
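As a small illustration of the architecture families listed above, the sketch below maps an LLVM-style GPU target name to one of those family labels. The helper `gfx_family` and the table `FAMILY_BY_MAJOR` are hypothetical names invented for this example. They are not part of AQLprofile or ROCprofiler-SDK, and the sketch assumes that the last two characters of the target name encode the minor version and stepping.

```python
# Hypothetical helper for illustration only; not an AQLprofile or ROCprofiler-SDK API.
FAMILY_BY_MAJOR = {9: "GFX9", 10: "GFX10XX", 11: "GFX11XX", 12: "GFX12XX"}


def gfx_family(target: str) -> str:
    """Map a target name such as 'gfx90a' to one of the family labels above."""
    # Drop optional feature suffixes such as ':sramecc+:xnack-'.
    name = target.lower().split(":")[0]
    if not name.startswith("gfx") or len(name) < 6:
        raise ValueError(f"not a gfx target name: {target!r}")
    # Assumption: the last two characters are minor version and stepping,
    # so everything between 'gfx' and those two characters is the major version.
    major_text = name[3:-2]
    if not major_text.isdecimal():
        raise ValueError(f"cannot parse major version from {target!r}")
    major = int(major_text)
    if major not in FAMILY_BY_MAJOR:
        raise ValueError(f"family of {target!r} is not in the list above")
    return FAMILY_BY_MAJOR[major]
```

A tool built on top of the profiling stack could use a check like this to fail early on a target outside the listed families, rather than discovering the mismatch when profiling starts.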
Who should use this library?
End users: If you want to profile AMD GPUs, use ROCprofiler-SDK or tools that depend on it. You do not need to use AQLprofile directly.
Developers/integrators: If you’re building profiling tools, custom workflows, or need to extend profiling capabilities, you may use AQLprofile directly as a backend.
How does AQLprofile fit into the ROCm profiling stack?
Here’s the typical workflow:
Application → ROCprofiler-SDK ⇄ AQLprofile ⇄ ROCprofiler-SDK → HSA/ROCR/KFD → AMD GPU hardware
AQLprofile generates profiling command packets (AQL/PM4) tailored to the GPU architecture. It doesn’t interact with hardware or drivers directly. It only produces the packets and buffer requirements requested by ROCprofiler-SDK.
ROCprofiler-SDK provides a higher-level API and user-facing tools, using AQLprofile internally. It manages profiling sessions, submits packets to the GPU via ROCr/HSA/KFD, and collects results.
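The division of responsibilities described above can be sketched as a toy model. Everything here is hypothetical: `ProfilePlan`, `PacketGenerator`, and `ProfilingSession` are invented names, the packets are placeholder byte strings rather than real AQL or PM4 encodings, and the start/stop/read split and the eight-bytes-per-counter buffer size are assumptions made for illustration. The point is only that one component builds packets and reports buffer requirements, while the other allocates buffers and submits.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ProfilePlan:
    """What the generator hands back: packets and a buffer size, nothing submitted."""
    start_packet: bytes
    stop_packet: bytes
    read_packet: bytes
    output_buffer_bytes: int


class PacketGenerator:
    """Plays the AQLprofile role: pure packet construction, no device access."""

    def __init__(self, family: str) -> None:
        self.family = family

    def plan_counters(self, counters: List[str]) -> ProfilePlan:
        tag = self.family.encode()
        return ProfilePlan(
            start_packet=b"START:" + tag,  # placeholder, not a real packet
            stop_packet=b"STOP:" + tag,
            read_packet=b"READ:" + tag,
            output_buffer_bytes=8 * len(counters),  # assumed: 8 bytes per counter
        )


class ProfilingSession:
    """Plays the ROCprofiler-SDK role: owns buffers and packet submission."""

    def __init__(self, generator: PacketGenerator,
                 submit: Callable[[bytes], None]) -> None:
        self.generator = generator
        self.submit = submit  # stands in for the ROCr/HSA/KFD submission path

    def profile(self, counters: List[str], kernel_packet: bytes) -> bytearray:
        plan = self.generator.plan_counters(counters)
        # The session, not the generator, allocates the requested buffer.
        output = bytearray(plan.output_buffer_bytes)
        for packet in (plan.start_packet, kernel_packet,
                       plan.stop_packet, plan.read_packet):
            self.submit(packet)
        return output


# Usage: record submissions in a list instead of sending them to a GPU.
submitted: List[bytes] = []
session = ProfilingSession(PacketGenerator("GFX9"), submitted.append)
results = session.profile(["SQ_WAVES", "GRBM_GUI_ACTIVE"], b"KERNEL")
```

Because the generator in this sketch never touches a queue or a device, it can be exercised without GPU hardware, which mirrors the statement that AQLprofile only produces the packets and buffer requirements it is asked for.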