RDNA3
ROCm Compute Profiler provides an extensive list of metrics for understanding achieved application performance on RDNA3.5-based AMD Ryzen™ APUs, such as the AMD Ryzen AI Max Series (Strix Halo, gfx1151).
To make the best use of profiling data, it helps to understand the role of each hardware block in the AMD RDNA3 architecture. Refer to the following top-level block diagram for an overview of these blocks.
For more details on the AMD RDNA3 architecture, see page 5 of the RDNA3 shader instruction set architecture.
Note
For details on top-level metrics for the CDNA and RDNA architectures, see Performance model.
For details on metrics available for CDNA-CDNA4 based Instinct GPUs, see AMD CDNA architecture (CDNA-CDNA4).
For details on packaging, SIMD width, and generational differences between RDNA3, RDNA3.5, and later APUs, refer to GPU hardware specifications and the public architecture summaries.
ROCm Compute Profiler includes analysis panels for RDNA3.5 parts that report as gfx1151, such as the integrated graphics on AMD Ryzen AI Max Series (Strix Halo) processors.
Memory hierarchy in the tool
For gfx1151, the Memory Chart panel traces the path from the instruction and scalar paths, TCP (GL0), and LDS, through the interfaces to GL1C, GL2C, and GCEA, and on toward system memory.
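As a reading aid, the order of stages named above can be sketched as a small Python list. This is only an illustration: `MEMORY_CHART_ORDER` and `downstream` are hypothetical names, not part of ROCm Compute Profiler, and the list reflects the reading order of the chart rather than a claim that every access visits every stage.

```python
# Stage names are taken from the description above; the list itself and the
# helper below are hypothetical, not a ROCm Compute Profiler data structure.
MEMORY_CHART_ORDER = [
    "instruction path",
    "scalar path",
    "TCP (GL0)",
    "LDS",
    "GL1C",
    "GL2C",
    "GCEA",
    "system memory",
]


def downstream(stage: str) -> list[str]:
    """Return the stages listed after `stage` in the chart's reading order."""
    index = MEMORY_CHART_ORDER.index(stage)  # raises ValueError if unknown
    return MEMORY_CHART_ORDER[index + 1:]


print(downstream("GL1C"))  # ['GL2C', 'GCEA', 'system memory']
```

A lookup like this can help when reading a panel: for example, traffic that GL1C does not satisfy shows up next in the GL2C rows.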
Workgroups and execution
RDNA3-based APUs organize compute around Workgroup Processors (WGPs) and Compute Units (CUs). Wavefronts are typically wave32-oriented in this configuration. For gfx1151, the Workgroup Processor (WGP), Shader Processor Input (SPI), and Command Processor Compute (CPC) panels expose dispatch, occupancy, and command-processor metrics. These complement the Instinct/CDNA Compute unit page, which uses terminology such as CU and Shader Engines (SE).
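To make the wave32 point concrete, the following minimal sketch computes how many wavefronts a workgroup occupies and how many lanes in its last wavefront go unused. It assumes a wavefront size of 32 work-items, as described above; the function names are hypothetical and are not profiler metrics.

```python
WAVE_SIZE = 32  # assumption: wave32 mode, as described in the text


def wavefronts_per_workgroup(workgroup_size: int, wave_size: int = WAVE_SIZE) -> int:
    """Number of wavefronts needed to cover one workgroup (ceiling division)."""
    if workgroup_size <= 0:
        raise ValueError("workgroup_size must be positive")
    return (workgroup_size + wave_size - 1) // wave_size


def unused_lanes(workgroup_size: int, wave_size: int = WAVE_SIZE) -> int:
    """Lanes left without a work-item in the workgroup's last wavefront."""
    waves = wavefronts_per_workgroup(workgroup_size, wave_size)
    return waves * wave_size - workgroup_size


print(wavefronts_per_workgroup(256), unused_lanes(256))  # 8 0
print(wavefronts_per_workgroup(100), unused_lanes(100))  # 4 28
```

Workgroup sizes that are not a multiple of 32 leave lanes idle in the final wavefront, which is worth keeping in mind when interpreting wave and occupancy counts in the WGP and SPI panels.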
Hardware block chapters
The RDNA3.5 metric tables are organized by the following hardware blocks. The profiler concepts for RDNA use APU naming (such as WGP and GL1/GL2) and embed the RDNA3.5 (gfx1151) metric tables under each block:
System Speed-of-Light — SoL table for gfx1151. It uses the same metric keys as the analysis panel.
Workgroup processor (WGP) — Roofline, WGP utilization, waves, instruction mix, WGP instruction and data caches.
GL0 — TCP (GL0): Panel tables and Memory Chart rows through TCP-GL1.
GL1 — GL1C: Panel tables and Memory Chart GL1C Cache.
GL2 — GL2C, GCEA / DRAM / arbiter, and related panel metrics.
Shader engine — GRBM GPU/SE utilization and SPI dispatch statistics.
Command processor (CP) — CPC / Micro Engine (ME) metrics (same role as CDNA CP, different tab layout in gfx1151).
References — Public references and link to Instinct citations.
Note
ROCm Compute Profiler currently has limited support for WMMA on Strix Halo.