Shader engine
Shader engines (SEs) still partition the GPU on RDNA hardware; gfx1151 reports Shader Processor Input (SPI) utilization through GRBM-derived counters and dispatch statistics. This complements the WGP chapter, which focuses on per-WGP execution metrics.
Note: For Instinct-centric SE, sL1D, and L1I metric tabs, see Shader engine (SE).
Graphics Register Bus Manager (GRBM)
GPU utilization
| Metric | Description | Unit |
|---|---|---|
| GPU Busy | Percentage of time the GPU was actively processing work. Per sample: 100 * GRBM_GUI_ACTIVE / GRBM_COUNT (when GRBM_COUNT != 0). Avg / min / max are taken across samples. | Percent |
| CP Busy | Percentage of GUI-active cycles the Command Processor was busy. Per sample: 100 * GRBM_CP_BUSY / GRBM_GUI_ACTIVE (when GRBM_GUI_ACTIVE != 0). CP handles kernel dispatch and command stream processing. | Percent |
| GL2C Busy | Percentage of GUI-active cycles the GL2 Cache was busy. Per sample: 100 * GRBM_GL2C_BUSY / GRBM_GUI_ACTIVE (when GRBM_GUI_ACTIVE != 0). High utilization indicates a memory-intensive workload. | Percent |
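The per-sample formulas and the avg / min / max aggregation in this table can be sketched as follows. This is a minimal illustration, not the profiler's implementation: the helper names `pct` and `summarize`, the sample values, and the choice to skip samples whose denominator is zero are all assumptions made for the example.

```python
from statistics import mean


def pct(numerator, denominator):
    """Return 100 * numerator / denominator, or None when the denominator is 0."""
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def summarize(values):
    """Avg / min / max across samples.

    Assumption: samples with no valid ratio (None) are skipped.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return {"avg": mean(valid), "min": min(valid), "max": max(valid)}


# Hypothetical counter samples; real values come from the collected counters.
samples = [
    {"GRBM_COUNT": 1000, "GRBM_GUI_ACTIVE": 800, "GRBM_CP_BUSY": 200, "GRBM_GL2C_BUSY": 400},
    {"GRBM_COUNT": 1000, "GRBM_GUI_ACTIVE": 500, "GRBM_CP_BUSY": 50, "GRBM_GL2C_BUSY": 100},
    {"GRBM_COUNT": 0, "GRBM_GUI_ACTIVE": 0, "GRBM_CP_BUSY": 0, "GRBM_GL2C_BUSY": 0},
]

# GPU Busy is relative to GRBM_COUNT; CP and GL2C Busy are relative to GUI-active cycles.
gpu_busy = summarize([pct(s["GRBM_GUI_ACTIVE"], s["GRBM_COUNT"]) for s in samples])
cp_busy = summarize([pct(s["GRBM_CP_BUSY"], s["GRBM_GUI_ACTIVE"]) for s in samples])
gl2c_busy = summarize([pct(s["GRBM_GL2C_BUSY"], s["GRBM_GUI_ACTIVE"]) for s in samples])

print(gpu_busy, cp_busy, gl2c_busy)
```

Note that GPU Busy uses a different denominator (GRBM_COUNT) from the other two metrics (GRBM_GUI_ACTIVE), so the percentages are not directly comparable.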
Shader engine utilization
| Metric | Description | Unit |
|---|---|---|
| TA Busy | Percentage of GUI-active cycles the Texture Addresser was busy. Per sample: 100 * GRBM_TA_BUSY / GRBM_GUI_ACTIVE (when GRBM_GUI_ACTIVE != 0). | Percent |
Shader Processor Input (SPI)
SPI utilization
| Metric | Description | Unit |
|---|---|---|
| SPI Busy | Percentage of GPU active (GUI) cycles where any SPI was busy. Computed as 100 * GRBM_SPI_BUSY / GRBM_GUI_ACTIVE (GRBM block). Do not use block-level SPI_BUSY here: it is aggregated over every SPI instance, so summed busy cycles can far exceed GUI-active cycles when multiple shader engines are active in parallel. | Percent |
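The warning about block-level SPI_BUSY can be illustrated with a small numeric sketch. All values below are hypothetical (two SPI instances, 1000 GUI-active cycles), and `spi_busy_pct` is an illustrative helper rather than a profiler function.

```python
def spi_busy_pct(grbm_spi_busy, grbm_gui_active):
    """SPI Busy as defined above: 100 * GRBM_SPI_BUSY / GRBM_GUI_ACTIVE."""
    if grbm_gui_active == 0:
        return None
    return 100.0 * grbm_spi_busy / grbm_gui_active


# Hypothetical numbers: two SPI instances, each busy for 900 of 1000 GUI-active cycles.
gui_active = 1000
per_instance_spi_busy = [900, 900]

# Block-level SPI_BUSY is aggregated over every instance, so it sums to 1800.
block_level_sum = sum(per_instance_spi_busy)

# GRBM_SPI_BUSY counts cycles in which any SPI was busy, so it cannot exceed
# GUI-active cycles. 950 is an assumed value for this illustration.
grbm_spi_busy = 950

naive_pct = 100.0 * block_level_sum / gui_active      # exceeds 100%
correct_pct = spi_busy_pct(grbm_spi_busy, gui_active)  # stays within 0..100

print(f"block-level sum: {naive_pct:.1f}%  GRBM-derived: {correct_pct:.1f}%")
```

With these assumed inputs the summed block-level figure is well above 100 percent, which is why the GRBM-derived counter is the one used for this metric.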
Wave dispatch statistics
| Metric | Description | Unit |
|---|---|---|
| Compute Wave Dispatches | SPI counter SPI_CSN_WAVE (block SPI, event 50): compute (CSN) path wave count as stored in pmc_perf (column SPI_CSN_WAVE). Same idea as WGP “Dispatched Waves” (SQ_WAVES_sum) but measured at the SPI; values may differ slightly. Normalized by CLI $denom: per_kernel → ÷1; per_wave → ÷SQ_WAVES; per_cycle → ÷$GRBM_GUI_ACTIVE_PER_XCD; per_second → ÷kernel wall time (s). | Count per Normalization Unit |
| Wave Dispatch Rate | SPI_CSN_WAVE / GRBM_GUI_ACTIVE: waves per GUI-active cycle (can exceed 1 when many SEs dispatch in parallel). Uses the same SPI_CSN_WAVE column that collection emits. | Waves/cycle |
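The normalization modes and the dispatch-rate formula above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function names `normalize` and `wave_dispatch_rate`, the keyword parameters, and every input value are hypothetical, and returning None for a zero denominator is a choice made for the example.

```python
def normalize(value, denom_mode, *, sq_waves, gui_active_per_xcd, kernel_seconds):
    """Divide a raw count by the denominator selected by the normalization mode."""
    denominators = {
        "per_kernel": 1,
        "per_wave": sq_waves,
        "per_cycle": gui_active_per_xcd,
        "per_second": kernel_seconds,
    }
    if denom_mode not in denominators:
        raise ValueError(f"unknown normalization mode: {denom_mode}")
    denom = denominators[denom_mode]
    # Assumption: a zero denominator yields no value instead of raising.
    return None if denom == 0 else value / denom


def wave_dispatch_rate(spi_csn_wave, grbm_gui_active):
    """Waves per GUI-active cycle: SPI_CSN_WAVE / GRBM_GUI_ACTIVE."""
    return None if grbm_gui_active == 0 else spi_csn_wave / grbm_gui_active


# Hypothetical counter values for a single kernel.
spi_csn_wave = 4096
inputs = {"sq_waves": 4096, "gui_active_per_xcd": 2048, "kernel_seconds": 2e-3}

normalized = {
    mode: normalize(spi_csn_wave, mode, **inputs)
    for mode in ("per_kernel", "per_wave", "per_cycle", "per_second")
}
rate = wave_dispatch_rate(spi_csn_wave, 2048)

print(normalized)
print(rate)
```

In this made-up case the dispatch rate comes out above 1 wave per cycle, matching the note that parallel dispatch across SEs can push the rate past 1.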