Command processor (CP)#
The command processor (CP) connects the host and kernel driver to on-GPU scheduling. It pulls work from HSA queues, decodes packets, and dispatches kernel launches to the front end (SPI / WGP path). On Instinct GPUs, the profiler often separates the metrics into command processor fetcher (CPF) and command processor compute (CPC). The gfx1151 analysis panels emphasize CPC and ME (Micro Engine) activity, including utilization, interface utilization, stall cycles, memory requests, and instruction cache behavior.
For the complete CDNA architecture overview and the CPF and CPC metric tabs across MI-series GPUs, see Command processor (CP) under CDNA-CDNA4.
Command processor compute (CPC) — gfx1151#
CPC utilization#
| Metric | Description | Unit |
|---|---|---|
| CPC Busy | Percentage of CPC always-count cycles the Command Processor Compute was busy processing commands. Computed as 100 * CPC_STAT_BUSY / CPC_ALWAYS_COUNT. | Percent |
| CPC Idle | Percentage of CPC always-count cycles the Command Processor Compute was idle. Computed as 100 * CPC_STAT_IDLE / CPC_ALWAYS_COUNT. | Percent |
| CPC Stalled | Percentage of CPC always-count cycles the Command Processor Compute was stalled. Computed as 100 * CPC_STAT_STALL / CPC_ALWAYS_COUNT. | Percent |
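
The three utilization percentages share one denominator, CPC_ALWAYS_COUNT. The following is a minimal sketch of that arithmetic, assuming the raw counter values have already been collected into a dictionary. The helper name `cpc_utilization` and the sample values are hypothetical and are not profiler output.

```python
def cpc_utilization(counters):
    """Return CPC busy/idle/stalled percentages from raw counter values."""
    total = counters["CPC_ALWAYS_COUNT"]
    if total == 0:
        # No cycles were counted, so avoid dividing by zero.
        return {"CPC Busy": 0.0, "CPC Idle": 0.0, "CPC Stalled": 0.0}
    return {
        "CPC Busy": 100 * counters["CPC_STAT_BUSY"] / total,
        "CPC Idle": 100 * counters["CPC_STAT_IDLE"] / total,
        "CPC Stalled": 100 * counters["CPC_STAT_STALL"] / total,
    }


# Hypothetical counter values, for illustration only.
sample = {
    "CPC_ALWAYS_COUNT": 1000,
    "CPC_STAT_BUSY": 250,
    "CPC_STAT_IDLE": 700,
    "CPC_STAT_STALL": 50,
}
print(cpc_utilization(sample))
```

The zero-denominator guard is a design choice of this sketch; how the profiler itself reports an empty sample is not specified here.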
CPC interface utilization#
| Metric | Description | Unit |
|---|---|---|
| TCIU Busy | Percentage of CPC always-count cycles the TC (Texture Cache) Interface Unit was busy. Computed as 100 * CPC_TCIU_BUSY / CPC_ALWAYS_COUNT. | Percent |
| UTCL2 Busy | Percentage of CPC always-count cycles the UTCL2 (Unified Translation Cache L2) interface was busy. Computed as 100 * CPC_UTCL2IU_BUSY / CPC_ALWAYS_COUNT. | Percent |
| GCRIU Busy | Percentage of CPC always-count cycles the GCR (Graphics Cache Rinse) Interface Unit was busy. Computed as 100 * CPC_GCRIU_BUSY / CPC_ALWAYS_COUNT. | Percent |
Micro Engine (ME) stall cycles#
| Metric | Description | Unit |
|---|---|---|
| ME1 Stall on RCIU Ready | Cycles ME1 was stalled waiting for RCIU (Register Cache Interface Unit) to be ready (CPC_ME1_STALL_WAIT_ON_RCIU_READY). | Cycles per Normalization Unit |
| ME1 Stall on Memory Read | Cycles ME1 was stalled waiting for memory read completion (CPC_ME1_STALL_WAIT_ON_MEM_READ). | Cycles per Normalization Unit |
| ME1 Stall on Memory Write | Cycles ME1 was stalled waiting for memory write completion (CPC_ME1_STALL_WAIT_ON_MEM_WRITE). | Cycles per Normalization Unit |
| ME1 Stall on ROQ Data | Cycles ME1 was stalled waiting for data from the Ring Output Queue (CPC_ME1_STALL_ON_DATA_FROM_ROQ). | Cycles per Normalization Unit |
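
The stall metrics are reported per normalization unit rather than as raw cycle totals. The sketch below shows one way that scaling could look, assuming the normalization unit acts as a simple positive divisor (for example, a count of kernel dispatches). The divisor, the helper name `normalize_stalls`, and the sample values are all assumptions made for illustration.

```python
def normalize_stalls(raw_cycles, denom):
    """Divide each raw ME1 stall counter by the normalization denominator."""
    if denom <= 0:
        raise ValueError("normalization denominator must be positive")
    return {name: cycles / denom for name, cycles in raw_cycles.items()}


# Hypothetical raw stall cycles, for illustration only.
raw = {
    "CPC_ME1_STALL_WAIT_ON_RCIU_READY": 1200,
    "CPC_ME1_STALL_WAIT_ON_MEM_READ": 4800,
    "CPC_ME1_STALL_WAIT_ON_MEM_WRITE": 600,
    "CPC_ME1_STALL_ON_DATA_FROM_ROQ": 300,
}

# Assumed denominator: 4 normalization units (e.g. 4 dispatches).
per_unit = normalize_stalls(raw, denom=4)

# The counter with the largest normalized value is the dominant stall reason.
dominant = max(per_unit, key=per_unit.get)
print(per_unit)
print(dominant)
```

Comparing the normalized values side by side gives a quick view of which stall reason dominates in a given run.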
CPC memory requests#
| Metric | Description | Unit |
|---|---|---|
| TCIU Read Requests | Number of read requests sent to the TC Interface Unit. | Count per Normalization Unit |
| TCIU Write Requests | Number of write requests sent to the TC Interface Unit. | Count per Normalization Unit |
| GUS Read Requests | Number of read requests sent to the Global Unified Shader memory interface. | Count per Normalization Unit |
| GUS Write Requests | Number of write requests sent to the Global Unified Shader memory interface. | Count per Normalization Unit |
Micro Engine (ME) instruction cache#
| Metric | Description | Unit |
|---|---|---|
| Instruction Cache Hits | Number of MEC instruction cache hits. | Count per Normalization Unit |
| Instruction Cache Misses | Number of MEC instruction cache misses. | Count per Normalization Unit |
| Instruction Cache Hit Rate | Percentage of MEC instruction cache accesses that hit in the cache. | Percent |
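
The hit rate can be related to the hit and miss counts with a short calculation. The sketch below assumes that total accesses equal hits plus misses; the source does not state the exact formula, and the helper name `icache_hit_rate` and the sample values are hypothetical.

```python
def icache_hit_rate(hits, misses):
    """Return the hit rate in percent, assuming accesses = hits + misses."""
    accesses = hits + misses
    if accesses == 0:
        # No accesses were recorded, so report 0 rather than divide by zero.
        return 0.0
    return 100 * hits / accesses


# Hypothetical counts, for illustration only.
rate = icache_hit_rate(hits=950, misses=50)
print(rate)
```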