Command processor (CP)
The command processor (CP) is the GPU front-end that connects the host and kernel driver to on-GPU scheduling: it pulls work from HSA queues, decodes packets, and dispatches kernel launches to the shader-engine SPI / WGP path. On Instinct GPUs, the profiler often separates these metrics into command processor fetcher (CPF) and command processor compute (CPC). The gfx115x analysis panels emphasize CPC and Micro Engine (ME) activity: utilization, interface utilization, stall cycles, memory requests, and instruction cache behavior.
For the complete CDNA architecture overview and the CPF and CPC metric tabs across MI-series GPUs, see Command processor (CP) under CDNA-CDNA4.
Command processor compute (CPC) - gfx115x
CPC utilization

| Metric | Description | Unit |
|---|---|---|
| CPC Busy | Percentage of time the Command Processor Compute is actively processing commands. A high busy percentage indicates efficient command stream processing. Low values may indicate gaps in workload submission or dispatch bottlenecks. | Percent |
| CPC Idle | Percentage of time the Command Processor Compute has no work to process. High idle time indicates underutilization, often due to gaps between kernel dispatches or insufficient workload to keep the GPU busy. | Percent |
| CPC Stalled | Percentage of time the Command Processor Compute is stalled waiting for resources. High stall rates may indicate memory bottlenecks, synchronization issues, or contention for internal command processor resources. | Percent |
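
As a rough illustration of how the three utilization metrics above might be read together, the following sketch picks the dominant state and returns the matching hint from the table. The function name `dominant_cpc_state` and the `HINTS` mapping are hypothetical helpers for this example, not part of the profiler, and the sketch assumes the three percentages have already been read from a profiler report.

```python
# Hypothetical triage helper: not part of any profiler API.
# Hints are paraphrased from the CPC utilization table above.
HINTS = {
    "busy": "CPC is mostly processing commands; the command stream is flowing.",
    "idle": "CPC is mostly idle; look for gaps between kernel dispatches "
            "or too little work to keep the GPU busy.",
    "stalled": "CPC is mostly stalled; check memory bottlenecks, "
               "synchronization, or contention for CP resources.",
}


def dominant_cpc_state(busy_pct: float, idle_pct: float, stalled_pct: float) -> str:
    """Return the name of the state with the largest percentage."""
    states = {"busy": busy_pct, "idle": idle_pct, "stalled": stalled_pct}
    # max() over the keys, ranked by their percentage values
    return max(states, key=states.get)


if __name__ == "__main__":
    # Example values are made up for illustration.
    state = dominant_cpc_state(busy_pct=20.0, idle_pct=70.0, stalled_pct=10.0)
    print(state, "->", HINTS[state])
```

Picking the largest value avoids hard-coding thresholds, which would depend on the workload and are not specified in this section.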
CPC interface utilization

| Metric | Description | Unit |
|---|---|---|
| TCIU Busy | Percentage of time the Texture Cache Interface Unit is busy handling command processor memory requests. High utilization indicates significant memory traffic from command processing activities. | Percent |
| UTCL2 Busy | Percentage of time the Unified Translation Cache L2 interface is busy handling address translation requests. High utilization may indicate heavy virtual memory activity or TLB pressure. | Percent |
| GCRIU Busy | Percentage of time the Graphics Cache Rinse Interface Unit is busy. This unit handles cache coherency operations. High utilization may indicate frequent cache flushes or synchronization points. | Percent |
Micro Engine (ME) stall cycles

| Metric | Description | Unit |
|---|---|---|
| ME1 Stall on RCIU Ready | Cycles the Micro Engine is stalled waiting for the Register Cache Interface Unit. High stall counts may indicate register allocation bottlenecks or excessive register pressure during command processing. | Cycles per Normalization Unit |
| ME1 Stall on Memory Read | Cycles the Micro Engine is stalled waiting for memory read operations to complete. High stall counts indicate memory latency is impacting command processing throughput. | Cycles per Normalization Unit |
| ME1 Stall on Memory Write | Cycles the Micro Engine is stalled waiting for memory write operations to complete. High stall counts may indicate memory bandwidth saturation or write queue pressure. | Cycles per Normalization Unit |
| ME1 Stall on ROQ Data | Cycles the Micro Engine is stalled waiting for data from the Ring Output Queue. High stall counts may indicate command stream processing bottlenecks. | Cycles per Normalization Unit |
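
The unit "Cycles per Normalization Unit" means a raw cycle count divided by a chosen denominator. A minimal sketch of that arithmetic follows, assuming the denominator is something like the number of waves, kernels, cycles, or seconds in the measured interval; the function `normalize` and the sample values are hypothetical and are not taken from the profiler.

```python
def normalize(raw_count: float, denominator: float) -> float:
    """Divide a raw counter value by the chosen normalization denominator."""
    if denominator <= 0:
        raise ValueError("normalization denominator must be positive")
    return raw_count / denominator


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    stall_cycles = 12_000   # raw ME1 stall cycles in the interval
    waves = 4_000           # assumed denominator: waves launched in the interval
    print(normalize(stall_cycles, waves))  # stall cycles per wave
```

The same raw count can look large or small depending on the denominator, so compare stall metrics only between runs that use the same normalization unit.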
CPC memory requests

| Metric | Description | Unit |
|---|---|---|
| TCIU Read Requests | Number of read requests issued by the command processor through the texture cache interface. These requests fetch command buffer data and kernel arguments. | Count per Normalization Unit |
| TCIU Write Requests | Number of write requests issued by the command processor through the texture cache interface. These handle completion signals and status updates. | Count per Normalization Unit |
| GUS Read Requests | Number of read requests to the Global Unified Shader memory interface. This path handles direct memory access for command processor operations. | Count per Normalization Unit |
| GUS Write Requests | Number of write requests to the Global Unified Shader memory interface. High write counts may indicate frequent state updates or completion notifications. | Count per Normalization Unit |
Micro Engine (ME) instruction cache

| Metric | Description | Unit |
|---|---|---|
| Instruction Cache Hits | Number of Micro Engine instruction fetches serviced from the instruction cache. Higher hit counts indicate good instruction locality in command processing code. | Count per Normalization Unit |
| Instruction Cache Misses | Number of Micro Engine instruction fetches that missed in the instruction cache. High miss counts increase command processing latency due to memory fetches. | Count per Normalization Unit |
| Instruction Cache Hit Rate | Percentage of Micro Engine instruction fetches that hit in cache. High hit rates are essential for efficient command processing. Low rates may indicate complex or large command processing routines. | Percent |
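
The hit rate relates to the two counts above in the usual way: hits divided by total fetches. The sketch below shows that relationship, assuming every instruction fetch is counted as either a hit or a miss; the function name `icache_hit_rate` and the sample numbers are hypothetical, and the profiler's exact formula may differ.

```python
def icache_hit_rate(hits: int, misses: int) -> float:
    """Return the hit rate as a percentage of all instruction fetches.

    Assumes total fetches = hits + misses. Returns 0.0 when no fetches
    were recorded, to avoid dividing by zero.
    """
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


if __name__ == "__main__":
    # Made-up sample counts for illustration only.
    print(icache_hit_rate(hits=950, misses=50))
```

Because both counts are reported per normalization unit with the same denominator, the ratio is unaffected by which normalization unit is selected.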