Command processor (CP)

The command processor (CP) is the GPU front end that connects the host and kernel driver to on-GPU scheduling. It pulls work from HSA queues, decodes packets, and dispatches kernel launches to the Shader engine Workgroup Manager (SPI) or Workgroup processor (WGP) path. On Instinct GPUs, the profiler often separates CP metrics into Command Processor Fetcher (CPF) and Command Processor Compute (CPC). The gfx115x analysis panels emphasize CPC and Micro Engine (ME) activity: utilization, interface utilization, stall cycles, memory requests, and instruction cache behavior.

For the complete CDNA architecture overview and the CPF and CPC metric tabs across AMD Instinct MI-Series GPUs, see Command processor (CP) under CDNA-CDNA4.

Command processor compute (CPC) - gfx115x

CPC utilization

RDNA3.5 (gfx115x)

Metric	Description	Unit
CPC Busy	Percentage of time the Command Processor Compute is actively processing commands. A high busy percentage suggests the command stream is being processed efficiently. Low values may indicate gaps in workload submission or dispatch bottlenecks.	Percent
CPC Idle	Percentage of time the Command Processor Compute has no work to process. High idle time indicates underutilization, often due to gaps between kernel dispatches or insufficient workload to keep the GPU busy.	Percent
CPC Stalled	Percentage of time the Command Processor Compute is stalled waiting for resources. High stall rates may indicate memory bottlenecks, synchronization issues, or contention for internal command processor resources.	Percent
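
As a rough illustration of how the three utilization figures can be read together, the following Python sketch classifies one CPC sample using the interpretations given in the table above. The function name `classify_cpc` and the 50% threshold are assumptions made for this example only; they are not part of the profiler, and the sketch does not assume that the three percentages sum to 100.

```python
def classify_cpc(busy_pct: float, idle_pct: float, stalled_pct: float,
                 threshold: float = 50.0) -> str:
    """Return a coarse reading of the CPC utilization metrics.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    # Stalls are checked first: they point at resource waits
    # (memory, synchronization, internal CP contention).
    if stalled_pct >= threshold:
        return "stalled: check memory, synchronization, or CP resource contention"
    # High idle time points at gaps between dispatches or too little work.
    if idle_pct >= threshold:
        return "idle: check for gaps between kernel dispatches"
    if busy_pct >= threshold:
        return "busy: command stream is keeping the CPC occupied"
    return "mixed: no single state dominates"


# Example with made-up percentages.
print(classify_cpc(busy_pct=20.0, idle_pct=70.0, stalled_pct=10.0))
```

The ordering of the checks is a design choice for this sketch: a stall-dominated sample is reported before an idle-dominated one because stalls indicate work that is present but blocked.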

CPC interface utilization

RDNA3.5 (gfx115x)

Metric	Description	Unit
TCIU Busy	Percentage of time the Texture Cache Interface Unit is busy handling command processor memory requests. High utilization indicates significant memory traffic from command processing activities.	Percent
UTCL2 Busy	Percentage of time the Unified Translation Cache L2 interface is busy handling address translation requests. High utilization may indicate heavy virtual memory activity or TLB pressure.	Percent
GCRIU Busy	Percentage of time the Graphics Cache Rinse Interface Unit is busy. This unit handles cache coherency operations. High utilization may indicate frequent cache flushes or synchronization points.	Percent

Micro Engine (ME) stall cycles

RDNA3.5 (gfx115x)

Metric	Description	Unit
ME1 Stall on RCIU Ready	Cycles the Micro Engine is stalled waiting for the Register Cache Interface Unit. High stall counts may indicate register allocation bottlenecks or excessive register pressure during command processing.	Cycles per Normalization Unit
ME1 Stall on Memory Read	Cycles the Micro Engine is stalled waiting for memory read operations to complete. High stall counts indicate memory latency is impacting command processing throughput.	Cycles per Normalization Unit
ME1 Stall on Memory Write	Cycles the Micro Engine is stalled waiting for memory write operations to complete. High stall counts may indicate memory bandwidth saturation or write queue pressure.	Cycles per Normalization Unit
ME1 Stall on ROQ Data	Cycles the Micro Engine is stalled waiting for data from the Ring Output Queue. High stall counts may indicate command stream processing bottlenecks.	Cycles per Normalization Unit
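
The unit "Cycles per Normalization Unit" means each raw stall counter is reported relative to a chosen denominator. A minimal Python sketch of that arithmetic follows, assuming the raw cycle totals and the denominator (for example, a count of kernel dispatches) are already available. The function name `normalized_stalls`, the dictionary keys, and all sample values are hypothetical.

```python
def normalized_stalls(raw_cycles: dict, units: float) -> list:
    """Divide each raw stall-cycle total by the normalization denominator
    and return (name, cycles_per_unit) pairs, largest first."""
    if units <= 0:
        raise ValueError("units must be positive")
    per_unit = {name: cycles / units for name, cycles in raw_cycles.items()}
    # Sort descending so the dominant stall reason comes first.
    return sorted(per_unit.items(), key=lambda item: item[1], reverse=True)


# Made-up raw totals for the four ME1 stall reasons, normalized over 8 units.
sample = {
    "RCIU Ready": 400.0,
    "Memory Read": 2400.0,
    "Memory Write": 800.0,
    "ROQ Data": 1600.0,
}
for name, value in normalized_stalls(sample, units=8):
    print(f"ME1 Stall on {name}: {value} cycles per unit")
```

Ranking the normalized values makes it easier to see which stall reason dominates before looking at the matching interface or memory request metrics.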

CPC memory requests

RDNA3.5 (gfx115x)

Metric	Description	Unit
TCIU Read Requests	Number of read requests issued by the command processor through the texture cache interface. These requests fetch command buffer data and kernel arguments.	Count per Normalization Unit
TCIU Write Requests	Number of write requests issued by the command processor through the texture cache interface. These handle completion signals and status updates.	Count per Normalization Unit
GUS Read Requests	Number of read requests to the Global Unified Shader memory interface. This path handles direct memory access for command processor operations.	Count per Normalization Unit
GUS Write Requests	Number of write requests to the Global Unified Shader memory interface. High write counts may indicate frequent state updates or completion notifications.	Count per Normalization Unit

Micro Engine (ME) instruction cache

RDNA3.5 (gfx115x)

Metric	Description	Unit
Instruction Cache Hits	Number of Micro Engine instruction fetches serviced from the instruction cache. Higher hit counts indicate good instruction locality in command processing code.	Count per Normalization Unit
Instruction Cache Misses	Number of Micro Engine instruction fetches that missed in the instruction cache. High miss counts increase command processing latency due to memory fetches.	Count per Normalization Unit
Instruction Cache Hit Rate	Percentage of Micro Engine instruction fetches that hit in cache. High hit rates are essential for efficient command processing. Low rates may indicate complex or large command processing routines.	Percent
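
The hit rate follows from the two counts in the table: if every instruction fetch is either a hit or a miss, the rate is hits divided by hits plus misses. A minimal Python sketch under that assumption is shown here. The function name `icache_hit_rate` and the sample counts are hypothetical, and returning 0.0 when no fetches were recorded is a choice made for this example.

```python
def icache_hit_rate(hits: float, misses: float) -> float:
    """Return the ME instruction cache hit rate as a percentage."""
    total = hits + misses
    if total == 0:
        # No fetches recorded; report 0.0 rather than dividing by zero.
        return 0.0
    return 100.0 * hits / total


# Made-up counts: 950 hits and 50 misses.
print(icache_hit_rate(950, 50))  # 95.0
```

Because both counts share the same normalization unit, the denominator cancels and the rate is the same whether raw or normalized counts are used.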
