Command processor (CP)

The command processor (CP) is the GPU front end that connects the host and kernel driver to on-GPU scheduling. It fetches work from HSA queues, decodes packets, and dispatches kernel launches along the shader-engine SPI / WGP path. On Instinct GPUs, the profiler typically splits CP metrics into command processor fetcher (CPF) and command processor compute (CPC) groups. The gfx115x analysis panels focus on CPC and Micro Engine (ME) activity: utilization, interface utilization, stall cycles, memory requests, and instruction cache behavior.
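The following sketch makes "decodes packets" concrete. It splits the 16-bit header of an HSA AQL packet into the fields that the HSA specification defines: packet type in bits 0-7, the barrier bit in bit 8, and the acquire and release fence scopes in bits 9-10 and 11-12. It shows the packet format the CP consumes, not how the CP firmware is implemented. `decode_aql_header` and `AqlHeader` are hypothetical names.

```python
from dataclasses import dataclass

# Packet type values from the HSA specification (hsa_packet_type_t).
PACKET_TYPES = {
    0: "VENDOR_SPECIFIC",
    1: "INVALID",
    2: "KERNEL_DISPATCH",
    3: "BARRIER_AND",
    4: "AGENT_DISPATCH",
    5: "BARRIER_OR",
}

# Fence scope values from the HSA specification (hsa_fence_scope_t).
FENCE_SCOPES = {0: "NONE", 1: "AGENT", 2: "SYSTEM"}


@dataclass
class AqlHeader:
    packet_type: str
    barrier: bool
    acquire_scope: str
    release_scope: str


def decode_aql_header(header: int) -> AqlHeader:
    """Split a 16-bit AQL packet header into its fields (hypothetical helper)."""
    if not 0 <= header <= 0xFFFF:
        raise ValueError("AQL header must fit in 16 bits")
    ptype = header & 0xFF            # bits 0-7: packet type
    barrier = (header >> 8) & 0x1    # bit 8: barrier
    acquire = (header >> 9) & 0x3    # bits 9-10: acquire fence scope
    release = (header >> 11) & 0x3   # bits 11-12: release fence scope
    return AqlHeader(
        packet_type=PACKET_TYPES.get(ptype, f"UNKNOWN({ptype})"),
        barrier=bool(barrier),
        acquire_scope=FENCE_SCOPES.get(acquire, f"RESERVED({acquire})"),
        release_scope=FENCE_SCOPES.get(release, f"RESERVED({release})"),
    )


# A kernel dispatch packet with the barrier bit set and system-scope fences.
print(decode_aql_header(2 | (1 << 8) | (2 << 9) | (2 << 11)))
```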

For the complete CDNA architecture overview and the CPF and CPC metric tabs across MI-series GPUs, see Command processor (CP) under CDNA-CDNA4.

Command processor compute (CPC) - gfx115x

CPC utilization

| Metric | Description | Unit |
| --- | --- | --- |
| CPC Busy | Percentage of time the Command Processor Compute is actively processing commands. A high value means the CPC is kept supplied with work. If it stays high while the shaders are underutilized, the CPC itself may be limiting the dispatch rate. Low values may indicate gaps in workload submission or dispatch bottlenecks. | Percent |
| CPC Idle | Percentage of time the Command Processor Compute has no work to process. High idle time indicates underutilization, often caused by gaps between kernel dispatches or a workload too small to keep the GPU busy. | Percent |
| CPC Stalled | Percentage of time the Command Processor Compute is stalled waiting for resources. High stall rates may indicate memory bottlenecks, synchronization issues, or contention for internal command processor resources. | Percent |
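The next sketch shows how percentages like these can be derived from cycle counts. It assumes that each metric is a count of cycles divided by the total cycles sampled. The function names and the raw counts in the example are hypothetical. The sketch does not assume that busy, idle, and stalled add up to 100%.

```python
def percent_of_cycles(cycles: int, total_cycles: int) -> float:
    """Express a cycle count as a percentage of the total sampled cycles."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return 100.0 * cycles / total_cycles


def cpc_utilization(busy: int, idle: int, stalled: int, total: int) -> dict:
    """Convert hypothetical raw CPC cycle counts into utilization percentages."""
    return {
        "CPC Busy": percent_of_cycles(busy, total),
        "CPC Idle": percent_of_cycles(idle, total),
        "CPC Stalled": percent_of_cycles(stalled, total),
    }


# Illustrative, made-up counts over a 1000-cycle window.
print(cpc_utilization(busy=600, idle=300, stalled=100, total=1000))
```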

CPC interface utilization

| Metric | Description | Unit |
| --- | --- | --- |
| TCIU Busy | Percentage of time the Texture Cache Interface Unit (TCIU) is busy handling command processor memory requests. High utilization indicates significant memory traffic generated by command processing. | Percent |
| UTCL2 Busy | Percentage of time the Unified Translation Cache L2 (UTCL2) interface is busy handling address translation requests. High utilization may indicate heavy virtual memory activity or TLB pressure. | Percent |
| GCRIU Busy | Percentage of time the Graphics Cache Rinse Interface Unit (GCRIU), which handles cache coherency operations, is busy. High utilization may indicate frequent cache flushes or synchronization points. | Percent |
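When several interfaces are busy together, it helps to see which one is closest to saturation. The following sketch lists the interfaces at or above a busy threshold, with the busiest first. The 80% threshold, the function name, and the example values are arbitrary assumptions for illustration, not profiler defaults.

```python
def busiest_interfaces(busy_pct: dict, threshold: float = 80.0) -> list:
    """Return (name, percent) pairs at or above `threshold`, busiest first."""
    for name, pct in busy_pct.items():
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"{name}: {pct} is not a percentage")
    hot = [(name, pct) for name, pct in busy_pct.items() if pct >= threshold]
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Made-up readings; only UTCL2 and TCIU exceed the hypothetical 80% threshold.
print(busiest_interfaces({"TCIU Busy": 85.0, "UTCL2 Busy": 92.5, "GCRIU Busy": 12.0}))
```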

Micro Engine (ME) stall cycles

| Metric | Description | Unit |
| --- | --- | --- |
| ME1 Stall on RCIU Ready | Cycles the Micro Engine is stalled waiting for the Register Cache Interface Unit (RCIU) to become ready. High stall counts may indicate that register programming during command processing is backing up at this interface. | Cycles per Normalization Unit |
| ME1 Stall on Memory Read | Cycles the Micro Engine is stalled waiting for memory read operations to complete. High stall counts indicate that memory latency is limiting command processing throughput. | Cycles per Normalization Unit |
| ME1 Stall on Memory Write | Cycles the Micro Engine is stalled waiting for memory write operations to complete. High stall counts may indicate memory bandwidth saturation or write queue pressure. | Cycles per Normalization Unit |
| ME1 Stall on ROQ Data | Cycles the Micro Engine is stalled waiting for data from the Ring Output Queue (ROQ). High stall counts may indicate that command stream fetch is not keeping up, which leaves the Micro Engine without packet data to process. | Cycles per Normalization Unit |
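The next sketch shows what "per Normalization Unit" means in practice and how the four stall reasons compare. It assumes that normalization divides the raw total by one denominator: kernels, waves, cycles, or seconds. These names are modeled on the profiler's normalization options and are treated here as assumptions. `normalize` and `stall_breakdown` are hypothetical helpers, and the counts are made up.

```python
def normalize(raw: float, unit: str, *, kernels: int = 1, waves: int = 1,
              cycles: int = 1, seconds: float = 1.0) -> float:
    """Divide a raw counter total by the denominator for the chosen unit."""
    denominators = {
        "per_kernel": kernels,
        "per_wave": waves,
        "per_cycle": cycles,
        "per_second": seconds,
    }
    if unit not in denominators:
        raise ValueError(f"unknown normalization unit: {unit}")
    denom = denominators[unit]
    if denom <= 0:
        raise ValueError("denominator must be positive")
    return raw / denom


def stall_breakdown(stalls: dict) -> dict:
    """Share (percent) of total ME stall cycles attributed to each reason."""
    total = sum(stalls.values())
    if total == 0:
        return {name: 0.0 for name in stalls}
    return {name: 100.0 * value / total for name, value in stalls.items()}


print(normalize(4_000_000, "per_kernel", kernels=200))
print(stall_breakdown({"RCIU Ready": 10, "Memory Read": 60,
                       "Memory Write": 20, "ROQ Data": 10}))
```

Because the breakdown is a ratio, it comes out the same whether you feed it raw or normalized stall cycles, as long as all four use the same unit.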

CPC memory requests

| Metric | Description | Unit |
| --- | --- | --- |
| TCIU Read Requests | Number of read requests the command processor issues through the texture cache interface. These requests fetch command buffer data, including dispatch packets, which carry the kernel argument address. | Count per Normalization Unit |
| TCIU Write Requests | Number of write requests the command processor issues through the texture cache interface. These typically carry completion signals and status updates. | Count per Normalization Unit |
| GUS Read Requests | Number of read requests to the Global Unified Shader (GUS) memory interface. This path handles direct memory access for command processor operations. | Count per Normalization Unit |
| GUS Write Requests | Number of write requests to the Global Unified Shader (GUS) memory interface. High write counts may indicate frequent state updates or completion notifications. | Count per Normalization Unit |
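The table separates reads from writes on two paths. The next sketch summarizes each interface's total traffic and its read/write split. Per the descriptions above, a write-heavy mix may point to frequent completion or state updates. The function names and request counts are hypothetical, and the counts are assumed to share one normalization unit.

```python
def read_write_mix(reads: float, writes: float) -> tuple:
    """Return (read %, write %) of all requests on one interface."""
    total = reads + writes
    if total == 0:
        return (0.0, 0.0)
    return (100.0 * reads / total, 100.0 * writes / total)


def summarize_requests(counts: dict) -> dict:
    """Map interface -> total requests and read/write percentages."""
    summary = {}
    for iface, (reads, writes) in counts.items():
        read_pct, write_pct = read_write_mix(reads, writes)
        summary[iface] = {"total": reads + writes,
                          "read_pct": read_pct,
                          "write_pct": write_pct}
    return summary


# Made-up counts: TCIU is read-dominated, GUS is write-dominated.
print(summarize_requests({"TCIU": (900, 100), "GUS": (50, 150)}))
```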

Micro Engine (ME) instruction cache

| Metric | Description | Unit |
| --- | --- | --- |
| Instruction Cache Hits | Number of Micro Engine instruction fetches served from the instruction cache. The raw count scales with the amount of work, so interpret it together with the misses and the hit rate rather than on its own. | Count per Normalization Unit |
| Instruction Cache Misses | Number of Micro Engine instruction fetches that missed in the instruction cache. Each miss requires a fetch from memory, so high miss counts increase command processing latency. | Count per Normalization Unit |
| Instruction Cache Hit Rate | Percentage of Micro Engine instruction fetches that hit in the cache. High hit rates are essential for efficient command processing. Low rates may indicate large or complex command processing code paths that do not fit well in the cache. | Percent |
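The next sketch relates the hit rate to the two counts. It assumes the conventional definition, hits / (hits + misses) × 100. The profiler's exact formula is not stated here, and `icache_hit_rate` is a hypothetical helper.

```python
from typing import Optional


def icache_hit_rate(hits: float, misses: float) -> Optional[float]:
    """Hit rate in percent, or None when no instruction fetches occurred."""
    if hits < 0 or misses < 0:
        raise ValueError("counts must be non-negative")
    fetches = hits + misses
    if fetches == 0:
        return None
    return 100.0 * hits / fetches


# Made-up counts: 9,500 hits and 500 misses give a 95% hit rate.
print(icache_hit_rate(hits=9_500, misses=500))
```

Hits and misses share the same normalization unit, so the denominator cancels. The rate is therefore the same whether it is computed from raw or normalized counts.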