Command processor (CP)

The command processor (CP) is responsible for interacting with the AMDGPU kernel driver (part of the Linux kernel) on the CPU and with user-space HSA clients when they submit commands to HSA queues. Basic tasks of the CP include reading commands (for example, those corresponding to a kernel launch) out of HSA queues, scheduling work to subsequent parts of the scheduler pipeline, and marking kernels complete for synchronization events on the host.

The command processor consists of two sub-components:

  • Fetcher (CPF): Fetches commands out of memory to hand them over to the CPC for processing.

  • Packet processor (CPC): Microcontroller running the command processing firmware that decodes the fetched commands and, for kernels, passes them to the workgroup processors for scheduling.
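To make the division of labor between the two sub-components concrete, here is a toy Python model of an HSA-style queue. It is a sketch under heavy simplifying assumptions: `ToyQueue`, `Packet`, `fetcher`, and `packet_processor` are hypothetical names, real queues hold 64-byte AQL packets and are signaled through a doorbell, and real firmware does far more. Only the two packet type codes are taken from the HSA specification's `hsa_packet_type_t`.

```python
from collections import deque
from dataclasses import dataclass

# Packet type codes from the HSA specification (hsa_packet_type_t);
# only a subset is modeled here.
PACKET_TYPE_KERNEL_DISPATCH = 2
PACKET_TYPE_BARRIER_AND = 3


@dataclass
class Packet:
    """Hypothetical, heavily simplified stand-in for an AQL packet."""
    packet_type: int
    kernel_name: str = ""
    num_workgroups: int = 0


class ToyQueue:
    """Ring buffer with a client-owned write index and a CP-owned read index."""

    def __init__(self, size: int):
        self.slots = [None] * size
        self.size = size
        self.write_index = 0  # advanced by the HSA client after writing a packet
        self.read_index = 0   # advanced by the command processor

    def submit(self, packet: Packet) -> None:
        if self.write_index - self.read_index >= self.size:
            raise RuntimeError("queue full")
        self.slots[self.write_index % self.size] = packet
        self.write_index += 1  # a real client would now ring the doorbell


def fetcher(queue: ToyQueue) -> list:
    """CPF role: copy every pending packet out of queue memory."""
    fetched = []
    while queue.read_index < queue.write_index:
        fetched.append(queue.slots[queue.read_index % queue.size])
        queue.read_index += 1
    return fetched


def packet_processor(packets: list, workgroup_queue: deque) -> None:
    """CPC role: decode each packet and forward kernel work for scheduling."""
    for packet in packets:
        if packet.packet_type == PACKET_TYPE_KERNEL_DISPATCH:
            for wg in range(packet.num_workgroups):
                workgroup_queue.append((packet.kernel_name, wg))
        elif packet.packet_type == PACKET_TYPE_BARRIER_AND:
            # A real CP would wait on the barrier's dependency signals here.
            workgroup_queue.append(("barrier", None))
        else:
            raise ValueError(f"unsupported packet type {packet.packet_type}")


q = ToyQueue(size=4)
q.submit(Packet(PACKET_TYPE_KERNEL_DISPATCH, "vector_add", num_workgroups=2))
q.submit(Packet(PACKET_TYPE_BARRIER_AND))
work = deque()
packet_processor(fetcher(q), work)
print(list(work))  # [('vector_add', 0), ('vector_add', 1), ('barrier', None)]
```

Keeping fetching and decoding in separate functions mirrors why the metrics below are reported separately for the CPF and the CPC: either stage can be the one that is busy or stalled.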

Before scheduling work to the accelerator, the command processor can first apply a memory-acquire fence to ensure system consistency (Section 2.6.4). After the work is complete, it can apply a memory-release fence. Depending on the AMD CDNA™ accelerator in question, either of these operations might initiate a cache write-back or invalidation.
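On HSA queues, a client requests these fences per packet through the acquire and release fence-scope fields of the AQL packet header. The sketch below packs and decodes those fields; the bit offsets, widths, and scope values follow our reading of the HSA specification (`hsa_packet_header_t`, `hsa_fence_scope_t`), while the helper names `make_header`, `field`, and `fence_plan` are hypothetical. Which cache write-backs or invalidations a given scope triggers is device-specific and is deliberately not modeled.

```python
# Bit offsets and widths of AQL packet header fields (HSA specification).
HEADER_TYPE_SHIFT, HEADER_TYPE_WIDTH = 0, 8
HEADER_BARRIER_SHIFT, HEADER_BARRIER_WIDTH = 8, 1
HEADER_ACQUIRE_SHIFT, HEADER_ACQUIRE_WIDTH = 9, 2
HEADER_RELEASE_SHIFT, HEADER_RELEASE_WIDTH = 11, 2

# Fence scope values (hsa_fence_scope_t).
FENCE_SCOPE_NONE, FENCE_SCOPE_AGENT, FENCE_SCOPE_SYSTEM = 0, 1, 2
SCOPE_NAMES = {FENCE_SCOPE_AGENT: "agent", FENCE_SCOPE_SYSTEM: "system"}

PACKET_TYPE_KERNEL_DISPATCH = 2


def make_header(packet_type: int, barrier: int,
                acquire_scope: int, release_scope: int) -> int:
    """Pack the fields into the 16-bit header word (hypothetical helper)."""
    return ((packet_type << HEADER_TYPE_SHIFT)
            | (barrier << HEADER_BARRIER_SHIFT)
            | (acquire_scope << HEADER_ACQUIRE_SHIFT)
            | (release_scope << HEADER_RELEASE_SHIFT))


def field(header: int, shift: int, width: int) -> int:
    """Extract one bit field from the header word."""
    return (header >> shift) & ((1 << width) - 1)


def describe(kind: str, scope: int) -> str:
    if scope == FENCE_SCOPE_NONE:
        return f"no {kind} fence"
    return f"{kind} fence, {SCOPE_NAMES[scope]} scope"


def fence_plan(header: int) -> dict:
    """Which fences the CP would apply before and after this packet's work."""
    acquire = field(header, HEADER_ACQUIRE_SHIFT, HEADER_ACQUIRE_WIDTH)
    release = field(header, HEADER_RELEASE_SHIFT, HEADER_RELEASE_WIDTH)
    return {"before": describe("acquire", acquire),
            "after": describe("release", release)}


header = make_header(PACKET_TYPE_KERNEL_DISPATCH, 1,
                     FENCE_SCOPE_SYSTEM, FENCE_SCOPE_AGENT)
print(header)              # 3330
print(fence_plan(header))  # {'before': 'acquire fence, system scope', 'after': 'release fence, agent scope'}
```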

Analyzing command processor performance is most useful for kernels that you suspect are limited by scheduling or launch rate. The command processor’s metrics therefore focus on reporting, for example:

  • Utilization of the fetcher

  • Utilization of the packet processor, including time spent decoding packets

  • Stalls in fetching and processing

Command processor fetcher (CPF)

Metric | Description | Unit
CPF Utilization | Percent of total cycles where the CPF was actively doing any work. The ratio of CPF busy cycles over total cycles counted by the CPF. | Percent
CPF Stall | Percent of CPF busy cycles where the CPF was stalled for any reason. | Percent
CPF-L2 Utilization | Percent of total cycles counted by the CPF-L2 interface where the CPF-L2 interface was active doing any work. The ratio of CPF-L2 busy cycles over total cycles counted by the CPF-L2. | Percent
CPF-L2 Stall | Percent of CPF-L2 busy cycles where the CPF-L2 interface was stalled for any reason. | Percent
CPF-UTCL1 Stall | Percent of CPF busy cycles where the CPF was stalled by address translation. | Percent
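The CPF metrics are ratios of cycle counts with two different denominators: utilization metrics divide by total cycles, while stall metrics divide by busy cycles. The sketch below spells out that arithmetic. The counter fields are hypothetical placeholders rather than real hardware counter names, and it assumes that "total cycles counted" equals busy plus idle cycles.

```python
from dataclasses import dataclass


@dataclass
class CpfCounters:
    """Hypothetical raw cycle counts; field names are illustrative only."""
    busy: int         # cycles the CPF was busy
    idle: int         # cycles the CPF was idle
    stall: int        # busy cycles the CPF was stalled for any reason
    utcl1_stall: int  # busy cycles stalled on address translation
    l2_busy: int      # cycles the CPF-L2 interface was busy
    l2_idle: int      # cycles the CPF-L2 interface was idle
    l2_stall: int     # CPF-L2 busy cycles that were stalled


def pct(numerator: int, denominator: int) -> float:
    """Percentage that returns 0.0 instead of dividing by zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def cpf_metrics(c: CpfCounters) -> dict:
    return {
        # Utilization: busy over total cycles (assumed to be busy + idle).
        "CPF Utilization": pct(c.busy, c.busy + c.idle),
        "CPF-L2 Utilization": pct(c.l2_busy, c.l2_busy + c.l2_idle),
        # Stalls: fraction of the corresponding busy cycles.
        "CPF Stall": pct(c.stall, c.busy),
        "CPF-L2 Stall": pct(c.l2_stall, c.l2_busy),
        "CPF-UTCL1 Stall": pct(c.utcl1_stall, c.busy),
    }


example = CpfCounters(busy=600, idle=400, stall=150, utcl1_stall=30,
                      l2_busy=200, l2_idle=800, l2_stall=50)
for name, value in cpf_metrics(example).items():
    print(f"{name}: {value:.1f}%")
```

With these made-up counts, the CPF is busy 60% of the time and stalled for 25% of those busy cycles, so a high stall percentage can coexist with modest overall utilization.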

Command processor packet processor (CPC)

Metric | Description | Unit
CPC Utilization | Percent of total cycles where the CPC was actively doing any work. The ratio of CPC busy cycles over total cycles counted by the CPC. | Percent
CPC Stall | Percent of CPC busy cycles where the CPC was stalled for any reason. | Percent
CPC Packet Decoding Utilization | Percent of CPC busy cycles spent decoding commands for processing. | Percent
CPC-Workgroup Manager Utilization | Percent of CPC busy cycles spent dispatching workgroups to the workgroup manager. | Percent
CPC-L2 Utilization | Percent of total cycles counted by the CPC-L2 interface where the CPC-L2 interface was active doing any work. | Percent
CPC-UTCL1 Stall | Percent of CPC busy cycles where the CPC was stalled by address translation. | Percent
CPC-UTCL2 Utilization | Percent of total cycles counted by the CPC’s L2 address translation interface where the CPC was busy doing address translation work. | Percent
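The CPC metrics follow the same pattern, and the sketch below computes them from a set of hypothetical cycle counts. As before, the field names are illustrative placeholders, not real counter names, and each "total cycles counted" denominator is assumed to be busy plus idle cycles for that unit or interface.

```python
from dataclasses import dataclass


@dataclass
class CpcCounters:
    """Hypothetical raw cycle counts; field names are illustrative only."""
    busy: int         # cycles the CPC was busy
    idle: int         # cycles the CPC was idle
    stall: int        # busy cycles stalled for any reason
    decode_busy: int  # busy cycles spent decoding packets
    wgm_busy: int     # busy cycles spent dispatching to the workgroup manager
    utcl1_stall: int  # busy cycles stalled on address translation
    l2_busy: int      # cycles the CPC-L2 interface was busy
    l2_idle: int      # cycles the CPC-L2 interface was idle
    utcl2_busy: int   # cycles the UTCL2 interface did translation work
    utcl2_idle: int   # cycles the UTCL2 interface was idle


def pct(numerator: int, denominator: int) -> float:
    """Percentage that returns 0.0 instead of dividing by zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def cpc_metrics(c: CpcCounters) -> dict:
    return {
        # Denominator: total cycles of the unit or interface.
        "CPC Utilization": pct(c.busy, c.busy + c.idle),
        "CPC-L2 Utilization": pct(c.l2_busy, c.l2_busy + c.l2_idle),
        "CPC-UTCL2 Utilization": pct(c.utcl2_busy, c.utcl2_busy + c.utcl2_idle),
        # Denominator: CPC busy cycles.
        "CPC Stall": pct(c.stall, c.busy),
        "CPC Packet Decoding Utilization": pct(c.decode_busy, c.busy),
        "CPC-Workgroup Manager Utilization": pct(c.wgm_busy, c.busy),
        "CPC-UTCL1 Stall": pct(c.utcl1_stall, c.busy),
    }


example = CpcCounters(busy=800, idle=200, stall=80, decode_busy=400,
                      wgm_busy=200, utcl1_stall=40, l2_busy=100, l2_idle=900,
                      utcl2_busy=50, utcl2_idle=950)
for name, value in cpc_metrics(example).items():
    print(f"{name}: {value:.1f}%")
```

Because the stall, decoding, workgroup-manager, and UTCL1 metrics share the CPC busy-cycle denominator, they can be compared with each other directly to see where the busy CPC spends its time; the utilization metrics are relative to total cycles instead.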