GL0#

On gfx1151, TCP is the vector L1 data cache (RDNA GL0) in front of GL1C. For GL1C panels and the GL1C Memory Chart table, see GL1. The handoff toward GL2C is under GL2.

Note: GL0 is the same as TCP on the RDNA3.5 architecture. This page uses TCP for consistency with the counter names.

TCP cache panels#

TCP utilization#

| Metric | Description | Unit |
| --- | --- | --- |
| TCP Busy | Estimated TCP utilization: 100 × TCP_GATE_EN2 / (GRBM_GUI_ACTIVE × NUM_TCP), where NUM_TCP = max(1, ⌊$cu_per_gpu / 2⌋) (one TCP per WGP, two CUs per WGP on RDNA3.5). The numerator is the aggregate TCP_GATE_EN2 counter, not TCP_GATE_EN2_sum. GRBM_GUI_ACTIVE is multiplied by NUM_TCP so that the denominator reflects the capacity of all TCPs. TCP_GATE_EN2 corresponds to TCP_PERF_SEL_GATE_EN2 (core clocks on, not windowed). If only per-instance counter sums (typically _sum columns) are available, align the numerator and denominator with your CSV layout. | Percent |
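
The TCP Busy formula can be sketched in a few lines of Python. This is a minimal illustration, not the profiler's implementation: the function names and sample counter values are hypothetical, and returning 0.0 for a zero denominator is an assumption made here for convenience.

```python
def num_tcp(cu_per_gpu: int) -> int:
    """One TCP per WGP and two CUs per WGP, never fewer than one TCP."""
    return max(1, cu_per_gpu // 2)


def tcp_busy_percent(tcp_gate_en2: float, grbm_gui_active: float, cu_per_gpu: int) -> float:
    """100 x TCP_GATE_EN2 / (GRBM_GUI_ACTIVE x NUM_TCP)."""
    denominator = grbm_gui_active * num_tcp(cu_per_gpu)
    if denominator == 0:
        # Assumption: report 0% when the GPU was never active.
        return 0.0
    return 100.0 * tcp_gate_en2 / denominator


# Made-up counter values for illustration only.
busy = tcp_busy_percent(tcp_gate_en2=8_000_000, grbm_gui_active=1_000_000, cu_per_gpu=40)
print(f"TCP Busy: {busy:.1f}%")
```

With 40 CUs the sketch assumes 20 TCPs, so the aggregate gate-enable count is compared against 20 times the GPU-active cycle count.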

TCP request statistics#

| Metric | Description | Unit |
| --- | --- | --- |
| Total Requests | Total number of requests to the TCP cache, including reads, writes, and atomics. | Count per Normalization Unit |
| Read Requests | Number of read requests to the TCP cache. | Count per Normalization Unit |
| Write Requests | Number of write requests to the TCP cache. | Count per Normalization Unit |
| Miss Requests | Number of requests that missed in the TCP cache and required fetching from GL1C. | Count per Normalization Unit |

TCP cache performance#

| Metric | Description | Unit |
| --- | --- | --- |
| Hit Rate | Percentage of TCP requests that hit in the cache. Higher hit rates reduce traffic to GL1C and improve performance. | Percent |
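
This page does not give the exact formula behind Hit Rate. As a rough sketch only, the example below assumes that hits equal total requests minus miss requests, using the Total Requests and Miss Requests metrics described earlier. The function name and the sample numbers are hypothetical.

```python
from typing import Optional


def tcp_hit_rate_percent(total_requests: float, miss_requests: float) -> Optional[float]:
    """Hit rate under the assumption hits = total - misses."""
    if total_requests <= 0:
        # No requests were observed, so a hit rate is undefined.
        return None
    hits = total_requests - miss_requests
    return 100.0 * hits / total_requests


# Made-up values: 1000 requests, 250 of which missed.
print(tcp_hit_rate_percent(1000, 250))
```

If the profiler derives the metric from dedicated hit counters instead, the result may differ from this approximation.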

TCP-GL1 interface#

| Metric | Description | Unit |
| --- | --- | --- |
| GL1 Read Requests | Number of read requests sent from TCP to GL1C due to cache misses. | Count per Normalization Unit |
| GL1 Read 128B Requests | TCP→GL1 read requests that transferred 128 bytes (TCP_GL1_REQ_READ_128B_sum / $denom). Subset of total GL1 read requests; large-line fills show up here. | Count per Normalization Unit |
| GL1 Write Requests | TCP→GL1 write or writeback requests (TCP_GL1_REQ_WRITE_sum / $denom). | Count per Normalization Unit |
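
The `/ $denom` pattern used by these metrics divides a raw counter sum by the chosen normalization unit. The sketch below illustrates that step and, because 128 B reads are described as a subset of all GL1 reads, also derives their share. The helper names, the `denom` value, and the counter values are hypothetical; the share metric itself is not part of the documented panel.

```python
def per_normalization_unit(counter_sum: float, denom: float) -> float:
    """Divide a raw counter sum by the normalization unit ($denom)."""
    if denom == 0:
        raise ValueError("normalization denominator must be non-zero")
    return counter_sum / denom


def read_128b_share_percent(read_128b_sum: float, read_sum: float) -> float:
    """Share of TCP->GL1 reads that moved 128 bytes (illustrative, not a documented metric)."""
    if read_sum == 0:
        return 0.0
    return 100.0 * read_128b_sum / read_sum


# Made-up counter sums, normalized per wave with 500 waves (hypothetical denom).
counters = {"TCP_GL1_REQ_READ_128B_sum": 3_000, "TCP_GL1_REQ_WRITE_sum": 1_500}
denom = 500
for name, value in counters.items():
    print(name, per_normalization_unit(value, denom))
```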

TCP stalls#

| Metric | Description | Unit |
| --- | --- | --- |
| TA Req Stall | Cycles in which TCP stalled the Texture Addresser (TA) request interface due to backpressure. | Cycles per Normalization Unit |
| GL1 Back Pressure | Cycles in which TCP was stalled due to backpressure from GL1C. | Cycles per Normalization Unit |
| Data FIFO Stall | Cycles in which TCP stalled because its internal data FIFO could not make progress (TCP_DATA_FIFO_STALL_sum / $denom). Often correlates with downstream backpressure. | Cycles per Normalization Unit |

Memory chart: path up to GL1#

The following Memory Chart tables follow the on-screen flow through the instruction and scalar paths, TCP (GL0), LDS, and the TCP-GL1 interface.

Memory chart - instruction cache#

| Metric | Description | Unit |
| --- | --- | --- |
| ICache Utilization | Definition: Percentage of cycles the instruction cache is busy. Formula: 100 * SQC_ICACHE_BUSY_CYCLES_sum / SQ_BUSY_CYCLES_sum | Percent |
| ICache Hit Rate | Definition: Share of instruction cache accesses that hit, using hits + misses as the denominator. Formula: 100 * SQC_ICACHE_HITS_sum / (SQC_ICACHE_HITS_sum + SQC_ICACHE_MISSES_sum) when the denominator is non-zero. The ratio HITS / REQ alone can exceed 100% because these are per-SQ, per-bank counters. | Percent |
| ICache Miss Rate | Definition: Share of instruction cache accesses that miss (same denominator as the hit rate). Formula: 100 * SQC_ICACHE_MISSES_sum / (SQC_ICACHE_HITS_sum + SQC_ICACHE_MISSES_sum) | Percent |
| ICache Requests | Count of instruction-cache (SQC ICache) requests issued, per normalization unit. Formula: SQC_ICACHE_REQ_sum / $denom | Requests per Normalization Unit |
| ICache Request Stall Rate | Percentage of shader busy cycles in which the instruction cache input interface was stalled (valid without ready). Formula: 100 * SQC_ICACHE_INPUT_VALID_READYB_sum / SQ_BUSY_CYCLES_sum | Percent |
| ICache-GL1 Read Bandwidth | Bytes per second of read traffic from the instruction path toward GL1 (texture-cache path instruction requests), using 128 B per SQC_TC_INST_REQ event. | Bytes/s |
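
The ICache hit and miss rates share the hits + misses denominator, so the two always add up to 100% when any access was recorded. A minimal sketch, with a hypothetical function name and made-up counter values; returning `None` for a zero denominator is an assumption about how to represent "not available":

```python
from typing import Optional, Tuple


def icache_hit_miss_rates(hits_sum: float, misses_sum: float) -> Optional[Tuple[float, float]]:
    """Return (hit rate %, miss rate %) using hits + misses as the denominator."""
    accesses = hits_sum + misses_sum
    if accesses == 0:
        # The formula is only defined for a non-zero denominator.
        return None
    hit_rate = 100.0 * hits_sum / accesses
    miss_rate = 100.0 * misses_sum / accesses
    return hit_rate, miss_rate


# Made-up values standing in for SQC_ICACHE_HITS_sum and SQC_ICACHE_MISSES_sum.
print(icache_hit_miss_rates(hits_sum=900, misses_sum=100))
```

Using hits + misses rather than the request counter keeps both rates within 0 to 100%.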

Memory chart - scalar data cache#

| Metric | Description | Unit |
| --- | --- | --- |
| Dcache Utilization | Definition: Percentage of cycles the scalar data cache is busy. Formula: 100 * SQC_DCACHE_BUSY_CYCLES_sum / SQ_BUSY_CYCLES_sum | Percent |
| Dcache Hit Rate | Definition: Share of scalar data cache accesses that hit (hits + misses denominator). Formula: 100 * SQC_DCACHE_HITS_sum / (SQC_DCACHE_HITS_sum + SQC_DCACHE_MISSES_sum) when the denominator is non-zero. | Percent |
| Dcache Requests | Count of scalar data-cache (SQC DCache) requests, per normalization unit. Formula: SQC_DCACHE_REQ_sum / $denom | Requests per Normalization Unit |
| Dcache Request Stall Rate | Percentage of shader busy cycles in which the scalar data cache input interface was stalled. Formula: 100 * SQC_DCACHE_INPUT_VALID_READYB_sum / SQ_BUSY_CYCLES_sum | Percent |
| Dcache-GL1 Read Bandwidth | Bytes per second of scalar read traffic from SQC toward GL1 (SQC_TC_DATA_READ_REQ at 128 B per request). | Bytes/s |
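
Dcache Utilization and Dcache Request Stall Rate both express a cycle count as a percentage of SQ_BUSY_CYCLES_sum. The sketch below shows that shared shape. The helper name and the counter values are hypothetical, and returning `None` when the shader was never busy is an assumption.

```python
from typing import Optional


def percent_of_sq_busy(event_cycles_sum: float, sq_busy_cycles_sum: float) -> Optional[float]:
    """100 * event cycles / SQ busy cycles, or None if the shader was never busy."""
    if sq_busy_cycles_sum == 0:
        return None
    return 100.0 * event_cycles_sum / sq_busy_cycles_sum


# Made-up counter sums for illustration.
counters = {
    "SQC_DCACHE_BUSY_CYCLES_sum": 400_000,
    "SQC_DCACHE_INPUT_VALID_READYB_sum": 50_000,
    "SQ_BUSY_CYCLES_sum": 1_000_000,
}
sq_busy = counters["SQ_BUSY_CYCLES_sum"]
utilization = percent_of_sq_busy(counters["SQC_DCACHE_BUSY_CYCLES_sum"], sq_busy)
stall_rate = percent_of_sq_busy(counters["SQC_DCACHE_INPUT_VALID_READYB_sum"], sq_busy)
print(utilization, stall_rate)
```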

Memory chart - TCP cache (vector data cache)#

No documented metrics are currently available for this section.

Memory chart - LDS (local data share)#

| Metric | Description | Unit |
| --- | --- | --- |
| LDS Atomic Instructions | Definition: Number of LDS atomic instructions with return values executed. Description: Atomic operations on Local Data Share memory that return a value. Formula: SQ_LDS_ATOMIC_RETURN_sum / $denom | Instructions per Normalization Unit |
| LDS Bank Conflict Rate | Definition: Percentage of LDS accesses that resulted in bank conflicts. Description: Bank conflicts occur when multiple work-items access the same LDS bank simultaneously. Formula: 100 * SQC_LDS_BANK_CONFLICT_sum / SQC_LDS_IDX_ACTIVE_sum | Percent |
| LDS Estimated Bandwidth | Definition: Estimated LDS bandwidth based on active index cycles minus bank conflicts. Description: RDNA3.5 has 32 LDS banks per CU, each capable of 4 bytes per cycle. Formula: ((SQC_LDS_IDX_ACTIVE_sum - SQC_LDS_BANK_CONFLICT_sum) * 4 * 32) / time | Bytes/s |
| LDS Instructions | Total LDS (Local Data Share) instructions executed per normalization unit (SQ_INSTS_LDS_sum). | Instructions per Normalization Unit |
| LDS Instruction Cycles | Cycles spent executing LDS instructions per normalization unit (SQ_INST_CYCLES_LDS_sum). | Cycles per Normalization Unit |
| LDS Wait Cycles | Cycles waves spent waiting on LDS (wait-state attribution), per normalization unit (SQ_WAIT_INST_LDS_sum). | Cycles per Normalization Unit |
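
The LDS bank conflict rate and estimated bandwidth formulas can be worked through as follows. This is a sketch under stated assumptions: the function names and sample values are hypothetical, and `time` in the bandwidth formula is taken here to be elapsed time in seconds.

```python
LDS_BANKS = 32                # banks per CU, as stated in the metric description
BYTES_PER_BANK_PER_CYCLE = 4  # bytes each bank can deliver per cycle


def lds_bank_conflict_rate(bank_conflict_sum: float, idx_active_sum: float) -> float:
    """100 * SQC_LDS_BANK_CONFLICT_sum / SQC_LDS_IDX_ACTIVE_sum."""
    if idx_active_sum == 0:
        return 0.0  # Assumption: no LDS activity means no conflicts to report.
    return 100.0 * bank_conflict_sum / idx_active_sum


def lds_estimated_bandwidth(idx_active_sum: float, bank_conflict_sum: float,
                            elapsed_seconds: float) -> float:
    """((active - conflict) * 4 * 32) / time, in bytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    useful_cycles = idx_active_sum - bank_conflict_sum
    return useful_cycles * BYTES_PER_BANK_PER_CYCLE * LDS_BANKS / elapsed_seconds


# Made-up values: 1000 active index cycles, 250 of them lost to conflicts, over 1 ms.
print(lds_bank_conflict_rate(250, 1000))
print(lds_estimated_bandwidth(1000, 250, 1e-3))
```

Each conflict-free active cycle is credited with 128 bytes (4 bytes from each of 32 banks), so the estimate is an upper bound on what those cycles could have moved.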

Memory chart - TCP-GL1 interface#

| Metric | Description | Unit |
| --- | --- | --- |
| TCP-GL1 Read Requests | Read requests from TCP to GL1C (miss/refill path), per normalization unit. | Requests per Normalization Unit |
| TCP-GL1 Write Requests | Write-related requests from TCP toward GL1C, per normalization unit. | Requests per Normalization Unit |
| TCP-GL1 Read Bandwidth | Bytes per second for TCP→GL1 read traffic (64 B per TCP_GL1_REQ_READ event). | Bytes/s |
| TCP-GL1 Write Bandwidth | Bytes per second for TCP→GL1 write traffic (64 B per TCP_GL1_REQ_WRITE event). | Bytes/s |
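
The TCP-GL1 bandwidth metrics convert request counts to bytes at 64 B per event and divide by time. A minimal sketch of that conversion follows. The function name, the dictionary keys, and the sample values are hypothetical, and the elapsed time is assumed to be the kernel duration in seconds.

```python
TCP_GL1_BYTES_PER_REQUEST = 64  # bytes attributed to each read or write event


def tcp_gl1_bandwidth(read_requests: float, write_requests: float,
                      elapsed_seconds: float) -> dict:
    """Return read and write bandwidth in bytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        "read_bytes_per_s": TCP_GL1_BYTES_PER_REQUEST * read_requests / elapsed_seconds,
        "write_bytes_per_s": TCP_GL1_BYTES_PER_REQUEST * write_requests / elapsed_seconds,
    }


# Made-up values: 2,000,000 reads and 500,000 writes over 0.25 s.
bw = tcp_gl1_bandwidth(2_000_000, 500_000, 0.25)
print(bw["read_bytes_per_s"] / 1e9, "GB/s read")
print(bw["write_bytes_per_s"] / 1e9, "GB/s write")
```

Because every event is counted at a fixed 64 B, requests that actually move 128 B are undercounted by this estimate.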