GL0

On gfx1151, TCP is the vector L1 data cache (GL0 in RDNA terminology) that sits in front of GL1C. For the GL1C panels and the GL1C Memory Chart table, see GL1. The handoff toward GL2C is covered under GL2.

Note

On the RDNA 3.5 architecture, GL0 and TCP refer to the same cache. This page uses TCP for consistency with the performance counter names.

TCP cache panels

TCP utilization

RDNA 3.5 (gfx1151)

Metric	Description	Unit
TCP Busy	Estimated TCP utilization: 100 × TCP_GATE_EN2 / (GRBM_GUI_ACTIVE × NUM_TCP), where NUM_TCP = max(1, ⌊$cu_per_gpu / 2⌋) (one TCP per WGP; RDNA 3.5 has 2 CUs per WGP). The numerator is the aggregate TCP_GATE_EN2 counter, not TCP_GATE_EN2_sum. GRBM_GUI_ACTIVE is multiplied by NUM_TCP so that the denominator reflects the combined capacity of all TCP instances. TCP_GATE_EN2 corresponds to TCP_PERF_SEL_GATE_EN2, which counts cycles with the core clocks on (not windowed). If your CSV provides only per-instance counter sums (typically the `_sum` columns), make sure the numerator and denominator are aggregated consistently with that layout.	Percent
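A minimal Python sketch of the TCP Busy formula, assuming the counter totals have already been read from the profiler output. The function and argument names are hypothetical, and returning 0.0 when GRBM_GUI_ACTIVE is zero is an assumption rather than documented behavior.

```python
def num_tcp(cu_per_gpu: int) -> int:
    """NUM_TCP = max(1, floor(cu_per_gpu / 2)): one TCP per WGP, 2 CUs per WGP."""
    return max(1, cu_per_gpu // 2)


def tcp_busy_percent(tcp_gate_en2: float, grbm_gui_active: float, cu_per_gpu: int) -> float:
    """Estimated TCP utilization: 100 * TCP_GATE_EN2 / (GRBM_GUI_ACTIVE * NUM_TCP)."""
    denominator = grbm_gui_active * num_tcp(cu_per_gpu)
    if denominator == 0:
        return 0.0  # assumption: report 0 % when the GPU was never active
    return 100.0 * tcp_gate_en2 / denominator


# Hypothetical counter values: 40 CUs -> NUM_TCP = 20.
print(tcp_busy_percent(tcp_gate_en2=10_000, grbm_gui_active=1_000, cu_per_gpu=40))  # prints 50.0
```

Scaling the denominator by NUM_TCP is what keeps the result at or below 100 % when TCP_GATE_EN2 is already aggregated across all TCP instances.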

TCP request statistics

RDNA 3.5 (gfx1151)

Metric	Description	Unit
Total Requests	Total number of requests to the TCP cache, including reads, writes, and atomics.	Count per Normalization Unit
Read Requests	Number of read requests to the TCP cache.	Count per Normalization Unit
Write Requests	Number of write requests to the TCP cache.	Count per Normalization Unit
Miss Requests	Number of requests that missed in the TCP cache and required fetching from GL1C.	Count per Normalization Unit

TCP cache performance

RDNA 3.5 (gfx1151)

Metric	Description	Unit
Hit Rate	Percentage of TCP requests that hit in the cache. Higher hit rates reduce traffic to GL1C and improve performance.	Percent
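The table does not spell out how Hit Rate is derived from the request counts. As a rough sketch, assuming that every request counted under Total Requests either hits or is counted under Miss Requests (an assumption, not the documented formula), the relationship looks like this; the function name is hypothetical.

```python
def tcp_hit_rate_percent(total_requests: float, miss_requests: float) -> float:
    """Hit rate assuming hits = total - misses (assumption, not the documented formula)."""
    if total_requests <= 0:
        return 0.0  # no requests: report 0 % rather than dividing by zero
    hits = total_requests - miss_requests
    return 100.0 * hits / total_requests


# Both inputs are per normalization unit, so the unit cancels in the ratio.
print(tcp_hit_rate_percent(total_requests=200, miss_requests=50))  # prints 75.0
```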

TCP-GL1 interface

RDNA 3.5 (gfx1151)

Metric	Description	Unit
GL1 Read Requests	Number of read requests sent from TCP to GL1C due to cache misses.	Count per Normalization Unit
GL1 Read 128B Requests	TCP→GL1 read requests that transferred 128 bytes (TCP_GL1_REQ_READ_128B_sum / $denom). Subset of total GL1 read requests; large-line fills show up here.	Count per Normalization Unit
GL1 Write Requests	TCP→GL1 write or writeback requests (TCP_GL1_REQ_WRITE_sum / $denom).	Count per Normalization Unit
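Because GL1 Read 128B Requests is a subset of GL1 Read Requests, their ratio indicates how much of the TCP→GL1 read traffic consists of 128-byte fills. A minimal sketch, assuming `read_sum` holds the total TCP→GL1 read count (for example TCP_GL1_REQ_READ_sum, a counter name assumed here from the memory-chart bandwidth row) and that returning 0.0 for a zero denominator is acceptable. Both helper names are hypothetical.

```python
def per_normalization_unit(counter_sum: float, denom: float) -> float:
    """Apply the '<COUNTER>_sum / $denom' pattern used in the table above."""
    return counter_sum / denom if denom else 0.0


def read_128b_share_percent(read_128b_sum: float, read_sum: float) -> float:
    """Percent of TCP->GL1 read requests that transferred 128 bytes."""
    return 100.0 * read_128b_sum / read_sum if read_sum else 0.0


# Hypothetical totals for one run.
read_sum, read_128b_sum, denom = 1_200, 300, 4
print(per_normalization_unit(read_128b_sum, denom))      # prints 75.0
print(read_128b_share_percent(read_128b_sum, read_sum))  # prints 25.0
```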

TCP stalls

RDNA 3.5 (gfx1151)

Metric	Description	Unit
TA Req Stall	Cycles in which TCP stalled the Texture Addresser (TA) request interface due to backpressure.	Cycles per Normalization Unit
GL1 Back Pressure	Cycles in which TCP was stalled by backpressure from GL1C.	Cycles per Normalization Unit
Data FIFO Stall	Cycles in which TCP stalled because its internal data FIFO could not accept new entries (TCP_DATA_FIFO_STALL_sum / $denom). Often correlates with downstream backpressure.	Cycles per Normalization Unit

Memory chart: path up to GL1

The following Memory Chart tables correspond to the on-screen flow through the instruction and scalar data paths, TCP (GL0), LDS, and the TCP-GL1 interface.

Memory chart - instruction cache

RDNA 3.5 (gfx1151)

Metric	Description	Unit
ICache Utilization	Definition: Percentage of cycles the instruction cache is busy. Formula: 100 * SQC_ICACHE_BUSY_CYCLES_sum / SQ_BUSY_CYCLES_sum	Percent
ICache Hit Rate	Definition: Share of instruction cache accesses that hit, using hits + misses as the denominator. Formula: 100 * SQC_ICACHE_HITS_sum / (SQC_ICACHE_HITS_sum + SQC_ICACHE_MISSES_sum) when the denominator is non-zero. Computing HITS / REQ instead can exceed 100% because these are per-SQ, per-bank counters.	Percent
ICache Miss Rate	Definition: Share of instruction cache accesses that miss (same denominator as hit rate). Formula: 100 * SQC_ICACHE_MISSES_sum / (SQC_ICACHE_HITS_sum + SQC_ICACHE_MISSES_sum)	Percent
ICache Requests	Count of instruction-cache (SQC ICache) requests issued, per normalization unit. Formula: SQC_ICACHE_REQ_sum / $denom	Requests per Normalization Unit
ICache Request Stall Rate	Percent of shader busy cycles where the instruction cache input interface was stalled (valid without ready). Formula: 100 * SQC_ICACHE_INPUT_VALID_READYB_sum / SQ_BUSY_CYCLES_sum.	Percent
ICache-GL1 Read Bandwidth	Bytes per second of read traffic from the instruction path toward GL1 (texture-cache path instruction requests), using 128 B per SQC_TC_INST_REQ event.	Bytes/s
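The hit and miss rates share the hits + misses denominator and are only defined when it is non-zero. A minimal sketch of that guard; the function name is hypothetical, and returning None for the undefined case is an illustrative choice. The same function applies to Dcache Hit Rate with SQC_DCACHE_HITS_sum and SQC_DCACHE_MISSES_sum.

```python
from typing import Optional, Tuple


def hit_miss_rates(hits_sum: float, misses_sum: float) -> Optional[Tuple[float, float]]:
    """Return (hit %, miss %) using hits + misses as the denominator.

    Returns None when there were no accesses, since the formula is then undefined.
    """
    accesses = hits_sum + misses_sum
    if accesses == 0:
        return None
    return 100.0 * hits_sum / accesses, 100.0 * misses_sum / accesses


# Hypothetical SQC_ICACHE_HITS_sum and SQC_ICACHE_MISSES_sum values.
print(hit_miss_rates(900, 100))  # prints (90.0, 10.0)
print(hit_miss_rates(0, 0))      # prints None
```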

Memory chart - scalar data cache

RDNA 3.5 (gfx1151)

Metric	Description	Unit
Dcache Utilization	Definition: Percentage of cycles the scalar data cache is busy. Formula: 100 * SQC_DCACHE_BUSY_CYCLES_sum / SQ_BUSY_CYCLES_sum	Percent
Dcache Hit Rate	Definition: Share of scalar data cache accesses that hit (hit+miss denominator). Formula: 100 * SQC_DCACHE_HITS_sum / (SQC_DCACHE_HITS_sum + SQC_DCACHE_MISSES_sum) when the denominator is non-zero.	Percent
Dcache Requests	Count of scalar data-cache (SQC DCache) requests, per normalization unit. Formula: SQC_DCACHE_REQ_sum / $denom	Requests per Normalization Unit
Dcache Request Stall Rate	Percent of shader busy cycles where the scalar data cache input interface was stalled. Formula: 100 * SQC_DCACHE_INPUT_VALID_READYB_sum / SQ_BUSY_CYCLES_sum.	Percent
Dcache-GL1 Read Bandwidth	Bytes per second of scalar read traffic from SQC toward GL1 (SQC_TC_DATA_READ_REQ at 128 B per request).	Bytes/s
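Both SQC read-bandwidth rows (ICache-GL1 and Dcache-GL1) convert an event count to bytes at 128 B per request and divide by elapsed time. A minimal sketch, assuming `request_events` is the summed SQC_TC_INST_REQ or SQC_TC_DATA_READ_REQ count for the measured interval and `elapsed_seconds` is that interval's duration in seconds; both parameter names are hypothetical.

```python
BYTES_PER_SQC_TC_REQUEST = 128  # 128 B per SQC_TC_INST_REQ / SQC_TC_DATA_READ_REQ event


def sqc_gl1_read_bandwidth(request_events: float, elapsed_seconds: float) -> float:
    """Bytes per second for SQC -> GL1 read traffic at 128 B per request event."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return request_events * BYTES_PER_SQC_TC_REQUEST / elapsed_seconds


# Hypothetical: 1,000,000 SQC_TC_DATA_READ_REQ events over 0.5 s.
print(sqc_gl1_read_bandwidth(1_000_000, 0.5))  # prints 256000000.0
```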

Memory chart - TCP cache (vector data cache)

RDNA 3.5 (gfx1151)

No documented metrics are currently available for this section.

Memory chart - LDS (local data share)

RDNA 3.5 (gfx1151)

Metric	Description	Unit
LDS Atomic Instructions	Definition: Number of LDS atomic instructions executed. Description: Atomic operations on Local Data Share memory with return values. Formula: SQ_LDS_ATOMIC_RETURN_sum / $denom	Instructions per Normalization Unit
LDS Bank Conflict Rate	Definition: Percentage of LDS accesses that resulted in bank conflicts. Description: Bank conflicts occur when multiple work-items access different addresses in the same LDS bank in the same cycle, so the accesses must be serialized. Formula: 100 * SQC_LDS_BANK_CONFLICT_sum / SQC_LDS_IDX_ACTIVE_sum	Percent
LDS Estimated Bandwidth	Definition: Estimated LDS bandwidth based on active index cycles minus bank-conflict cycles. Description: RDNA3.5 has 32 LDS banks per CU, each capable of 4 bytes per cycle. Formula: ((SQC_LDS_IDX_ACTIVE_sum - SQC_LDS_BANK_CONFLICT_sum) * 4 * 32) / time, where time is the measurement duration in seconds	Bytes/s
LDS Instructions	Total LDS (Local Data Share) instructions executed per normalization unit (SQ_INSTS_LDS_sum).	Instructions per Normalization Unit
LDS Instruction Cycles	Cycles spent executing LDS instructions per normalization unit (SQ_INST_CYCLES_LDS_sum).	Cycles per Normalization Unit
LDS Wait Cycles	Cycles waves spent waiting on LDS (wait-state attribution), per normalization unit (SQ_WAIT_INST_LDS_sum).	Cycles per Normalization Unit
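The two LDS formulas above share the same inputs: conflict cycles are reported as a fraction of active index cycles, and only the conflict-free cycles count toward estimated bandwidth at 32 banks × 4 B per cycle. A minimal sketch, assuming `elapsed_seconds` is the measurement duration in seconds; the function names are hypothetical, and returning 0.0 when there is no LDS activity is an assumption.

```python
LDS_BANKS_PER_CU = 32    # RDNA 3.5, per the table above
LDS_BYTES_PER_BANK = 4   # bytes per bank per cycle


def lds_bank_conflict_rate(idx_active_sum: float, bank_conflict_sum: float) -> float:
    """100 * SQC_LDS_BANK_CONFLICT_sum / SQC_LDS_IDX_ACTIVE_sum."""
    if idx_active_sum == 0:
        return 0.0  # assumption: no LDS activity -> 0 %
    return 100.0 * bank_conflict_sum / idx_active_sum


def lds_estimated_bandwidth(idx_active_sum: float, bank_conflict_sum: float,
                            elapsed_seconds: float) -> float:
    """Conflict-free LDS cycles times 32 banks * 4 B, divided by elapsed seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    useful_cycles = idx_active_sum - bank_conflict_sum
    return useful_cycles * LDS_BYTES_PER_BANK * LDS_BANKS_PER_CU / elapsed_seconds


print(lds_bank_conflict_rate(1_000, 250))        # prints 25.0
print(lds_estimated_bandwidth(1_000, 250, 2.0))  # prints 48000.0
```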

Memory chart - TCP-GL1 interface

RDNA 3.5 (gfx1151)

Metric	Description	Unit
TCP-GL1 Read Requests	Read requests from TCP to GL1C (miss/refill path), per normalization unit.	Requests per Normalization Unit
TCP-GL1 Write Requests	Write-related requests from TCP toward GL1C, per normalization unit.	Requests per Normalization Unit
TCP-GL1 Read Bandwidth	Bytes per second for TCP→GL1 read traffic (64 B per TCP_GL1_REQ_READ event).	Bytes/s
TCP-GL1 Write Bandwidth	Bytes per second for TCP→GL1 write traffic (64 B per TCP_GL1_REQ_WRITE event).	Bytes/s
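The TCP-GL1 bandwidth rows use 64 B per event rather than the 128 B used on the SQC paths. A minimal sketch, assuming `read_events` and `write_events` are the summed TCP_GL1_REQ_READ and TCP_GL1_REQ_WRITE counts for the measured interval and `elapsed_seconds` is its duration in seconds; the function and parameter names are hypothetical.

```python
from typing import Tuple

TCP_GL1_BYTES_PER_EVENT = 64  # 64 B per TCP_GL1_REQ_READ / TCP_GL1_REQ_WRITE event


def tcp_gl1_bandwidth(read_events: float, write_events: float,
                      elapsed_seconds: float) -> Tuple[float, float]:
    """Return (read, write) TCP -> GL1 bandwidth in bytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    read_bw = read_events * TCP_GL1_BYTES_PER_EVENT / elapsed_seconds
    write_bw = write_events * TCP_GL1_BYTES_PER_EVENT / elapsed_seconds
    return read_bw, write_bw


print(tcp_gl1_bandwidth(1_000, 500, 0.25))  # prints (256000.0, 128000.0)
```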
