GL0
On gfx1151, TCP is the vector L1 data cache (RDNA GL0) in front of GL1C. For GL1C panels and the GL1C Memory Chart table, see GL1. For the handoff toward GL2C, see GL2.
Note
On the RDNA3.5 architecture, GL0 and TCP refer to the same cache. This page uses TCP for consistency with the counter names.
TCP cache panels
TCP utilization
| Metric | Description | Unit |
|---|---|---|
| TCP Busy | Estimated TCP utilization: 100 × TCP_GATE_EN2 / (GRBM_GUI_ACTIVE × NUM_TCP), with NUM_TCP = max(1, ⌊$cu_per_gpu / 2⌋) (one TCP per WGP, 2 CUs per WGP on RDNA3.5). Uses aggregate TCP_GATE_EN2 (not TCP_GATE_EN2_sum). GRBM_GUI_ACTIVE is scaled by NUM_TCP so the denominator matches multi-TCP capacity. GATE_EN2 is TCP_PERF_SEL_GATE_EN2 (core clocks on, not windowed). If only per-instance counter sums (typically | Percent |
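The TCP Busy formula can be sketched in Python as follows. This is a minimal sketch that assumes the aggregate counter totals have already been collected. The function names, the example CU count, and the choice to report 0% when no GUI-active cycles were recorded are hypothetical, not part of the documented metric.

```python
def num_tcp(cu_per_gpu: int) -> int:
    # One TCP per WGP and two CUs per WGP on RDNA3.5; never report fewer than one TCP.
    return max(1, cu_per_gpu // 2)


def tcp_busy_percent(tcp_gate_en2: float, grbm_gui_active: float, cu_per_gpu: int) -> float:
    # 100 x TCP_GATE_EN2 / (GRBM_GUI_ACTIVE x NUM_TCP), using the aggregate TCP_GATE_EN2.
    denominator = grbm_gui_active * num_tcp(cu_per_gpu)
    if denominator == 0:
        # Hypothetical choice: report 0% when no GUI-active cycles were recorded.
        return 0.0
    return 100.0 * tcp_gate_en2 / denominator


# Illustrative values only: 40 CUs gives NUM_TCP = 20.
print(tcp_busy_percent(tcp_gate_en2=1_000_000, grbm_gui_active=100_000, cu_per_gpu=40))  # 50.0
```

Scaling GRBM_GUI_ACTIVE by NUM_TCP keeps the result within 0 to 100% when every TCP instance contributes gated-on cycles to the aggregate counter.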
TCP request statistics
| Metric | Description | Unit |
|---|---|---|
| Total Requests | Total number of requests to the TCP cache, including reads, writes, and atomics. | Count per Normalization Unit |
| Read Requests | Number of read requests to the TCP cache. | Count per Normalization Unit |
| Write Requests | Number of write requests to the TCP cache. | Count per Normalization Unit |
| Miss Requests | Number of requests that missed in the TCP cache and required fetching from GL1C. | Count per Normalization Unit |
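"Count per Normalization Unit" means that each raw counter total is divided by the same denominator ($denom). The following is a minimal sketch of that step. The dictionary keys, the sample values, and the reading of the denominator as a wave count are hypothetical. The actual denominator depends on the normalization unit chosen for the analysis.

```python
from typing import Dict, Mapping


def normalize(counters: Mapping[str, float], denom: float) -> Dict[str, float]:
    # Divide each raw counter total by the shared normalization denominator ($denom).
    if denom <= 0:
        raise ValueError("normalization denominator must be positive")
    return {name: value / denom for name, value in counters.items()}


# Hypothetical raw totals for one kernel; keys are illustrative, not counter names.
raw = {"total": 12_000, "read": 8_000, "write": 3_000, "miss": 1_500}
per_unit = normalize(raw, denom=400)  # e.g. 400 waves, if normalizing per wave
print(per_unit)  # {'total': 30.0, 'read': 20.0, 'write': 7.5, 'miss': 3.75}
```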
TCP cache performance
| Metric | Description | Unit |
|---|---|---|
| Hit Rate | Percentage of TCP requests that hit in the cache. Higher hit rates reduce traffic to GL1C and improve performance. | Percent |
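The table does not give a formula for Hit Rate. The following minimal sketch derives it from the Total Requests and Miss Requests metrics. It assumes that every TCP request is counted as either a hit or a miss. Treat it as an illustration, not as the tool's definition.

```python
def tcp_hit_rate(total_requests: float, miss_requests: float) -> float:
    # Assumption: each request is either a hit or a miss, so hits = total - misses.
    if total_requests <= 0:
        return 0.0
    if miss_requests > total_requests:
        raise ValueError("miss requests cannot exceed total requests")
    return 100.0 * (total_requests - miss_requests) / total_requests


print(tcp_hit_rate(12_000, 1_500))  # 87.5 from raw counts
print(tcp_hit_rate(30.0, 3.75))     # 87.5 again: the shared denominator cancels
```

Because both inputs are divided by the same $denom, the ratio is the same whether it is computed from raw counts or from per-normalization-unit values.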
TCP-GL1 interface
| Metric | Description | Unit |
|---|---|---|
| GL1 Read Requests | Number of read requests sent from TCP to GL1C due to cache misses. | Count per Normalization Unit |
| GL1 Read 128B Requests | TCP→GL1 read requests that transferred 128 bytes (TCP_GL1_REQ_READ_128B_sum / $denom). This is a subset of total GL1 read requests; large-line fills show up here. | Count per Normalization Unit |
| GL1 Write Requests | TCP→GL1 write or writeback requests (TCP_GL1_REQ_WRITE_sum / $denom). | Count per Normalization Unit |
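GL1 Read 128B Requests is a subset of GL1 Read Requests, so the ratio of the two shows what fraction of miss traffic uses large-line fills. The following is a minimal sketch. The function name and sample values are hypothetical.

```python
def share_of_128b_reads(gl1_reads: float, gl1_reads_128b: float) -> float:
    # Percent of TCP->GL1 read requests that moved 128 bytes (a subset of all GL1 reads).
    if gl1_reads <= 0:
        return 0.0
    if gl1_reads_128b > gl1_reads:
        raise ValueError("128B reads are a subset of all GL1 reads")
    return 100.0 * gl1_reads_128b / gl1_reads


print(share_of_128b_reads(2_000, 500))  # 25.0
```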
TCP stalls
| Metric | Description | Unit |
|---|---|---|
| TA Req Stall | Cycles in which TCP stalled the Texture Addresser (TA) request interface due to backpressure. | Cycles per Normalization Unit |
| GL1 Back Pressure | Cycles in which TCP was stalled by backpressure from GL1C. | Cycles per Normalization Unit |
| Data FIFO Stall | Cycles in which TCP stalled because its internal data FIFO could not make progress (TCP_DATA_FIFO_STALL_sum / $denom). Often correlates with downstream backpressure. | Cycles per Normalization Unit |
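When you compare the three stall metrics, a quick first step is to find the one with the most cycles. The following is a minimal sketch with hypothetical input values. The stall conditions can overlap in time, so the percentages rank the stalls relative to one another; they do not partition the kernel's cycles.

```python
from typing import Mapping, Tuple


def dominant_stall(stalls: Mapping[str, float]) -> Tuple[str, float]:
    # Return the largest stall metric and its share of the listed stall cycles.
    total = sum(stalls.values())
    if total <= 0:
        raise ValueError("no stall cycles recorded")
    name = max(stalls, key=lambda k: stalls[k])
    return name, 100.0 * stalls[name] / total


# Hypothetical per-normalization-unit values for the three stall metrics above.
stalls = {"TA Req Stall": 120.0, "GL1 Back Pressure": 300.0, "Data FIFO Stall": 180.0}
print(dominant_stall(stalls))  # ('GL1 Back Pressure', 50.0)
```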
Memory chart: path up to GL1
The following Memory Chart tables align with the on-screen flow through instruction and scalar paths, TCP (GL0), LDS, and the TCP-GL1 interface.
Memory chart - instruction cache
| Metric | Description | Unit |
|---|---|---|
| ICache Utilization | Definition: Percentage of cycles the instruction cache is busy. Formula: 100 * SQC_ICACHE_BUSY_CYCLES_sum / SQ_BUSY_CYCLES_sum | Percent |
| ICache Hit Rate | Definition: Share of instruction cache accesses that hit, using hit+miss as the denominator. Formula: 100 * SQC_ICACHE_HITS_sum / (SQC_ICACHE_HITS_sum + SQC_ICACHE_MISSES_sum) when the denominator is non-zero. HITS/REQ alone can exceed 100% (per-SQ, per-bank counters), so it is not used as the hit rate. | Percent |
| ICache Miss Rate | Definition: Share of instruction cache accesses that miss (same denominator as the hit rate). Formula: 100 * SQC_ICACHE_MISSES_sum / (SQC_ICACHE_HITS_sum + SQC_ICACHE_MISSES_sum) when the denominator is non-zero. | Percent |
| ICache Requests | Count of instruction-cache (SQC ICache) requests issued, per normalization unit. Formula: SQC_ICACHE_REQ_sum / $denom | Requests per Normalization Unit |
| ICache Request Stall Rate | Percent of shader busy cycles where the instruction cache input interface was stalled (valid without ready). Formula: 100 * SQC_ICACHE_INPUT_VALID_READYB_sum / SQ_BUSY_CYCLES_sum. | Percent |
| ICache-GL1 Read Bandwidth | Bytes per second of read traffic from the instruction path toward GL1 (texture-cache path instruction requests), using 128 B per SQC_TC_INST_REQ event. | Bytes/s |
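The hit and miss rate formulas share the hits + misses denominator and are defined only when that denominator is non-zero. The following minimal sketch shows both the guard and why HITS/REQ is avoided. The helper names and the sample counts are hypothetical. The sample counts are chosen only to reproduce the over-100% case that the table describes.

```python
from typing import Optional, Tuple


def pct(numerator: float, denominator: float) -> Optional[float]:
    # 100 * numerator / denominator, or None when the denominator is zero.
    if denominator == 0:
        return None
    return 100.0 * numerator / denominator


def icache_hit_miss(hits: float, misses: float) -> Tuple[Optional[float], Optional[float]]:
    # Hit and miss rates share the hits + misses denominator, so they sum to 100%
    # whenever at least one access was recorded.
    accesses = hits + misses
    return pct(hits, accesses), pct(misses, accesses)


# Hypothetical totals in which HITS exceeds REQ, as the table warns can happen.
hits, misses, req = 900, 100, 800
print(icache_hit_miss(hits, misses))  # (90.0, 10.0)
print(pct(hits, req))                 # 112.5, not a meaningful hit rate
print(icache_hit_miss(0, 0))          # (None, None)
```

The same helper applies to the scalar data cache rates by substituting the SQC_DCACHE_* counters.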
Memory chart - scalar data cache
| Metric | Description | Unit |
|---|---|---|
| Dcache Utilization | Definition: Percentage of cycles the scalar data cache is busy. Formula: 100 * SQC_DCACHE_BUSY_CYCLES_sum / SQ_BUSY_CYCLES_sum | Percent |
| Dcache Hit Rate | Definition: Share of scalar data cache accesses that hit (hit+miss denominator). Formula: 100 * SQC_DCACHE_HITS_sum / (SQC_DCACHE_HITS_sum + SQC_DCACHE_MISSES_sum) when the denominator is non-zero. | Percent |
| Dcache Requests | Count of scalar data-cache (SQC DCache) requests, per normalization unit. Formula: SQC_DCACHE_REQ_sum / $denom | Requests per Normalization Unit |
| Dcache Request Stall Rate | Percent of shader busy cycles where the scalar data cache input interface was stalled. Formula: 100 * SQC_DCACHE_INPUT_VALID_READYB_sum / SQ_BUSY_CYCLES_sum. | Percent |
| Dcache-GL1 Read Bandwidth | Bytes per second of scalar read traffic from SQC toward GL1 (SQC_TC_DATA_READ_REQ at 128 B per request). | Bytes/s |
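Both SQC bandwidth metrics convert an event count to bytes at 128 B per request and then divide by elapsed time. The following is a minimal sketch. It assumes that the elapsed time is the kernel's duration in seconds; the table does not state the time base. The function and constant names are hypothetical.

```python
SQC_BYTES_PER_REQ = 128  # 128 B per request, as stated for SQC_TC_DATA_READ_REQ and SQC_TC_INST_REQ


def sqc_gl1_read_bandwidth(requests: float, elapsed_s: float,
                           bytes_per_req: int = SQC_BYTES_PER_REQ) -> float:
    # Bytes per second = requests x bytes per request / elapsed seconds.
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return requests * bytes_per_req / elapsed_s


# Hypothetical: 1,000,000 SQC_TC_DATA_READ_REQ events over 0.5 s of kernel time.
print(sqc_gl1_read_bandwidth(1_000_000, 0.5))  # 256000000.0
```

Passing a SQC_TC_INST_REQ count instead gives the ICache-GL1 Read Bandwidth, because both paths use 128 B per event.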
Memory chart - TCP cache (vector data cache)
No documented metrics are currently available for this section.
Memory chart - TCP-GL1 interface
| Metric | Description | Unit |
|---|---|---|
| TCP-GL1 Read Requests | Read requests from TCP to GL1C (miss/refill path), per normalization unit. | Requests per Normalization Unit |
| TCP-GL1 Write Requests | Write-related requests from TCP toward GL1C, per normalization unit. | Requests per Normalization Unit |
| TCP-GL1 Read Bandwidth | Bytes per second for TCP→GL1 read traffic (64 B per TCP_GL1_REQ_READ event). | Bytes/s |
| TCP-GL1 Write Bandwidth | Bytes per second for TCP→GL1 write traffic (64 B per TCP_GL1_REQ_WRITE event). | Bytes/s |
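The two TCP-GL1 bandwidth metrics apply 64 B per event to the read and write counters. The following minimal sketch computes both directions and their sum. It assumes that the elapsed time is in seconds, which the table does not specify. The function name and sample values are hypothetical.

```python
from typing import Dict

TCP_GL1_BYTES_PER_EVENT = 64  # 64 B per TCP_GL1_REQ_READ or TCP_GL1_REQ_WRITE event, per the table


def tcp_gl1_bandwidth(read_events: float, write_events: float, elapsed_s: float) -> Dict[str, float]:
    # Read, write, and combined TCP->GL1 bandwidth in bytes per second.
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    read_bw = read_events * TCP_GL1_BYTES_PER_EVENT / elapsed_s
    write_bw = write_events * TCP_GL1_BYTES_PER_EVENT / elapsed_s
    return {"read": read_bw, "write": write_bw, "total": read_bw + write_bw}


bw = tcp_gl1_bandwidth(read_events=3_000_000, write_events=1_000_000, elapsed_s=0.5)
print(bw)  # {'read': 384000000.0, 'write': 128000000.0, 'total': 512000000.0}
print(100.0 * bw["read"] / bw["total"])  # 75.0 (read share of TCP->GL1 bytes)
```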