GL0 (TCP Vector Cache)
GL0 is the vector-side cache (TCP) that sits immediately in front of GL1 in each shader engine datapath. On gfx1151, hardware counters keep the TCP_* prefix.
For GL1 panels and the GL1 Cache Memory Chart table, see GL1. The handoff toward the GL2 cache is covered under GL2.
Note
On RDNA3.5, GL0 and TCP refer to the same cache. Hardware counter names (for example, TCP_REQ_sum) retain the TCP prefix.
GL0 cache panels
GL0 utilization
| Metric | Description | Unit |
|---|---|---|
| GL0 Busy (TCP) | Percentage of cycles the GL0 vector cache (TCP) is actively processing requests. Each Workgroup Processor has its own GL0/TCP instance. Low utilization may indicate compute-bound workloads with minimal memory traffic or idle shader engines. | Percent |
GL0 request statistics
| Metric | Description | Unit |
|---|---|---|
| Total Requests | Total number of requests to the GL0 vector cache, including reads, writes, and atomics. This represents the overall memory traffic generated by vector memory instructions from the shader cores. | Count per Normalization Unit |
| Read Requests | Number of read requests to the GL0 vector cache. High read counts indicate memory-intensive load operations. Compare with hit rate to assess cache effectiveness for read traffic. | Count per Normalization Unit |
| Write Requests | Number of write requests to the GL0 vector cache. Write traffic may include global memory stores and cache writebacks. High write counts indicate write-intensive workloads. | Count per Normalization Unit |
| Miss Requests | Number of GL0 cache requests that missed and required fetching from the GL1 cache. High miss counts increase memory access latency. Consider improving data locality or access patterns to reduce misses. | Count per Normalization Unit |
GL0 cache performance
| Metric | Description | Unit |
|---|---|---|
| Hit Rate | Percentage of GL0 cache requests serviced from cache without accessing GL1 cache. Higher hit rates indicate better data locality and lower memory access latency. Low hit rates may indicate working sets exceeding GL0 capacity or poor access patterns. | Percent |
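
The hit rate can be related to the request counters named elsewhere on this page. The sketch below is a minimal illustration only: it assumes the hit rate is the share of total requests (TCP_REQ_sum) that did not miss (TCP_REQ_MISS_sum). The profiler's exact formula is not stated here, and the function name and sample values are hypothetical.

```python
from typing import Optional


def gl0_hit_rate(tcp_req_sum: float, tcp_req_miss_sum: float) -> Optional[float]:
    """Return an estimated GL0 hit rate in percent.

    Assumption (not confirmed by this page):
        hit rate = 100 * (requests - misses) / requests
    Returns None when no requests were recorded, so callers can
    distinguish "no traffic" from "0% hit rate".
    """
    if tcp_req_sum <= 0:
        return None
    hits = tcp_req_sum - tcp_req_miss_sum
    return 100.0 * hits / tcp_req_sum


# Hypothetical counter values for one kernel dispatch.
rate = gl0_hit_rate(tcp_req_sum=1000, tcp_req_miss_sum=250)
print(f"Estimated GL0 hit rate: {rate:.1f}%")
```

Returning None for an idle cache avoids a division by zero and keeps a kernel with no vector memory traffic from being reported as a cache with poor locality.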
GL0-GL1 interface
| Metric | Description | Unit |
|---|---|---|
| GL1 Read Requests | Number of read requests forwarded from GL0 (TCP) to GL1 cache due to misses. This represents GL0 miss traffic that must be serviced by higher cache levels. | Count per Normalization Unit |
| GL1 Read 128B Requests | Number of 128-byte read requests forwarded from GL0 (TCP) to GL1 cache. This represents large cache line fetches for memory-intensive workloads. | Count per Normalization Unit |
| GL1 Write Requests | Number of write requests forwarded from GL0 (TCP) to GL1 cache. This includes writebacks and stores that missed in GL0. | Count per Normalization Unit |
GL0 stalls
| Metric | Description | Unit |
|---|---|---|
| TA Req Stall | Cycles the Texture Addresser was stalled waiting for the GL0 cache to accept requests. High stall counts indicate GL0 cache backpressure limiting memory request throughput. | Cycles per Normalization Unit |
| GL1 Back Pressure | Cycles the GL0 cache was stalled due to backpressure from the GL1 cache. High values indicate GL1 cache contention or bandwidth limitations impacting GL0 throughput. | Cycles per Normalization Unit |
| Data FIFO Stall | Cycles the GL0 cache data FIFO was stalled. High stall counts may indicate data path congestion or insufficient buffering for high-throughput workloads. | Cycles per Normalization Unit |
Memory chart: path up to GL1
The following Memory Chart tables align with the on-screen flow through instruction and scalar paths, GL0 (TCP), LDS, and the TCP-GL1 interface.
Memory chart - instruction cache
| Metric | Description | Unit |
|---|---|---|
| ICache Utilization | Percentage of shader busy cycles spent actively servicing instruction fetch requests. High utilization indicates the instruction cache is keeping pace with shader execution. Low utilization may indicate instruction cache misses causing stalls, or idle shaders. | Percent |
| ICache Hit Rate | Percentage of instruction cache accesses that are serviced from cache without fetching from the GL1 cache. High hit rates indicate good instruction locality. Low hit rates may indicate large kernels exceeding instruction cache capacity or divergent branches. | Percent |
| ICache Miss Rate | Percentage of instruction cache accesses that miss and require fetching from the GL1 cache. High miss rates increase instruction fetch latency and may cause shader stalls. Consider reducing kernel code size or improving branch coherence. | Percent |
| ICache Requests | Count of instruction-cache (SQC ICache) requests issued, per normalization unit. Formula: SQC_ICACHE_REQ_sum / $denom | Requests per Normalization Unit |
| ICache Request Stall Rate | Percent of shader busy cycles where the instruction cache input interface was stalled (valid without ready). Formula: 100 * SQC_ICACHE_INPUT_VALID_READYB_sum / SQ_BUSY_CYCLES_sum. | Percent |
| ICache-GL1 Read Bandwidth | Bytes per second of read traffic from the instruction path toward GL1 (texture-cache path instruction requests), using 128 B per SQC_TC_INST_REQ event. | Bytes/s |
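
The two formulas quoted in the table (requests per normalization unit and request stall rate) can be evaluated by hand from raw counter sums. The following is a minimal sketch, not the profiler's implementation: the function names, the dictionary layout, and the sample values are hypothetical, and `denom` stands in for whatever the selected normalization unit resolves `$denom` to.

```python
from typing import Optional


def per_normalization_unit(counter_sum: float, denom: float) -> float:
    """Apply the `counter / $denom` form used by the request metrics."""
    return counter_sum / denom


def stall_rate_percent(valid_readyb_sum: float, busy_cycles_sum: float) -> Optional[float]:
    """Apply `100 * VALID_READYB / SQ_BUSY_CYCLES`.

    Returns None when the shader recorded no busy cycles.
    """
    if busy_cycles_sum == 0:
        return None
    return 100.0 * valid_readyb_sum / busy_cycles_sum


# Hypothetical counter sums; real values come from a profiling run.
counters = {
    "SQC_ICACHE_REQ_sum": 1200,
    "SQC_ICACHE_INPUT_VALID_READYB_sum": 500,
    "SQ_BUSY_CYCLES_sum": 10000,
}
denom = 4  # hypothetical value of $denom for the chosen normalization unit

requests = per_normalization_unit(counters["SQC_ICACHE_REQ_sum"], denom)
stall = stall_rate_percent(
    counters["SQC_ICACHE_INPUT_VALID_READYB_sum"],
    counters["SQ_BUSY_CYCLES_sum"],
)
print(f"ICache requests per unit: {requests}, stall rate: {stall}%")
```

The scalar data cache metrics use formulas of the same shape, so the same helpers apply with the SQC_DCACHE_* counters substituted.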
Memory chart - scalar data cache
| Metric | Description | Unit |
|---|---|---|
| Dcache Utilization | Percentage of shader busy cycles spent actively servicing scalar data cache requests. The scalar data cache holds uniform data accessed by scalar instructions. High utilization indicates active use of scalar memory operations. | Percent |
| Dcache Hit Rate | Percentage of scalar data cache accesses that hit in cache. High hit rates indicate efficient reuse of constant and uniform data. Low hit rates may indicate excessive unique constant data or poor temporal locality. | Percent |
| Dcache Requests | Count of scalar data-cache (SQC DCache) requests, per normalization unit. Formula: SQC_DCACHE_REQ_sum / $denom | Requests per Normalization Unit |
| Dcache Request Stall Rate | Percent of shader busy cycles where the scalar data cache input interface was stalled. Formula: 100 * SQC_DCACHE_INPUT_VALID_READYB_sum / SQ_BUSY_CYCLES_sum. | Percent |
| Dcache-GL1 Read Bandwidth | Bytes per second of scalar read traffic from SQC toward GL1 (SQC_TC_DATA_READ_REQ at 128 B per request). | Bytes/s |
Memory chart - TCP cache (GL0 vector cache)
| Metric | Description | Unit |
|---|---|---|
| GL0 Cache Hit Rate (TCP Cache) | Percentage of GL0 vector cache (TCP Cache) requests serviced from cache. TCP Cache is the first-level cache for vector memory operations. Higher hit rates reduce traffic to the GL1 cache and lower memory access latency for vector operations. | Percent |
| TCP Total Requests | Total TCP (vector L0) requests per normalization unit (reads, writes and related traffic aggregated in TCP_REQ_sum). | Requests per Normalization Unit |
| TCP Read Requests | TCP read requests per normalization unit (TCP_REQ_READ_sum). | Requests per Normalization Unit |
| TCP Write Requests | TCP write requests per normalization unit (TCP_REQ_WRITE_sum). | Requests per Normalization Unit |
| TCP Miss Requests | TCP requests that missed in the L0 vector cache and required a fill from GL1C (TCP_REQ_MISS_sum / $denom). | Requests per Normalization Unit |
Memory chart - TCP-GL1 interface
| Metric | Description | Unit |
|---|---|---|
| TCP-GL1 Read Requests | Read requests from TCP to GL1C (miss/refill path), per normalization unit. | Requests per Normalization Unit |
| TCP-GL1 Write Requests | Write-related requests from TCP toward GL1C, per normalization unit. | Requests per Normalization Unit |
| TCP-GL1 Read Bandwidth | Bytes per second for TCP→GL1 read traffic (64 B per TCP_GL1_REQ_READ event). | Bytes/s |
| TCP-GL1 Write Bandwidth | Bytes per second for TCP→GL1 write traffic (64 B per TCP_GL1_REQ_WRITE event). | Bytes/s |
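
The bandwidth metrics convert an event count into bytes per second using a fixed payload per event (64 B for the TCP-GL1 counters above). The sketch below illustrates that conversion under stated assumptions: the time base is taken to be the elapsed wall-clock duration of the measured interval, which this page does not specify, and the function name and sample values are hypothetical.

```python
TCP_GL1_BYTES_PER_REQUEST = 64  # from the table: 64 B per TCP_GL1_REQ_* event


def bandwidth_bytes_per_second(
    event_count: float, bytes_per_event: int, elapsed_seconds: float
) -> float:
    """Convert a request count into bytes per second.

    Assumption: `elapsed_seconds` is the duration of the measured
    interval; the profiler may use a different time base.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return event_count * bytes_per_event / elapsed_seconds


# Hypothetical example: 1,000,000 read requests observed over 0.5 s.
read_bw = bandwidth_bytes_per_second(1_000_000, TCP_GL1_BYTES_PER_REQUEST, 0.5)
print(f"TCP-GL1 read bandwidth: {read_bw / 1e6:.1f} MB/s")
```

Passing 128 as `bytes_per_event` gives the corresponding conversion for the ICache-GL1 and Dcache-GL1 read bandwidth metrics, which the tables above describe as 128 B per request.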