GL1

The GL1 cache is the L1 vector cache shared within each shader engine on gfx115x. It receives requests from the GL0 (TCP) caches and forwards misses to GL2. For GL0 panels and Memory Chart rows up to the TCP-GL1 boundary, see GL0 (TCP Vector Cache). For the GL2 panels downstream of GL1, see GL2; for the DRAM / GCEA interfaces beyond GL2, see GCEA.

Note

The GL1 Cache is also referred to as GL1C in some contexts. Hardware counter names (for example, GL1C_REQ_sum) retain the GL1C prefix.

GL1 Cache panels

GL1 Cache utilization

| Metric | Description | Unit |
| --- | --- | --- |
| GL1 Cache Busy | Percentage of cycles the GL1 cache is actively processing requests. The GL1 cache is shared across multiple Workgroup Processors within a Shader Engine. High utilization indicates active memory traffic from the GL0 caches. | Percent |
| GL1 Cache Starve | Percentage of cycles the GL1 cache had no pending requests. High starvation indicates either compute-bound workloads with minimal memory traffic, or effective GL0 caching reducing traffic to GL1. | Percent |

GL1 Cache request statistics

| Metric | Description | Unit |
| --- | --- | --- |
| Total Requests | Total number of requests received by the GL1 cache from all GL0 caches (TCP instances) within the shader engine. This represents aggregated memory traffic from multiple Workgroup Processors. | Count per Normalization Unit |
| Read Requests | Number of read requests to the GL1 cache. High read counts indicate memory-intensive load operations that missed in GL0. Compare with miss requests to assess GL1 cache effectiveness. | Count per Normalization Unit |
| Write Requests | Number of write requests to the GL1 cache. Write traffic includes stores that missed in GL0 and cache writebacks. High write counts may indicate write-intensive workloads. | Count per Normalization Unit |
| Miss Requests | Number of GL1 cache requests that missed and required fetching from GL2. High miss counts increase memory latency and traffic to GL2. Consider improving data locality at the shader engine level. | Count per Normalization Unit |
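
As a rough illustration of how the request counts above can be read together, the following sketch computes the read and write shares of total GL1 requests. The function name `request_mix` and the sample values are hypothetical and are not part of the profiler; the sketch also does not assume that reads and writes add up to the total, since other request types may be counted.

```python
from typing import Optional


def request_mix(total: float, reads: float, writes: float) -> Optional[dict]:
    """Return read and write shares of GL1 requests as percentages.

    All inputs are per-normalization-unit counts taken from the
    request statistics panel. Returns None when there is no traffic.
    """
    if total <= 0:
        return None
    return {
        "read_pct": 100.0 * reads / total,
        "write_pct": 100.0 * writes / total,
    }


# Hypothetical sample values, for illustration only.
mix = request_mix(total=1000, reads=700, writes=300)
print(mix)
```

A read share close to 100 percent points at load-dominated traffic, in which case the Miss Requests row is the next value to inspect.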

GL1 Cache performance

| Metric | Description | Unit |
| --- | --- | --- |
| Hit Rate | Percentage of GL1 cache requests serviced from cache. Higher hit rates reduce traffic to GL2 and improve performance. Low hit rates may indicate working sets exceeding GL1 capacity or poor data locality across Workgroup Processors. | Percent |
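
The hit rate can be sanity-checked against the request statistics. The sketch below assumes that the hit rate is derived as the share of Total Requests that are not Miss Requests; the page does not state the exact expression the profiler uses, so treat this formula, and the function name `hit_rate_percent`, as assumptions.

```python
from typing import Optional


def hit_rate_percent(total_requests: float, miss_requests: float) -> Optional[float]:
    """Estimate the GL1 hit rate as 100 * (total - misses) / total.

    Assumption: every request is either a hit or a miss. Returns None
    when no requests were observed, so callers never divide by zero.
    """
    if total_requests <= 0:
        return None
    return 100.0 * (total_requests - miss_requests) / total_requests


# Hypothetical sample values, for illustration only.
print(hit_rate_percent(total_requests=1000, miss_requests=250))
```

Returning `None` for an idle cache keeps "no traffic" distinct from "0 percent hit rate", which would otherwise look like a pathological result.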

GL1-GL2 interface

| Metric | Description | Unit |
| --- | --- | --- |
| GL2 Read Requests | Number of read requests forwarded from GL1 to GL2 cache due to misses. This represents GL1 miss traffic that consumes GL2 bandwidth. High counts may indicate GL1 capacity limitations. | Count per Normalization Unit |
| GL2 Read 128B Requests | Number of 128-byte read requests forwarded from GL1 to GL2 cache. This represents large cache line fetches for memory-intensive workloads. | Count per Normalization Unit |
| GL2 Write Requests | Number of write requests forwarded from GL1 to GL2 cache. This includes writebacks and stores that missed in GL1. | Count per Normalization Unit |
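
One way to relate the two read rows is to express the 128-byte reads as a share of all reads forwarded to GL2. This is a minimal sketch that assumes GL2 Read 128B Requests are a subset of GL2 Read Requests, which the table does not state explicitly; the function name `wide_read_share` is hypothetical.

```python
from typing import Optional


def wide_read_share(gl2_reads: float, gl2_reads_128b: float) -> Optional[float]:
    """Return 128-byte reads as a percentage of all GL1-to-GL2 reads.

    Assumption: the 128B count is included in the overall read count.
    Returns None when no reads were forwarded to GL2.
    """
    if gl2_reads <= 0:
        return None
    return 100.0 * gl2_reads_128b / gl2_reads


# Hypothetical sample values, for illustration only.
print(wide_read_share(gl2_reads=400, gl2_reads_128b=100))
```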

GL1 Cache stalls

| Metric | Description | Unit |
| --- | --- | --- |
| GL2 Stall | Cycles the GL1 cache was stalled waiting for GL2 to accept requests. High stall counts indicate GL2 bandwidth saturation or contention, limiting GL1 throughput. | Cycles per Normalization Unit |
| LFIFO Full Stall | Cycles the GL1 cache was stalled due to the LFIFO (Load FIFO) being full. High stall counts indicate data return path congestion from GL2 to GL1. | Cycles per Normalization Unit |

Memory chart: GL1 cache and GL1-GL2 interface

The following Memory Chart tables align with the on-screen flow through GL1 and the GL1-GL2 interface.

Memory chart - GL1 cache

| Metric | Description | Unit |
| --- | --- | --- |
| GL1 Cache Utilization | Percentage of cycles the GL1 cache is actively processing requests. The GL1 cache is shared across multiple Workgroup Processors within a Shader Engine. High utilization indicates active memory traffic through the GL1 cache. | Percent |
| GL1 Cache Hit Rate | Percentage of GL1 cache requests that hit in the cache. Higher hit rates reduce traffic to the GL2 cache and improve memory access latency. Low hit rates may indicate poor data locality or working sets exceeding GL1 capacity. | Percent |

Memory chart - GL1-GL2 interface

| Metric | Description | Unit |
| --- | --- | --- |
| GL1-GL2 Read Requests | Read requests from GL1C to GL2C per normalization unit (GL1C_GL2_REQ_READ_sum). | Requests per Normalization Unit |
| GL1-GL2 Write Requests | Write requests from GL1C to GL2C per normalization unit (GL1C_GL2_REQ_WRITE_sum). | Requests per Normalization Unit |
| GL1-GL2 Read Bandwidth | Bytes per second on the GL1C→GL2C read interface (32/64/128 B request bins). | Bytes/s |
| GL1-GL2 Write Bandwidth | Bytes per second on the GL1C→GL2C write interface (32/64 B request bins). | Bytes/s |
| GL1-GL2 Read Latency | Average cycles from GL1C read request to response (GL1C_GL2_REQ_READ_LATENCY_sum / GL1C_GL2_REQ_READ_sum) when the denominator is non-zero. | Cycles |
| GL1-GL2 Write Latency | Average cycles from GL1C write request to completion (GL1C_GL2_REQ_WRITE_LATENCY_sum / GL1C_GL2_REQ_WRITE_sum) when the denominator is non-zero. | Cycles |
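
The latency and bandwidth rows are simple derived quantities, and the sketch below shows one plausible way to compute them. The latency helper follows the ratio given in the table, including the non-zero denominator guard. The bandwidth helper is an assumption: it weights hypothetical per-bin request counts by their sizes (32, 64, and 128 bytes) and divides by a hypothetical elapsed time in seconds, since this page does not name the per-bin counters or the time base.

```python
from typing import Optional


def average_latency(latency_sum: float, request_count: float) -> Optional[float]:
    """Average cycles per request, e.g.
    GL1C_GL2_REQ_READ_LATENCY_sum / GL1C_GL2_REQ_READ_sum.

    Returns None when the denominator is zero, matching the
    "when the denominator is non-zero" condition in the table.
    """
    if request_count == 0:
        return None
    return latency_sum / request_count


def read_bandwidth(req_32b: float, req_64b: float, req_128b: float,
                   elapsed_seconds: float) -> Optional[float]:
    """Bytes per second from per-size request bins.

    Assumption: each bin counts requests of exactly that size, and
    elapsed_seconds is the kernel duration. Both inputs are
    hypothetical names, not documented counters.
    """
    if elapsed_seconds <= 0:
        return None
    total_bytes = 32 * req_32b + 64 * req_64b + 128 * req_128b
    return total_bytes / elapsed_seconds


# Hypothetical sample values, for illustration only.
print(average_latency(latency_sum=12000, request_count=40))
print(read_bandwidth(req_32b=10, req_64b=20, req_128b=30, elapsed_seconds=0.001))
```

The write interface would use the same weighting with only the 32 B and 64 B bins.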