System Speed-of-Light


This page lists System Speed-of-Light metrics for RDNA3.5 (gfx1151) when you profile with the shipped gfx1151 analysis configuration. The same metric keys appear in the RDNA3.5 (gfx1151) tab of the analysis report.

Note

For AMD Instinct accelerators (CDNA through CDNA4), see the System Speed-of-Light page for those architectures.

Other gfx1151 metric tables grouped by hardware block live under Workgroup processor (WGP), GL0, GL1, GL2, Shader engine, and Command processor (CP).

Warning

Theoretical peaks use the maximum clock frequency reported for the GPU (for example via rocminfo). That may not match sustained clocks under your workload.
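As a rough illustration of why the clock matters, the sketch below evaluates the FP32 and FP16 peak formulas from the RDNA 3.5 table on this page at two clock values. The function names, the 2000 MHz and 1500 MHz clocks, and the 32-CU count are hypothetical inputs chosen for easy arithmetic, not values read from any device or from the profiler's own code.

```python
def peak_valu_gflops(max_sclk_mhz: float, cu_per_gpu: int,
                     flops_per_cu_per_cycle: int = 128) -> float:
    """Theoretical VALU peak in GFLOP/s.

    flops_per_cu_per_cycle is 128 for FP32 and 256 for packed FP16,
    matching the constants in the table's peak formulas.
    """
    return max_sclk_mhz * cu_per_gpu * flops_per_cu_per_cycle / 1000.0


def pct_of_peak(achieved_gflops: float, peak_gflops: float) -> float:
    """Achieved throughput as a percentage of the theoretical peak."""
    return 100.0 * achieved_gflops / peak_gflops if peak_gflops else 0.0


# Hypothetical inputs (assumptions for illustration only).
CU_PER_GPU = 32          # total CUs, not WGPs
MAX_SCLK_MHZ = 2000.0    # maximum reported clock
SUSTAINED_MHZ = 1500.0   # clock actually held under load

peak_fp32 = peak_valu_gflops(MAX_SCLK_MHZ, CU_PER_GPU)        # 8192.0
peak_fp16 = peak_valu_gflops(MAX_SCLK_MHZ, CU_PER_GPU, 256)   # 16384.0
sustained_fp32 = peak_valu_gflops(SUSTAINED_MHZ, CU_PER_GPU)  # 6144.0

# A kernel that saturates the VALUs at the sustained clock still
# reports only 75% of the peak derived from the maximum clock.
print(pct_of_peak(sustained_fp32, peak_fp32))
```

With these inputs, a kernel that is fully VALU-bound at the sustained clock would still show 75% of peak, which is the mismatch the warning describes.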

RDNA 3.5 (gfx1151)

Metric	Description	Unit
VALU FLOPs (FP16)	FP16 VALU throughput, computed as SQ_INSTS_VALU × 64 / kernel time. The theoretical peak assumes packed FP16 at 2× the FP32 per-CU rate: max_sclk (MHz) × $cu_per_gpu × 256 / 1000. $cu_per_gpu is the total CU count, not the WGP count (1 WGP = 2 CUs on RDNA 3.5).	GFLOP/s
VALU FLOPs (FP32)	FP32 VALU throughput, computed as SQ_INSTS_VALU × 64 / kernel time. Theoretical peak: max_sclk (MHz) × $cu_per_gpu × 128 / 1000 (wave64 / dual-issue VALU class used for Ryzen APUs). $cu_per_gpu is the total number of Compute Units from system info, not the WGP count (on RDNA 3.5, one WGP pairs two CUs).	GFLOP/s
IPC	Instructions per cycle: the ratio of instructions executed to busy cycles. Calculated as (Total Instructions - Internal Instructions) / Busy Cycles. Higher IPC indicates better instruction throughput.	Instr/cycle
Wavefront Occupancy	Average wavefront occupancy proxy: SQ_WAVE_CYCLES_sum / SQ_BUSY_CYCLES_avr per sample, where SQ_BUSY_CYCLES_avr is the mean of the per-shader-engine SQ_BUSY_CYCLES values (active-wave cycles per SE). Peak is $max_waves_per_cu × $cu_per_gpu (the theoretical in-flight cap). Percent of peak scales the average against that ceiling.	Wavefronts
L2 Cache Hit Rate	GL2C last-level cache hit rate (same as Memory Chart / GL2C panel). Formula: 100 * GL2C_HIT_sum / (GL2C_HIT_sum + GL2C_MISS_sum) when the denominator is non-zero.	Percent
L2-Fabric Read BW	GL2C/EA read bandwidth from sized request counters (gl2c_perf_sel_ea_rdreq_32b/64B/96B/128B): 32·GL2C_EA_RDREQ_32B_sum + 64·GL2C_EA_RDREQ_64B_sum + 96·GL2C_EA_RDREQ_96B_sum + 128·GL2C_EA_RDREQ_128B_sum, divided by kernel wall time. Sums over GL2C instances. Peak / % of peak are N/A (no sysinfo memory BW).	Bytes/s
L2-Fabric Write BW	GL2C/EA write bandwidth: GL2C_MC_WRREQ_sum is the EA write transaction count (32 B or 64 B per transaction; the spec names this path MC_WRREQ). Bandwidth = 32·(GL2C_MC_WRREQ_sum − GL2C_EA_WRREQ_64B_sum) + 64·GL2C_EA_WRREQ_64B_sum, divided by kernel wall time. Peak / % of peak are N/A (no sysinfo memory BW).	Bytes/s
LDS Bank Conflicts	Number of LDS bank conflicts. Peak / % of Peak are N/A for raw counts.	Conflicts
Wave Dependency Wait	Percent of wave lifetime spent waiting for data dependencies (e.g. VMEM/LDS results).	Percent
Wave Issue Wait	Percent of wave lifetime spent waiting for any instruction to be ready to issue.	Percent
TCP Cache Hit Rate	Texture Cache Per-pipe (TCP) hit rate. TCP is the L0 vector cache in RDNA 3.5. Same as Memory Chart TCP Hit Rate: 100 * (TCP_REQ_sum - TCP_REQ_MISS_sum) / TCP_REQ_sum.	Percent
TCP Cache BW	Same as Memory Chart “TCP Request Bandwidth” (table 303): TCP_REQ_sum × 64 B / kernel wall time. Peak / % of peak are N/A (as in Memory Chart, which has no theoretical ceiling row).	Bytes/s
GL1C Hit Rate	GL1 Cache hit rate (same as Memory Chart / GL1C panel). Formula: 100 * (GL1C_REQ_sum - GL1C_REQ_MISS_sum) / GL1C_REQ_sum when GL1C_REQ_sum != 0.	Percent
GL1C Read BW	Same as Memory Chart table 307 “GL1C-GL2 Read Bandwidth”: 32·GL1C_GL2_REQ_READ_32B_sum + 64·GL1C_GL2_REQ_READ_64B_sum + 128·GL1C_GL2_REQ_READ_128B_sum, divided by kernel wall time. Peak / % of peak are N/A.	Bytes/s
GL2C Cache BW	Same as Memory Chart table 308 “GL2C Read Bandwidth”: sized client read bins plus 96 B compressed reads, divided by kernel wall time. Peak / % of peak are N/A.	Bytes/s
Scalar Data Cache Hit Rate	Same as Memory Chart Dcache Hit Rate: 100 * SQC_DCACHE_HITS_sum / (SQC_DCACHE_HITS_sum + SQC_DCACHE_MISSES_sum) when the denominator is non-zero.	Percent
Scalar Data Cache BW	Same as Memory Chart table 302 “Dcache-GL1 Read Bandwidth”: SQC_TC_DATA_READ_REQ_sum × 128 B / kernel time. Peak / % of peak are N/A.	Bytes/s
Instruction Cache Hit Rate	Same as Memory Chart Icache Hit Rate: 100 * SQC_ICACHE_HITS_sum / (SQC_ICACHE_HITS_sum + SQC_ICACHE_MISSES_sum) when the denominator is non-zero.	Percent
Instruction Cache BW	Same as Memory Chart table 301 “ICache-GL1 Read Bandwidth”: SQC_TC_INST_REQ_sum × 128 B / kernel time. Peak / % of peak are N/A.	Bytes/s
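
The hit-rate and sized-request bandwidth formulas in the table above share a few shapes, which the following minimal sketch reproduces. It is not the profiler's implementation: the function names are hypothetical, the counter values are made-up inputs, and returning None when a denominator is zero is an assumption (the table only says the formulas apply when the denominator is non-zero).

```python
from typing import Mapping, Optional


def hit_rate_from_hits(hits: int, misses: int) -> Optional[float]:
    """100 * hits / (hits + misses), as used for GL2C, Dcache, and Icache."""
    total = hits + misses
    return 100.0 * hits / total if total else None  # None is an assumption


def hit_rate_from_requests(requests: int, misses: int) -> Optional[float]:
    """100 * (requests - misses) / requests, as used for TCP and GL1C."""
    return 100.0 * (requests - misses) / requests if requests else None


def sized_request_bw(counts_by_size: Mapping[int, int],
                     kernel_time_s: float) -> float:
    """Bytes/s from per-size request counters, e.g. {32: n32, 64: n64, ...}."""
    total_bytes = sum(size * count for size, count in counts_by_size.items())
    return total_bytes / kernel_time_s


def l2_fabric_write_bw(wrreq_total: int, wrreq_64b: int,
                       kernel_time_s: float) -> float:
    """Writes not counted as 64 B are treated as 32 B transactions."""
    total_bytes = 32 * (wrreq_total - wrreq_64b) + 64 * wrreq_64b
    return total_bytes / kernel_time_s


# Made-up counter values for illustration only.
print(hit_rate_from_hits(hits=75, misses=25))
print(hit_rate_from_requests(requests=200, misses=50))
print(sized_request_bw({32: 10, 64: 5, 96: 0, 128: 1}, kernel_time_s=0.5))
print(l2_fabric_write_bw(wrreq_total=10, wrreq_64b=4, kernel_time_s=0.5))
```

Keeping the zero-denominator case explicit avoids reporting a misleading 0% hit rate for a cache that saw no traffic during the kernel.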