Performance analysis glossary

2026-02-20

Applies to Linux and Windows

This section provides brief definitions of performance analysis concepts and optimization techniques.

Active cycle: An active cycle is a clock cycle in which a compute unit has at least one active wavefront resident. See Warp (Wavefront) execution states for details.
Arithmetic bandwidth: Arithmetic bandwidth is the peak rate at which arithmetic work can be performed, defining the compute roof in roofline models. See Compute-bound performance for details.
Arithmetic intensity: Arithmetic intensity is the ratio of arithmetic operations to memory traffic in a kernel, commonly expressed as operations per byte, and indicates whether the kernel tends to be compute-bound or memory-bound. See Arithmetic intensity for intensity analysis.
Bank conflict: A bank conflict occurs when multiple threads simultaneously access different addresses in the same LDS bank, serializing accesses. See Bank conflict theory for details.
Branch efficiency: Branch efficiency measures how often all threads within a wavefront take the same execution path, quantifying control-flow uniformity. See Branch efficiency for branch analysis.
Compute-bound: Compute-bound kernels are limited by the arithmetic bandwidth of the GPU’s compute units rather than memory bandwidth. See Compute-bound performance for compute-bound analysis.
CU utilization: CU utilization measures the percentage of time that compute units are actively executing instructions. See CU utilization for utilization analysis.
Issue efficiency: Issue efficiency measures how effectively the wavefront scheduler keeps execution pipelines busy by issuing instructions. See Issue efficiency for efficiency metrics.
Latency hiding: Latency hiding masks long-latency operations by running many concurrent threads, keeping execution pipelines busy. See Latency hiding mechanisms for details.
Little’s Law: Little’s Law relates concurrency, latency, and throughput, determining how much independent work must be in flight to hide latency. See Little’s Law for latency hiding details.
Memory bandwidth: Memory bandwidth is the maximum rate at which data can be transferred between memory hierarchy levels, typically measured in bytes per second. See Memory-bound performance for details.
Memory coalescing: Memory coalescing improves memory bandwidth by servicing many logical loads or stores with fewer physical memory transactions. See Memory coalescing theory for coalescing patterns.
Memory-bound: Memory-bound kernels are limited by memory bandwidth rather than arithmetic bandwidth, typically due to low arithmetic intensity. See Memory-bound performance for memory-bound analysis.
Occupancy: Occupancy is the ratio of active wavefronts to the maximum number of wavefronts that can be active on a compute unit. See Occupancy theory for occupancy analysis.
Overhead: Overhead is time during which no useful work is done, often due to CPU-side bottlenecks or kernel launch delays. See Performance bottlenecks for details.
Peak rate: Peak rate is the theoretical maximum throughput at which a hardware system can complete work under ideal conditions. See Theoretical performance limits for details.
Pipe utilization: Pipe utilization measures how effectively a kernel uses the execution pipelines within each compute unit. See Pipe utilization for utilization details.
Register pressure: Register pressure occurs when excessive register demand limits the number of active wavefronts per compute unit, reducing occupancy. See Register pressure theory for details.
Roofline model: The roofline model is a visual performance model that plots attainable performance against arithmetic intensity, showing whether a program is compute-bound or memory-bound. See Roofline model for roofline analysis.
Wavefront divergence: Wavefront divergence occurs when threads within a wavefront take different execution paths due to conditional statements. See Branch efficiency for divergence handling details.
Wavefront execution state: Wavefront execution states (active, stalled, eligible, selected) describe the scheduling status of wavefronts on AMD GPUs. See Warp (Wavefront) execution states for state definitions.
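
Several of the terms above reduce to simple arithmetic. The following Python sketch is illustrative only: it shows how arithmetic intensity, the roofline bound, Little's Law, and occupancy relate numerically. The function names and all hardware figures are hypothetical placeholders chosen for this example; they do not come from any AMD tool, API, or specific GPU.

```python
# Illustrative sketch only: function names and hardware numbers are
# hypothetical and do not describe any specific GPU or profiling tool.

def attainable_performance(intensity, peak_flops, peak_bandwidth):
    """Roofline bound: the lower of the compute roof and the memory roof.

    intensity      -- arithmetic intensity, in operations per byte
    peak_flops     -- arithmetic bandwidth, in operations per second
    peak_bandwidth -- memory bandwidth, in bytes per second
    """
    return min(peak_flops, intensity * peak_bandwidth)


def classify(intensity, peak_flops, peak_bandwidth):
    """Label a kernel by comparing its intensity to the ridge point."""
    ridge_point = peak_flops / peak_bandwidth  # operations per byte
    return "compute-bound" if intensity >= ridge_point else "memory-bound"


def required_concurrency(latency, throughput):
    """Little's Law: work in flight = latency * throughput."""
    return latency * throughput


def occupancy(active_wavefronts, max_wavefronts):
    """Ratio of active wavefronts to the per-compute-unit maximum."""
    return active_wavefronts / max_wavefronts


if __name__ == "__main__":
    PEAK_FLOPS = 10e12      # assumed: 10 TFLOP/s compute roof
    PEAK_BANDWIDTH = 1e12   # assumed: 1 TB/s memory bandwidth

    for ai in (2.0, 50.0):
        bound = attainable_performance(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
        label = classify(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
        print(f"intensity {ai}: {label}, bound {bound:.3g} ops/s")

    # Assumed: 400-cycle memory latency, one request issued per cycle.
    print("requests in flight:", required_concurrency(400, 1))
    # Assumed: 8 active wavefronts out of a maximum of 16.
    print("occupancy:", occupancy(8, 16))
```

With these assumed figures the ridge point is 10 operations per byte, so a kernel at intensity 2 sits under the memory roof and a kernel at intensity 50 sits under the compute roof. Real values for peak rate, memory bandwidth, and wavefront limits depend on the target hardware and should be taken from its specifications or from profiler output.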