Performance analysis glossary#

2026-02-20

Applies to Linux and Windows

This section provides brief definitions of performance analysis concepts and optimization techniques.

Active cycle#

An active cycle is a clock cycle in which a compute unit has at least one active wavefront resident. See Wavefront execution states for details.

Arithmetic bandwidth#

Arithmetic bandwidth is the peak rate at which arithmetic work can be performed, defining the compute roof in roofline models. See Compute-bound performance for details.

Arithmetic intensity#

Arithmetic intensity is the ratio of arithmetic operations to memory traffic in a kernel, commonly expressed as floating-point operations per byte moved. It indicates whether a kernel is more likely to be compute-bound or memory-bound. See Arithmetic intensity for intensity analysis.
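
As a rough illustration, the ratio can be worked out by hand for a simple kernel. The sketch below assumes intensity is counted as floating-point operations per byte of memory traffic and uses SAXPY as the example; the function name and the traffic count (no cache reuse) are illustrative assumptions, not part of any ROCm API.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return arithmetic operations per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


# SAXPY (y = a * x + y) on n float32 elements:
#   2 operations per element (one multiply, one add)
#   12 bytes per element (read x, read y, write y), assuming no cache reuse
n = 1_000_000
saxpy_ai = arithmetic_intensity(flops=2 * n, bytes_moved=12 * n)
print(f"SAXPY arithmetic intensity: {saxpy_ai:.3f} FLOP/byte")
```

An intensity this low suggests the kernel would be limited by memory bandwidth on most GPUs.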

Bank conflict#

A bank conflict occurs when multiple threads simultaneously access different addresses in the same LDS bank, serializing accesses. See Bank conflict theory for details.
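
The serialization can be estimated by mapping each thread's address to a bank and counting how many distinct words land in the same bank. The sketch below assumes 32 banks that are each 4 bytes wide and a simple modulo mapping; these values and the helper name are illustrative assumptions, and the actual LDS layout depends on the GPU architecture.

```python
from collections import defaultdict


def bank_conflict_degree(byte_addresses, num_banks=32, bank_width_bytes=4):
    """Return the largest number of distinct words that map to a single bank.

    A result of 1 means no conflict; a result of k means the worst bank
    needs roughly k serialized accesses. num_banks and bank_width_bytes
    are assumed values, not queried from hardware.
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // bank_width_bytes
        words_per_bank[word % num_banks].add(word)
    return max((len(words) for words in words_per_bank.values()), default=0)


# 32 threads reading consecutive 4-byte words: each thread hits its own bank.
unit_stride = [4 * t for t in range(32)]
# 32 threads reading words that are 32 words apart: all hit bank 0.
stride_32 = [4 * 32 * t for t in range(32)]
# All threads reading the same word: same address, so not a conflict.
same_word = [128] * 32

print(bank_conflict_degree(unit_stride))
print(bank_conflict_degree(stride_32))
print(bank_conflict_degree(same_word))
```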

Branch efficiency#

Branch efficiency measures how often all threads within a wavefront take the same execution path, quantifying control-flow uniformity. See Branch efficiency for branch analysis.

Compute-bound#

Compute-bound kernels are limited by the arithmetic bandwidth of the GPU’s compute units rather than memory bandwidth. See Compute-bound performance for compute-bound analysis.

CU utilization#

CU utilization measures the percentage of time that compute units are actively executing instructions. See CU utilization for utilization analysis.

Issue efficiency#

Issue efficiency measures how effectively the wavefront scheduler keeps execution pipelines busy by issuing instructions. See Issue efficiency for efficiency metrics.

Latency hiding#

Latency hiding masks long-latency operations by keeping many wavefronts in flight, so that execution pipelines stay busy with ready work while other wavefronts wait. See Latency hiding mechanisms for details.

Little’s Law#

Little’s Law states that average concurrency equals latency multiplied by throughput, which determines how much independent work must be in flight to hide latency. See Little’s Law for latency hiding details.
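
Applied to memory, the relationship gives the amount of data that must be in flight to sustain a target bandwidth at a given latency. The sketch below uses made-up latency, bandwidth, and request-size figures purely for illustration; they are assumptions, not measurements of any specific GPU.

```python
import math


def bytes_in_flight(latency_ns: int, bandwidth_gb_per_s: int) -> int:
    """Little's Law: concurrency = latency * throughput.

    1 GB/s (with GB = 10**9 bytes) equals 1 byte per nanosecond, so the
    product of nanoseconds and GB/s is a byte count.
    """
    return latency_ns * bandwidth_gb_per_s


def requests_in_flight(latency_ns: int, bandwidth_gb_per_s: int,
                       bytes_per_request: int) -> int:
    """Number of outstanding requests needed to cover the latency."""
    total = bytes_in_flight(latency_ns, bandwidth_gb_per_s)
    return math.ceil(total / bytes_per_request)


# Hypothetical figures: 500 ns memory latency, 1000 GB/s target bandwidth,
# 64-byte requests.
needed_bytes = bytes_in_flight(500, 1000)
needed_requests = requests_in_flight(500, 1000, 64)
print(f"{needed_bytes} bytes, or {needed_requests} requests, must be in flight")
```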

Memory bandwidth#

Memory bandwidth is the maximum rate at which data can be transferred between memory hierarchy levels, typically measured in bytes per second. See Memory-bound performance for details.

Memory coalescing#

Memory coalescing improves memory bandwidth by servicing many logical loads or stores with fewer physical memory transactions. See Memory coalescing theory for coalescing patterns.
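
One way to see the effect is to count how many distinct memory segments a wavefront touches for a given access stride. The sketch below assumes a 64-lane wavefront, 4-byte elements, and a 64-byte transaction granularity; these parameters and the function name are illustrative assumptions, and real hardware granularity varies by architecture and memory level.

```python
def transactions_per_wavefront(stride_elems: int, elem_bytes: int = 4,
                               wavefront_size: int = 64,
                               segment_bytes: int = 64) -> int:
    """Count distinct segments touched when lane i accesses element i * stride.

    All sizes are assumed values for illustration only.
    """
    segments = {
        (lane * stride_elems * elem_bytes) // segment_bytes
        for lane in range(wavefront_size)
    }
    return len(segments)


coalesced = transactions_per_wavefront(stride_elems=1)    # contiguous access
strided = transactions_per_wavefront(stride_elems=16)     # one segment per lane
print(f"stride 1: {coalesced} segments, stride 16: {strided} segments")
```

Under these assumptions, the contiguous pattern needs far fewer transactions for the same number of logical loads.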

Memory-bound#

Memory-bound kernels are limited by memory bandwidth rather than arithmetic bandwidth, typically due to low arithmetic intensity. See Memory-bound performance for memory-bound analysis.

Occupancy#

Occupancy is the ratio of active wavefronts to the maximum number of wavefronts that can be active on a compute unit. See Occupancy theory for occupancy analysis.

Overhead#

Overhead is time during which no useful work is done, often due to CPU-side bottlenecks or kernel launch delays. See Performance bottlenecks for details.

Peak rate#

Peak rate is the theoretical maximum throughput at which a hardware system can complete work under ideal conditions. See Theoretical performance limits for details.

Pipe utilization#

Pipe utilization measures how effectively a kernel uses the execution pipelines within each compute unit. See Pipe utilization for utilization details.

Register pressure#

Register pressure occurs when excessive register demand limits the number of active wavefronts per compute unit, reducing occupancy. See Register pressure theory for details.
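
The link between register demand and occupancy can be sketched with simple integer arithmetic. The example below assumes a budget of 512 vector registers per lane on each SIMD and a limit of 8 wavefronts per SIMD, and it ignores allocation granularity and other limiters such as LDS usage; these values and function names are hypothetical and should be replaced with the figures for the target architecture.

```python
def waves_per_simd(vgprs_per_thread: int, vgpr_budget_per_lane: int = 512,
                   max_waves_per_simd: int = 8) -> int:
    """Wavefronts that fit on one SIMD when only registers limit residency.

    vgpr_budget_per_lane and max_waves_per_simd are assumed values.
    """
    if vgprs_per_thread <= 0:
        raise ValueError("vgprs_per_thread must be positive")
    return min(max_waves_per_simd, vgpr_budget_per_lane // vgprs_per_thread)


def occupancy(active_waves: int, max_waves: int) -> float:
    """Ratio of active wavefronts to the maximum supported."""
    return active_waves / max_waves


for regs in (64, 96, 128):
    waves = waves_per_simd(regs)
    print(f"{regs} VGPRs per thread: {waves} waves, "
          f"occupancy {occupancy(waves, 8):.3f}")
```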

Roofline model#

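The roofline model is a visual performance model that plots attainable performance against arithmetic intensity, bounded by the memory bandwidth and arithmetic bandwidth roofs, to show whether a program is compute-bound or memory-bound. See Roofline model for roofline analysis.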
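
In its basic form, the model caps attainable performance at the smaller of the compute roof and the product of arithmetic intensity and memory bandwidth. The sketch below uses hypothetical peak figures (20 TFLOP/s and 1 TB/s) chosen only for illustration; the function names are likewise assumptions.

```python
def attainable_flops(intensity: float, peak_flops: float,
                     peak_bandwidth: float) -> float:
    """Basic roofline bound: min(compute roof, intensity * memory roof)."""
    return min(peak_flops, intensity * peak_bandwidth)


def classify(intensity: float, peak_flops: float,
             peak_bandwidth: float) -> str:
    """Compare intensity with the ridge point where the two roofs meet."""
    ridge = peak_flops / peak_bandwidth
    return "memory-bound" if intensity < ridge else "compute-bound"


# Hypothetical device: 20e12 FLOP/s peak compute, 1e12 bytes/s peak bandwidth.
PEAK_FLOPS = 20e12
PEAK_BANDWIDTH = 1e12

for ai in (0.25, 50.0):
    bound = attainable_flops(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
    label = classify(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
    print(f"intensity {ai} FLOP/byte: at most {bound:.3e} FLOP/s ({label})")
```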

Wavefront divergence#

Wavefront divergence occurs when threads within a wavefront take different execution paths due to conditional statements. See Branch efficiency for divergence handling details.

Wavefront execution state#

Wavefront execution states (active, stalled, eligible, selected) describe the scheduling status of wavefronts on AMD GPUs. See Wavefront execution states for state definitions.