Performance analysis glossary

2026-02-20

Applies to Linux and Windows

This section provides brief definitions of performance analysis concepts and optimization techniques.

Active cycle

An active cycle is a clock cycle in which a compute unit has at least one active wavefront resident. See hip:wavefront_execution for details.

Arithmetic bandwidth

Arithmetic bandwidth is the peak rate at which arithmetic work can be performed, defining the compute roof in roofline models. See hip:compute_bound for details.

Arithmetic intensity

Arithmetic intensity is the ratio of arithmetic operations to bytes of memory traffic in a kernel (for example, FLOPs per byte). It largely determines whether the kernel is compute-bound or memory-bound. See hip:arithmetic_intensity for intensity analysis.
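
As a rough illustration, the sketch below computes arithmetic intensity for a hypothetical single-precision SAXPY-style kernel (y = a*x + y). It assumes each element costs 2 FLOPs (one multiply, one add) and 12 bytes of traffic (read x, read y, write y, 4 bytes each), with no cache reuse.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return arithmetic intensity in FLOPs per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def saxpy_intensity(n: int) -> float:
    """Intensity of y = a*x + y on n float32 elements (assumes no cache reuse)."""
    flops = 2 * n              # one multiply and one add per element
    bytes_moved = 3 * 4 * n    # read x[i], read y[i], write y[i]; 4 bytes each
    return arithmetic_intensity(flops, bytes_moved)


print(saxpy_intensity(1_000_000))  # about 0.167 FLOPs per byte
```

A value this low typically places a kernel in the memory-bound region.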

Bank conflict

A bank conflict occurs when multiple threads simultaneously access different addresses in the same LDS bank, serializing accesses. See hip:bank_conflicts_theory for details.
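
The sketch below makes the serialization concrete by estimating the conflict degree of an LDS access pattern. It assumes 32 banks, each 4 bytes wide, with consecutive 4-byte words mapped to consecutive banks. It also assumes that identical addresses are served by a broadcast. The real bank count, bank width, and the way hardware groups the threads of a wavefront depend on the architecture, so treat the numbers as illustrative.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 LDS banks
BANK_WIDTH = 4   # assumption: each bank is 4 bytes wide


def bank_of(byte_address: int) -> int:
    """Map an LDS byte address to its bank index."""
    return (byte_address // BANK_WIDTH) % NUM_BANKS


def conflict_degree(byte_addresses: list[int]) -> int:
    """Return the largest number of distinct words that map to one bank.

    Threads reading the same word are assumed to be served by a broadcast,
    so only distinct words per bank count toward serialization.
    """
    per_bank: dict[int, set[int]] = defaultdict(set)
    for addr in byte_addresses:
        per_bank[bank_of(addr)].add(addr // BANK_WIDTH)
    return max(len(words) for words in per_bank.values())


# 32 threads reading consecutive float32 values: conflict-free.
print(conflict_degree([t * 4 for t in range(32)]))        # 1
# Stride of 32 words: every thread hits bank 0, so accesses serialize 32-way.
print(conflict_degree([t * 32 * 4 for t in range(32)]))   # 32
```

Under this model, a stride of 2 words gives a 2-way conflict. Padding a shared array so the stride is odd is a common way to spread accesses across banks.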

Branch efficiency

Branch efficiency measures how often all threads within a wavefront take the same execution path, quantifying control-flow uniformity. See hip:branch_efficiency for branch analysis.
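
One simple way to express this metric is the fraction of executed branches whose condition is uniform across the wavefront's lanes. This formulation is an assumption made for illustration, not a specific profiler's counter definition. The sketch also assumes a 64-lane wavefront.

```python
def is_uniform(lane_outcomes: list[bool]) -> bool:
    """True if every lane in the wavefront took the same direction."""
    return all(lane_outcomes) or not any(lane_outcomes)


def branch_efficiency(branch_events: list[list[bool]]) -> float:
    """Fraction of executed branches that were uniform across the wavefront.

    Each event holds one boolean per lane: True if that lane took the branch.
    """
    if not branch_events:
        return 1.0
    uniform = sum(1 for lanes in branch_events if is_uniform(lanes))
    return uniform / len(branch_events)


WAVE = 64  # assumption: 64-lane wavefront
events = [
    [True] * WAVE,                            # all lanes take the branch
    [False] * WAVE,                           # no lane takes it
    [lane % 2 == 0 for lane in range(WAVE)],  # even/odd split: divergent
    [lane < 32 for lane in range(WAVE)],      # half/half split: divergent
]
print(branch_efficiency(events))  # 0.5
```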

Compute-bound

Compute-bound kernels are limited by the arithmetic bandwidth of the GPU’s compute units rather than memory bandwidth. See hip:compute_bound for compute-bound analysis.

CU utilization

CU utilization measures the percentage of time that compute units are actively executing instructions. See hip:cu_utilization for utilization analysis.

Issue efficiency

Issue efficiency measures how effectively the wavefront scheduler keeps execution pipelines busy by issuing instructions. See hip:issue_efficiency for efficiency metrics.

Latency hiding

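Latency hiding masks long-latency operations, such as global memory accesses, by keeping many wavefronts resident. While some wavefronts wait, the scheduler issues instructions from others, keeping the execution pipelines busy. See hip:latency_hiding for details.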

Little’s Law

Little’s Law relates concurrency, latency, and throughput, determining how much independent work must be in flight to hide latency. See hip:littles_law for latency hiding details.
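
In its usual form, Little's Law states that concurrency = latency × throughput. Applied to memory, the number of bytes that must be outstanding to sustain a target bandwidth equals that bandwidth multiplied by the memory latency. The latency, bandwidth, and request size below are hypothetical and only illustrate the arithmetic.

```python
def required_concurrency(latency: float, throughput: float) -> float:
    """Little's Law: work in flight = latency × throughput (consistent units)."""
    return latency * throughput


latency_ns = 500       # hypothetical average memory latency: 500 ns
bytes_per_ns = 2000    # hypothetical target bandwidth: 2 TB/s = 2000 bytes/ns

bytes_in_flight = required_concurrency(latency_ns, bytes_per_ns)
print(bytes_in_flight)  # 1000000 bytes must be outstanding

request_bytes = 64     # assumption: each outstanding request moves 64 bytes
print(bytes_in_flight / request_bytes)  # 15625.0 requests in flight
```

If the resident wavefronts cannot keep that many requests outstanding, the achieved bandwidth falls below the target.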

Memory bandwidth

Memory bandwidth is the maximum rate at which data can be transferred between memory hierarchy levels, typically measured in bytes per second. See hip:memory_bound for details.
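
A common first check is to compare the bandwidth a kernel actually achieves with the theoretical peak. The sketch below counts both reads and writes. The byte count, timing, and peak bandwidth are hypothetical.

```python
def achieved_bandwidth(bytes_moved: int, elapsed_s: float) -> float:
    """Achieved bandwidth in bytes per second."""
    return bytes_moved / elapsed_s


def bandwidth_efficiency(achieved: float, peak: float) -> float:
    """Fraction of the peak memory bandwidth actually attained."""
    return achieved / peak


# Hypothetical copy kernel: reads 256 MiB and writes 256 MiB in 0.5 ms.
bytes_moved = 2 * 256 * 1024**2
achieved = achieved_bandwidth(bytes_moved, 0.5e-3)
peak = 1.6e12  # hypothetical peak of 1.6 TB/s
print(f"{achieved / 1e12:.2f} TB/s, {bandwidth_efficiency(achieved, peak):.0%} of peak")
# 1.07 TB/s, 67% of peak
```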

Memory coalescing

Memory coalescing improves effective memory bandwidth by combining the accesses of threads in a wavefront, so that many logical loads or stores are serviced by fewer physical memory transactions. See hip:memory_coalescing_theory for coalescing patterns.
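
The sketch below shows the difference between coalesced and strided access by counting the distinct aligned memory segments touched by one wavefront-wide load. It assumes a 64-lane wavefront and a 64-byte segment per transaction. Both values are illustrative assumptions, and the real transaction granularity depends on the architecture and cache level.

```python
SEGMENT_BYTES = 64   # assumption: bytes covered by one memory transaction
WAVE_SIZE = 64       # assumption: 64 threads per wavefront


def transactions(byte_addresses: list[int], access_bytes: int = 4) -> int:
    """Count distinct aligned segments touched by one wavefront-wide access."""
    segments = set()
    for addr in byte_addresses:
        first = addr // SEGMENT_BYTES
        last = (addr + access_bytes - 1) // SEGMENT_BYTES
        segments.update(range(first, last + 1))
    return len(segments)


# Coalesced: thread t reads float32 element t, so 256 contiguous bytes.
print(transactions([t * 4 for t in range(WAVE_SIZE)]))    # 4
# Strided: thread t reads element 32*t, so every thread needs its own segment.
print(transactions([t * 128 for t in range(WAVE_SIZE)]))  # 64
```

Both patterns deliver the same 256 useful bytes. The strided pattern moves 16 times as many segments to do so.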

Memory-bound

Memory-bound kernels are limited by memory bandwidth rather than arithmetic bandwidth, typically due to low arithmetic intensity. See hip:memory_bound for memory-bound analysis.

Occupancy

Occupancy is the ratio of active wavefronts to the maximum number of wavefronts that can be active on a compute unit. See hip:occupancy for occupancy analysis.
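
Theoretical occupancy is often estimated by taking the tightest of several per-resource limits on resident wavefronts. The sketch below assumes these per-CU limits are already known. It ignores workgroup-granularity effects, and all values are hypothetical.

```python
def occupancy(active_waves: float, max_waves: int) -> float:
    """Occupancy: active wavefronts divided by the maximum the CU supports."""
    return active_waves / max_waves


def theoretical_active_waves(limits: dict[str, int]) -> int:
    """Resident wavefronts are capped by the most restrictive resource."""
    return min(limits.values())


max_waves_per_cu = 32  # hypothetical hardware limit per compute unit
limits = {
    "hardware": max_waves_per_cu,
    "registers": 20,   # hypothetical limit from the kernel's register usage
    "lds": 24,         # hypothetical limit from the kernel's LDS usage
}
waves = theoretical_active_waves(limits)
print(waves, occupancy(waves, max_waves_per_cu))  # 20 0.625
```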

Overhead

Overhead is time during which the GPU does no useful work, often due to CPU-side bottlenecks or kernel launch delays. See hip:performance_bottlenecks for details.

Peak rate

Peak rate is the theoretical maximum throughput at which a hardware system can complete work under ideal conditions. See hip:theoretical_performance_limits for details.
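
A theoretical peak rate is typically derived by assuming every compute unit completes its maximum work on every cycle. The sketch below uses a hypothetical GPU configuration, not the specification of any real product.

```python
def peak_flops(num_cus: int, clock_hz: float, flops_per_cu_per_cycle: int) -> float:
    """Theoretical peak FLOP/s: every CU does its maximum work every cycle."""
    return num_cus * clock_hz * flops_per_cu_per_cycle


# Hypothetical GPU: 100 CUs at 2 GHz, each doing 256 FP32 FLOPs per cycle
# (for example, 128 lanes × 2 FLOPs per fused multiply-add).
peak = peak_flops(100, 2.0e9, 256)
print(f"{peak / 1e12:.1f} TFLOP/s")  # 51.2 TFLOP/s
```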

Pipe utilization

Pipe utilization measures how effectively a kernel uses the execution pipelines within each compute unit. See hip:pipe_utilization for utilization details.

Register pressure

Register pressure occurs when excessive register demand limits the number of active wavefronts per compute unit, reducing occupancy. See hip:register_pressure_theory for details.
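
The sketch below illustrates how rising register demand reduces the number of wavefronts that fit on a SIMD. The defaults are assumptions chosen for illustration: 512 vector registers per lane, allocation in blocks of 8, and at most 8 wavefronts per SIMD. The model also ignores other register types.

```python
import math


def waves_limited_by_vgprs(vgprs_per_thread: int,
                           vgprs_per_lane: int = 512,   # assumption
                           granularity: int = 8,        # assumption
                           max_waves: int = 8) -> int:  # assumption
    """Wavefronts per SIMD that fit in the register file (simplified model)."""
    # Registers are allocated in fixed-size blocks, so round the request up.
    allocated = math.ceil(vgprs_per_thread / granularity) * granularity
    return min(max_waves, vgprs_per_lane // allocated)


for vgprs in (32, 64, 96, 128, 256):
    print(vgprs, waves_limited_by_vgprs(vgprs))
# 32 8
# 64 8
# 96 5
# 128 4
# 256 2
```

Under this model, going from 64 to 96 registers per thread drops the limit from 8 to 5 wavefronts, which reduces occupancy.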

Roofline model

The roofline model is a visual performance model that plots attainable performance against arithmetic intensity, showing whether a kernel is compute-bound or memory-bound. See hip:roofline_model for roofline analysis.
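
In its basic form, the roofline caps attainable performance at the smaller of the compute roof and the product of arithmetic intensity and memory bandwidth. The ridge point, where the two roofs meet, separates memory-bound from compute-bound kernels. The peak values below are hypothetical.

```python
def attainable_flops(intensity: float, peak_flops: float, peak_bw: float) -> float:
    """Roofline: performance is capped by compute or by bandwidth × intensity."""
    return min(peak_flops, intensity * peak_bw)


def bound_type(intensity: float, peak_flops: float, peak_bw: float) -> str:
    """Classify a kernel relative to the ridge point peak_flops / peak_bw."""
    ridge = peak_flops / peak_bw
    return "compute-bound" if intensity >= ridge else "memory-bound"


PEAK_FLOPS = 50e12  # hypothetical 50 TFLOP/s compute roof
PEAK_BW = 2e12      # hypothetical 2 TB/s bandwidth roof (ridge at 25 FLOPs/byte)

for ai in (0.25, 10.0, 100.0):
    tflops = attainable_flops(ai, PEAK_FLOPS, PEAK_BW) / 1e12
    print(ai, tflops, bound_type(ai, PEAK_FLOPS, PEAK_BW))
```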

Wavefront divergence

Wavefront divergence occurs when threads within a wavefront take different execution paths due to conditional statements. See hip:branch_efficiency for divergence handling details.
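
In a simplified cost model, a wavefront runs each side of an if/else with non-participating lanes masked off. It skips a path only when no lane needs it. The sketch below assumes hypothetical per-path cycle counts and a 64-lane wavefront. It ignores compiler transformations such as replacing short branches with select instructions.

```python
def branch_cost(taken: list[bool], then_cycles: int, else_cycles: int) -> int:
    """Cycles for an if/else when inactive lanes are masked off.

    A path is skipped only if no lane needs it; otherwise the whole
    wavefront steps through it, so divergent branches pay for both paths.
    """
    cost = 0
    if any(taken):
        cost += then_cycles
    if not all(taken):
        cost += else_cycles
    return cost


WAVE = 64  # assumption: 64-lane wavefront
print(branch_cost([True] * WAVE, 100, 40))                        # 100 (uniform)
print(branch_cost([lane < 63 for lane in range(WAVE)], 100, 40))  # 140 (one lane diverges)
```

A single divergent lane is enough to make the whole wavefront pay for both paths.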

Wavefront execution state

Wavefront execution states (active, stalled, eligible, selected) describe the scheduling status of wavefronts on AMD GPUs. See hip:wavefront_execution for state definitions.