Device hardware glossary#
2026-02-20
This section provides concise definitions of hardware components and architectural features of AMD GPUs.
- AccVGPR#
Accumulation General Purpose Vector Registers (AccVGPRs) are a special class of VGPRs used exclusively for matrix operations.
- ALU#
Arithmetic logic units (ALUs) are the primary arithmetic engines that execute mathematical and logical operations within compute units. See Vector arithmetic logic unit (VALU) for details.
- AMD device architecture#
AMD device architecture is based on unified, programmable compute engines known as compute units (CUs). See Hardware implementation for details.
- Compute unit versioning#
Compute units are versioned with GFX IP identifiers that define their microarchitectural features and instruction set compatibility. See Target GPU architectures (GFX IP) for details.
- Compute units#
Compute units (CUs) are the fundamental programmable execution engines in AMD GPUs capable of running complex programs. See Compute unit architecture for details.
- Data movement engine#
Data movement engines (DMEs) are specialized hardware units in AMD Instinct MI300 and MI350 series GPUs that accelerate multi-dimensional tensor data copies between global memory and on-chip memory. See Data movement engine (CDNA 3 / CDNA 4) for details.
- GCD#
On AMD Instinct MI100 and MI250 series GPUs and AMD Radeon GPUs, the Graphics Compute Die (GCD) contains the GPU’s computational elements and lower levels of the cache hierarchy. See AMD Instinct™ MI250 microarchitecture for details.
- GFX IP#
GFX IP (Graphics IP) versions are identifiers that specify which instruction formats, memory models, and compute features are supported by each AMD GPU generation. See Target GPU architectures (GFX IP) for versioning information.
- GFX IP major version#
The GFX IP major version represents the GPU’s core instruction set and architecture. For example, a GFX IP 11 major version corresponds to the RDNA3 architecture, influencing driver support and available compute features. See Target GPU architectures (GFX IP) for versioning information.
- GFX IP minor version#
The GFX IP minor version represents specific variations within a GFX IP major version and affects feature sets, optimizations, and driver behavior. Different GPU models within the same major version can have unique capabilities, impacting performance and supported instructions. See Target GPU architectures (GFX IP) for versioning information.
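As an illustration of how the major, minor, and stepping fields fit together, the sketch below splits an amdgcn target name (such as `gfx1100` or `gfx90a`) into its GFX IP version fields. The encoding assumed here, a decimal major version followed by one hexadecimal digit each for the minor version and stepping, follows the common LLVM AMDGPU processor-name convention; verify the exact mapping for your target against the GFX IP versioning documentation.

```python
def parse_gfx_target(name: str) -> tuple[int, int, int]:
    """Split an amdgcn target name into (major, minor, stepping).

    Assumes the common LLVM AMDGPU encoding: 'gfx' prefix, decimal
    major version, then one hex digit each for the minor version and
    stepping (for example, the trailing 'a' in gfx90a).
    """
    version = name.removeprefix("gfx")
    major = int(version[:-2])        # e.g. '11' in gfx1100, '9' in gfx90a
    minor = int(version[-2], 16)     # minor version digit
    stepping = int(version[-1], 16)  # stepping digit (may be hex)
    return major, minor, stepping

# GFX IP major version 11 corresponds to the RDNA3 architecture.
print(parse_gfx_target("gfx1100"))  # (11, 0, 0)
print(parse_gfx_target("gfx90a"))   # (9, 0, 10)
```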
- GPU RAM (VRAM)#
GPU RAM, also known as global memory in the HIP programming model, is the large, high-capacity off-chip memory subsystem accessible by all compute units, forming the foundation of the device’s memory hierarchy.
- Graphics L1 cache#
On AMD Radeon GPUs, the read-only graphics level 1 (L1) cache is local to groups of WGPs called shader arrays, providing fast access to recently used data. AMD Instinct GPUs do not feature the graphics L1 cache.
- Infinity Cache (L3 cache)#
On AMD Instinct MI300 and MI350 series GPUs and AMD Radeon GPUs, the Infinity Cache is the last level cache of the cache hierarchy. It is shared by all compute units and WGPs on the GPU.
- L0 instruction cache#
On AMD Radeon GPUs, the level 0 (L0) instruction cache is local to each WGP and thus shared between the WGP’s compute units.
- L0 scalar cache#
On AMD Radeon GPUs, the level 0 (L0) scalar data cache is local to each WGP and thus shared between the WGP’s compute units. It provides the scalar ALU with fast access to recently used data.
- L0 vector cache#
On AMD Radeon GPUs, the level 0 (L0) vector data cache is local to each WGP and thus shared between the WGP’s compute units. It provides the vector ALU with fast access to recently used data.
- L1 instruction cache#
On AMD Instinct GPUs, the level 1 (L1) instruction cache is local to each compute unit. On AMD Radeon GPUs, the L1 instruction cache does not exist as a separate cache level, and instructions are stored in the L0 instruction cache.
- L1 scalar cache#
On AMD Instinct GPUs, the level 1 (L1) scalar data cache is local to each compute unit, providing the scalar ALU with fast access to recently used data. On AMD Radeon GPUs, the L1 scalar cache does not exist as a separate cache level, and recently used scalar data is stored in the L0 scalar cache.
- L1 vector cache#
On AMD Instinct GPUs, the level 1 (L1) vector data cache is local to each compute unit, providing the vector ALU with fast access to recently used data. On AMD Radeon GPUs, the L1 vector cache does not exist as a separate cache level, and recently used vector data is stored in the L0 vector cache.
- L2 cache#
On AMD Instinct MI100 series GPUs, the L2 cache is shared across the entire chip. On all other AMD GPUs, each L2 cache is shared by the compute units on the same GCD or XCD.
- Load/store unit#
Load/store units (LSUs) handle data transfer between compute units and the GPU’s memory subsystems, managing thousands of concurrent memory operations. See Load/store unit (LSU) for details.
- LDS#
Local data share (LDS) is fast on-chip memory local to each compute unit and shared among work-items in a work-group, enabling efficient coordination and data reuse. In the HIP programming model, the LDS is known as shared memory. See Local data share (LDS) for LDS programming details.
- Matrix cores (MFMA units)#
Matrix cores (MFMA units) are specialized execution units that perform large-scale matrix operations in a single instruction, delivering high throughput for AI and HPC workloads. See Matrix fused multiply-add (MFMA) for details.
- Register file#
The register file is the primary on-chip memory store in each compute unit, holding data between arithmetic and memory operations. See Memory model for details.
- Registers#
Registers are the lowest level of the memory hierarchy, storing per-thread temporary variables and intermediate results. See Memory model for register usage details.
- SALU#
Scalar ALUs (SALUs) operate on a single value per wavefront and manage all control flow.
- SGPR#
Scalar general-purpose registers (SGPRs) hold data produced and consumed by a compute unit’s scalar ALU.
- SGPR file#
The SGPR file is the register file that holds data used by the scalar ALU.
- SIMD core#
SIMD cores are execution lanes that perform scalar and vector arithmetic operations inside each compute unit. See CDNA architecture and RDNA architecture for details.
- Special function unit#
Special function units (SFUs) accelerate transcendental and reciprocal mathematical functions such as exp, log, sin, and cos. See Special function unit (SFU) for details.
- VALU#
Vector ALUs (VALUs) perform an arithmetic or logical operation on data for each work-item in a wavefront, enabling data-parallel execution.
- VGPR#
Vector general-purpose registers (VGPRs) hold data produced and consumed by a compute unit’s vector ALU.
- VGPR file#
The VGPR file is the register file that holds data used by the vector ALU. GPUs with matrix cores also have AccVGPR files, used specifically for matrix instructions.
- Wavefront (Warp)#
A wavefront (also called a warp) is a group of work-items that execute in parallel on a single compute unit, sharing one instruction stream. See Warp (or Wavefront) for execution details.
- Wavefront scheduler#
The wavefront scheduler in each compute unit decides which wavefront to execute each clock cycle, enabling rapid context switching for latency hiding. See Sequencer and scheduling for details.
- Wavefront size#
The wavefront size is the number of work-items that execute together in a single wavefront. For AMD Instinct GPUs, the wavefront size is 64 threads, while AMD Radeon GPUs have a wavefront size of 32 threads. See Warp (or Wavefront) for details.
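The mapping from work-group size to wavefront count can be sketched with simple arithmetic; the helper below is illustrative only, not a HIP API. When the work-group size is not a multiple of the wavefront size, the last wavefront is partially filled.

```python
import math

def wavefronts_per_workgroup(workgroup_size: int, wavefront_size: int) -> int:
    """Number of wavefronts a work-group occupies (illustrative sketch)."""
    return math.ceil(workgroup_size / wavefront_size)

# A 256-work-item work-group maps to 4 wavefronts on AMD Instinct GPUs
# (wavefront size 64) and 8 wavefronts on AMD Radeon GPUs (wavefront size 32).
print(wavefronts_per_workgroup(256, 64))  # 4
print(wavefronts_per_workgroup(256, 32))  # 8
```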
- WGP#
A Workgroup Processor (WGP) is a hardware unit on AMD Radeon GPUs that contains two compute units and their associated resources, enabling efficient scheduling and execution of wavefronts. See RDNA architecture for details.
- Work-group (Block)#
A work-group (also called a block) is a collection of wavefronts scheduled together on a single compute unit that can coordinate through Local data share memory. See Block (Work-group) for work-group details.
- Work-item (Thread)#
A work-item (also called a thread) is the smallest unit of execution on an AMD GPU and represents a single element of work. See Thread (Work-item) for thread hierarchy details.
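To make the work-item/work-group relationship concrete, the sketch below mirrors, in plain host-side arithmetic, how a HIP kernel typically derives a work-item's 1D global index from the `blockIdx.x`, `blockDim.x`, and `threadIdx.x` built-ins. The function name is hypothetical, chosen only for this illustration.

```python
def global_id(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """1D global work-item index, mirroring the common HIP kernel
    expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx

# Work-item 5 of work-group 2, with 256 work-items per work-group:
print(global_id(2, 256, 5))  # 517
```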
- XCD#
On AMD Instinct MI300 and MI350 series GPUs, the Accelerator Complex Die (XCD) contains the GPU’s computational elements and lower levels of the cache hierarchy. See AMD Instinct™ MI300 Series microarchitecture for details.