GPU architecture hardware specifications

GPU architecture hardware specifications#

Applies to Linux and Windows

2024-02-29

5 min read time

The following table provides an overview over the hardware specifications for the AMD Instinct accelerators.

Table 1 AMD Instinct architecture specification table#

Model

Architecture

LLVM target name

VRAM

Compute Units

Wavefront Size

LDS

L3 Cache

L2 Cache

L1 Vector Cache

L1 Scalar Cache

L1 Instruction Cache

VGPR File

SGPR File

MI300X

CDNA3

gfx941 or gfx942

192 GiB

304

64

64 KiB

256 MiB

32 MiB

32 KiB

16 KiB per 2 CUs

64 KiB per 2 CUs

512 KiB

12.5 KiB

MI300A

CDNA3

gfx940 or gfx942

128 GiB

228

64

64 KiB

256 MiB

24 MiB

32 KiB

16 KiB per 2 CUs

64 KiB per 2 CUs

512 KiB

12.5 KiB

MI250X

CDNA2

gfx90a

128 GiB

220 (110 per GCD)

64

64 KiB

16 MiB (8 MiB per GCD)

16 KiB

16 KiB per 2 CUs

32 KiB per 2 CUs

512 KiB

12.5 KiB

MI250

CDNA2

gfx90a

128 GiB

208

64

64 KiB

16 MiB (8 MiB per GCD)

16 KiB

16 KiB per 2 CUs

32 KiB per 2 CUs

512 KiB

12.5 KiB

MI210

CDNA2

gfx90a

64 GiB

104

64

64 KiB

8 MiB

16 KiB

16 KiB per 2 CUs

32 KiB per 2 CUs

512 KiB

12.5 KiB

MI100

CDNA

gfx908

32 GiB

120

64

64 KiB

8 MiB

16 KiB

16 KiB per 3 CUs

32 KiB per 3 CUs

256 KiB VGPR and 256 KiB AccVGPR

12.5 KiB

MI60

GCN 5.1

gfx906

32 GiB

64

64

64 KiB

4 MiB

16 KiB

16 KiB per 3 CUs

32 KiB per 3 CUs

256 KiB

12.5 KiB

MI50 (32GB)

GCN 5.1

gfx906

32 GiB

60

64

64 KiB

4 MiB

16 KiB

16 KiB per 3 CUs

32 KiB per 3 CUs

256 KiB

12.5 KiB

MI50 (16GB)

GCN 5.1

gfx906

16 GiB

60

64

64 KiB

4 MiB

16 KiB

16 KiB per 3 CUs

32 KiB per 3 CUs

256 KiB

12.5 KiB

MI25

GCN 5.0

gfx900

16 GiB

64

64

64 KiB

4 MiB

16 KiB

16 KiB per 3 CUs

32 KiB per 3 CUs

256 KiB

12.5 KiB

MI8

GCN 3.0

gfx803

4 GiB

64

64

64 KiB

2 MiB

16 KiB

16 KiB per 4 CUs

32 KiB per 4 CUs

256 KiB

12.5 KiB

MI6

GCN 4.0

gfx803

16 GiB

36

64

64 KiB

2 MiB

16 KiB

16 KiB per 4 CUs

32 KiB per 4 CUs

256 KiB

12.5 KiB

Glossary#

For a more detailed explanation refer to the specific documents and guides.

LLVM target name

Argument to pass to clang in –offload-arch to compile code for the given architecture.

VRAM

Amount of memory available on the GPU.

Compute Units

Number of compute units on the GPU.

Wavefront Size

Amount of work-items that execute in parallel on a single compute unit. This is equivalent to the warp size in HIP.

LDS

The Local Data Share (LDS) is a low-latency, high-bandwidth scratch pad memory. It is local to the compute units, shared by all work-items in a work group. In HIP this is the shared memory, which is shared by all threads in a block.

L3 Cache

Size of the level 3 cache. Shared by all compute units on the same GPU. Caches vector and scalar data and instructions.

L2 Cache

Size of the level 3 cache. Shared by all compute units on the same GCD. Caches vector and scalar data and instructions.

L1 Vector Cache

Size of the level 1 vector data cache. Local to a compute unit. Caches vector data.

L1 Scalar Cache

Size of the level 1 scalar data cache. Usually shared by several compute units. Caches scalar data.

L1 Instruction Cache

Size of the level 1 instruction cache. Usually shared by several compute units.

VGPR File

Size of the Vector General Purpose Register (VGPR) file. Holds data used in vector instructions. GPUs with matrix cores also have AccVGPRs, which are Accumulation General Purpose Vector Registers, specifically used in matrix instructions.

SGPR File

Size of the Scalar General Purpose Register (SGPR) file. Holds data used in scalar instructions.

GCD

Graphics Compute Die.