Hardware features

This page gives an overview of the different hardware architectures and the features they implement. Feature support does not imply performance; performance depends on the specifications listed on the Accelerator and GPU hardware specifications page.

| Hardware feature support | RDNA1 | CDNA1 | RDNA2 | CDNA2 | RDNA3 | CDNA3 |
|---|---|---|---|---|---|---|
| Atomic functions on 32-bit integer values in global and shared memory | | | | | | |
| Atomic functions on 64-bit integer values in global and shared memory | | | | | | |
| Atomic addition on 32-bit floating point values in global and shared memory | | | | | | |
| Atomic addition on 64-bit floating point values in global memory and shared memory | | | | | | |
| Warp vote functions | | | | | | |
| Memory fence instructions | | | | | | |
| Synchronization functions | | | | | | |
| Surface functions | | | | | | |
| float16 half precision IEEE-conformant floating-point operations | | | | | | |
| bfloat16 16-bit floating-point operations | | | | | | |
| Support for 8-bit floating-point types | | | | | | |
| Support for tensor float32 | | | | | | |
| Packed math with 16-bit floating point values | | | | | | |
| Packed math with 32-bit floating point values | | | | | | |
| Matrix Cores | | | | | | |
| On-Chip Error Correcting Code (ECC) | | | | | | |
| Maximum dimensionality of grid | 3 | 3 | 3 | 3 | 3 | 3 |
| Maximum x-, y- or z-dimension of a grid | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) |
| Maximum number of threads per grid | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) |
| Maximum x-, y- or z-dimension of a block | \(1024\) | \(1024\) | \(1024\) | \(1024\) | \(1024\) | \(1024\) |
| Maximum number of threads per block | \(1024\) | \(1024\) | \(1024\) | \(1024\) | \(1024\) | \(1024\) |
| Wavefront size | 32 [1] | 64 | 32 [1] | 64 | 32 [1] | 64 |
| Maximum number of resident blocks per compute unit | 40 [1] | 32 | 32 [1] | 32 | 32 [1] | 32 |
| Maximum number of resident wavefronts per compute unit | 40 [1] | 32 | 32 [1] | 32 | 32 [1] | 32 |
| Maximum number of resident threads per compute unit | 1280 [2] | 2048 | 1024 [2] | 2048 | 1024 [2] | 2048 |
| Maximum number of 32-bit vector registers per thread | 256 | 256 (vector) + 256 (matrix) | 256 | 256 (vector) + 256 (matrix) | 256 | 256 (vector) + 256 (matrix) |
| Maximum number of 32-bit scalar accumulation registers per thread | 106 | 104 | 106 | 104 | 106 | 104 |
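
The resident-thread figures in the table are consistent with the two rows above them: for every listed architecture, the maximum number of resident threads per compute unit equals the maximum number of resident wavefronts per compute unit multiplied by the wavefront size. The following C++ sketch reproduces that arithmetic and adds a simple check of a one-dimensional launch against the block and grid limits. It is illustrative only: `ArchLimits`, `resident_threads`, `blocks_needed`, and `launch_fits` are hypothetical helper names, not part of HIP, and the sketch assumes the grid limit counts threads, as the row label states. All numbers are copied from the table.

```cpp
#include <cstdint>

// Hypothetical helper type (not a HIP type); values are copied from the table.
struct ArchLimits {
    const char* name;
    std::uint32_t wavefront_size;         // "Wavefront size" row
    std::uint32_t max_wavefronts_per_cu;  // "Maximum number of resident wavefronts per compute unit" row
};

constexpr ArchLimits kRdna1{"RDNA1", 32, 40};
constexpr ArchLimits kRdna2{"RDNA2", 32, 32};
constexpr ArchLimits kCdna3{"CDNA3", 64, 32};

// Resident threads per compute unit = resident wavefronts * wavefront size.
constexpr std::uint32_t resident_threads(const ArchLimits& a) {
    return a.wavefront_size * a.max_wavefronts_per_cu;
}

// Limits shared by all six architectures in the table.
constexpr std::uint32_t kMaxThreadsPerBlock = 1024;
constexpr std::uint64_t kMaxThreadsPerGrid = (1ull << 32) - 1;  // 2^32 - 1

// Blocks needed to cover n elements with one thread per element (ceiling division).
constexpr std::uint64_t blocks_needed(std::uint64_t n, std::uint32_t block_size) {
    return (n + block_size - 1) / block_size;
}

// True if a 1-D launch of `blocks` blocks of `block_size` threads stays within
// the per-block and per-grid thread limits above.
constexpr bool launch_fits(std::uint64_t blocks, std::uint32_t block_size) {
    return block_size >= 1 && block_size <= kMaxThreadsPerBlock &&
           blocks >= 1 && blocks * block_size <= kMaxThreadsPerGrid;
}
```

For example, `resident_threads(kRdna1)` evaluates to 1280 and `resident_threads(kCdna3)` to 2048, matching the table, while covering 1,000,000 elements with 256-thread blocks needs `blocks_needed(1000000, 256)`, that is 3907 blocks, which `launch_fits` accepts. In a real application these limits would normally be read from the device at run time rather than hard-coded.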