Hardware features

This page gives an overview of the different hardware architectures and the features they implement. Hardware features do not imply performance; performance depends on the specifications listed on the rocm:reference/gpu-arch-specs page.

| Hardware feature support | RDNA1 | CDNA1 | RDNA2 | CDNA2 | RDNA3 | CDNA3 |
| --- | --- | --- | --- | --- | --- | --- |
| Atomic functions on 32-bit integer values in global and shared memory | | | | | | |
| Atomic functions on 64-bit integer values in global and shared memory | | | | | | |
| Atomic addition on 32-bit floating point values in global and shared memory | | | | | | |
| Atomic addition on 64-bit floating point values in global memory and shared memory | | | | | | |
| Warp vote functions | | | | | | |
| Memory fence instructions | | | | | | |
| Synchronization functions | | | | | | |
| Surface functions | | | | | | |
| float16 half precision IEEE-conformant floating-point operations | | | | | | |
| bfloat16 16-bit floating-point operations | | | | | | |
| Support for 8-bit floating-point types | | | | | | |
| Support for tensor float32 | | | | | | |
| Packed math with 16-bit floating point values | | | | | | |
| Packed math with 32-bit floating point values | | | | | | |
| Matrix Cores | | | | | | |
| On-Chip Error Correcting Code (ECC) | | | | | | |
| Maximum dimensionality of grid | 3 | 3 | 3 | 3 | 3 | 3 |
| Maximum x-, y- or z-dimension of a grid | 2^32 - 1 | 2^32 - 1 | 2^32 - 1 | 2^32 - 1 | 2^32 - 1 | 2^32 - 1 |
| Maximum number of threads per grid | 2^32 - 1 | 2^32 - 1 | 2^32 - 1 | 2^32 - 1 | 2^32 - 1 | 2^32 - 1 |
| Maximum x-, y- or z-dimension of a block | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 |
| Maximum number of threads per block | 1024 | 1024 | 1024 | 1024 | 1024 | 1024 |
| Wavefront size | 32 [1] | 64 | 32 [1] | 64 | 32 [1] | 64 |
| Maximum number of resident blocks per compute unit | 40 [1] | 32 | 32 [1] | 32 | 32 [1] | 32 |
| Maximum number of resident wavefronts per compute unit | 40 [1] | 32 | 32 [1] | 32 | 32 [1] | 32 |
| Maximum number of resident threads per compute unit | 1280 [2] | 2048 | 1024 [2] | 2048 | 1024 [2] | 2048 |
| Maximum number of 32-bit vector registers per thread | 256 | 256 (vector) + 256 (matrix) | 256 | 256 (vector) + 256 (matrix) | 256 | 256 (vector) + 256 (matrix) |
| Maximum number of 32-bit scalar accumulation registers per thread | 106 | 104 | 106 | 104 | 106 | 104 |
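
The resident-thread values listed above are consistent with the wavefront size multiplied by the maximum number of resident wavefronts per compute unit (for example, 64 * 32 = 2048 on the CDNA architectures). The following is a minimal host-side C++ sketch, needing neither a GPU nor the HIP runtime, that encodes those rows and checks this relationship, and that also tests a block shape against the two 1024 limits. The names `ArchLimits`, `table_is_consistent`, `block_fits`, and `wavefronts_per_block` are hypothetical helpers introduced for illustration; they are not part of HIP.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record holding three rows of the table for one architecture.
struct ArchLimits {
    const char* name;
    std::uint32_t wavefront_size;
    std::uint32_t max_resident_wavefronts_per_cu;
    std::uint32_t max_resident_threads_per_cu;
};

// Values copied from the table above.
const ArchLimits kArchs[] = {
    {"RDNA1", 32, 40, 1280},
    {"CDNA1", 64, 32, 2048},
    {"RDNA2", 32, 32, 1024},
    {"CDNA2", 64, 32, 2048},
    {"RDNA3", 32, 32, 1024},
    {"CDNA3", 64, 32, 2048},
};

// Block limits from the table; identical for all six architectures.
const std::uint32_t kMaxBlockDim = 1024;
const std::uint32_t kMaxThreadsPerBlock = 1024;

// True if, for every architecture, wavefront size times resident wavefronts
// equals the listed number of resident threads per compute unit.
bool table_is_consistent() {
    for (const ArchLimits& a : kArchs) {
        if (a.wavefront_size * a.max_resident_wavefronts_per_cu !=
            a.max_resident_threads_per_cu) {
            return false;
        }
    }
    return true;
}

// True if a block of x * y * z threads respects both the per-dimension limit
// and the total threads-per-block limit.
bool block_fits(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    if (x == 0 || y == 0 || z == 0) return false;
    if (x > kMaxBlockDim || y > kMaxBlockDim || z > kMaxBlockDim) return false;
    return static_cast<std::uint64_t>(x) * y * z <= kMaxThreadsPerBlock;
}

// Number of wavefronts needed to hold `threads` threads, rounding up.
std::uint32_t wavefronts_per_block(std::uint32_t threads,
                                   std::uint32_t wavefront_size) {
    assert(wavefront_size > 0);
    return (threads + wavefront_size - 1) / wavefront_size;
}
```

Under these assumptions, `block_fits(32, 32, 1)` accepts a 1024-thread block, while `block_fits(1024, 2, 1)` rejects a block whose dimensions are individually within range but whose product exceeds 1024. A full 1024-thread block occupies `wavefronts_per_block(1024, 64)`, that is 16, wavefronts at a wavefront size of 64, and 32 wavefronts at a wavefront size of 32.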