Hardware features
This page gives an overview of the different hardware architectures and the features they implement. Feature support does not imply performance, which depends on the specifications listed on the rocm:reference/gpu-arch-specs page.
| Hardware feature support | RDNA1 | CDNA1 | RDNA2 | CDNA2 | RDNA3 | CDNA3 |
|---|---|---|---|---|---|---|
| Atomic functions on 32-bit integer values in global and shared memory | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Atomic functions on 64-bit integer values in global and shared memory | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Atomic addition on 32-bit floating point values in global and shared memory | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Atomic addition on 64-bit floating point values in global memory and shared memory | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| float16 half precision IEEE-conformant floating-point operations | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| bfloat16 16-bit floating-point operations | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Support for 8-bit floating-point types | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Support for tensor float32 | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Packed math with 16-bit floating point values | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Packed math with 32-bit floating point values | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
| Matrix Cores | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ |
| On-Chip Error Correcting Code (ECC) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Maximum dimensionality of grid | 3 | 3 | 3 | 3 | 3 | 3 |
| Maximum x-, y- or z-dimension of a grid | | | | | | |
| Maximum number of threads per grid | | | | | | |
| Maximum x-, y- or z-dimension of a block | | | | | | |
| Maximum number of threads per block | | | | | | |
| Wavefront size | 32 [1] | 64 | 32 [1] | 64 | 32 [1] | 64 |
| Maximum number of resident blocks per compute unit | 40 [1] | 32 | 32 [1] | 32 | 32 [1] | 32 |
| Maximum number of resident wavefronts per compute unit | 40 [1] | 32 | 32 [1] | 32 | 32 [1] | 32 |
| Maximum number of resident threads per compute unit | 1280 [2] | 2048 | 1024 [2] | 2048 | 1024 [2] | 2048 |
| Maximum number of 32-bit vector registers per thread | 256 | 256 (vector) + 256 (matrix) | 256 | 256 (vector) + 256 (matrix) | 256 | 256 (vector) + 256 (matrix) |
| Maximum number of 32-bit scalar accumulation registers per thread | 106 | 104 | 106 | 104 | 106 | 104 |
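
The occupancy rows above appear to be related: for every architecture listed, the maximum number of resident threads per compute unit equals the wavefront size multiplied by the maximum number of resident wavefronts per compute unit. The following Python sketch checks that arithmetic against the values in the table. The dictionary layout and the names `TABLE`, `resident_threads_per_cu`, and `check_table` are illustrative assumptions, not part of any ROCm or HIP API, and the RDNA entries use the footnoted values exactly as listed.

```python
# Per-architecture values copied from the table above.
# The dictionary layout and all names here are illustrative, not a ROCm API.
TABLE = {
    "RDNA1": {"wavefront_size": 32, "max_wavefronts_per_cu": 40, "max_threads_per_cu": 1280},
    "CDNA1": {"wavefront_size": 64, "max_wavefronts_per_cu": 32, "max_threads_per_cu": 2048},
    "RDNA2": {"wavefront_size": 32, "max_wavefronts_per_cu": 32, "max_threads_per_cu": 1024},
    "CDNA2": {"wavefront_size": 64, "max_wavefronts_per_cu": 32, "max_threads_per_cu": 2048},
    "RDNA3": {"wavefront_size": 32, "max_wavefronts_per_cu": 32, "max_threads_per_cu": 1024},
    "CDNA3": {"wavefront_size": 64, "max_wavefronts_per_cu": 32, "max_threads_per_cu": 2048},
}


def resident_threads_per_cu(wavefront_size: int, max_wavefronts_per_cu: int) -> int:
    """Threads resident on one compute unit when every wavefront slot is full."""
    return wavefront_size * max_wavefronts_per_cu


def check_table(table=TABLE) -> dict:
    """Map each architecture to True if its listed thread limit equals the product."""
    return {
        name: resident_threads_per_cu(row["wavefront_size"], row["max_wavefronts_per_cu"])
        == row["max_threads_per_cu"]
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, ok in check_table().items():
        print(name, "consistent" if ok else "mismatch")
```

For example, RDNA1 lists a wavefront size of 32 and 40 resident wavefronts, giving 32 × 40 = 1280 threads, while each CDNA column gives 64 × 32 = 2048.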