Hardware features
This page gives an overview of the different hardware architectures and the features they implement. Support for a hardware feature does not imply performance; performance depends on the specifications listed on the Accelerator and GPU hardware specifications page.
| Hardware feature support | RDNA1 | CDNA1 | RDNA2 | CDNA2 | RDNA3 | CDNA3 |
|---|---|---|---|---|---|---|
| Atomic functions on 32-bit integer values in global and shared memory | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Atomic functions on 64-bit integer values in global and shared memory | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Atomic addition on 32-bit floating point values in global and shared memory | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Atomic addition on 64-bit floating point values in global memory and shared memory | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
|  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| float16 half precision IEEE-conformant floating-point operations | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
|  | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ |
| Support for 8-bit floating-point types | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Support for tensor float32 | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Packed math with 16-bit floating point values | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Packed math with 32-bit floating point values | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |
| Matrix Cores | ❌ | ✅ | ❌ | ✅ | ❌ | ✅ |
| On-Chip Error Correcting Code (ECC) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Maximum dimensionality of grid | 3 | 3 | 3 | 3 | 3 | 3 |
| Maximum x-, y- or z-dimension of a grid | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) |
| Maximum number of threads per grid | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) | \(2^{32} - 1\) |
| Maximum x-, y- or z-dimension of a block | \(1024\) | \(1024\) | \(1024\) | \(1024\) | \(1024\) | \(1024\) |
| Maximum number of threads per block | \(1024\) | \(1024\) | \(1024\) | \(1024\) | \(1024\) | \(1024\) |
| Wavefront size | 32 [1] | 64 | 32 [1] | 64 | 32 [1] | 64 |
| Maximum number of resident blocks per compute unit | 40 [1] | 32 | 32 [1] | 32 | 32 [1] | 32 |
| Maximum number of resident wavefronts per compute unit | 40 [1] | 32 | 32 [1] | 32 | 32 [1] | 32 |
| Maximum number of resident threads per compute unit | 1280 [2] | 2048 | 1024 [2] | 2048 | 1024 [2] | 2048 |
| Maximum number of 32-bit vector registers per thread | 256 | 256 (vector) + 256 (matrix) | 256 | 256 (vector) + 256 (matrix) | 256 | 256 (vector) + 256 (matrix) |
| Maximum number of 32-bit scalar accumulation registers per thread | 106 | 104 | 106 | 104 | 106 | 104 |
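
The numeric limits in the table can be cross-checked with a little arithmetic. The sketch below is illustrative only: the helper names (`resident_threads_per_cu`, `block_fits`, `wavefronts_per_block`) are hypothetical and not part of any ROCm or HIP API, and the values are copied from the table as listed, without the qualifications in footnotes [1] and [2]. It relies on one observation that the table itself does not state: the listed resident-thread figure matches the wavefront size multiplied by the maximum number of resident wavefronts per compute unit.

```python
import math

# Values copied from the table above:
# architecture -> (wavefront size, maximum resident wavefronts per compute unit)
WAVEFRONT_LIMITS = {
    "RDNA1": (32, 40),
    "CDNA1": (64, 32),
    "RDNA2": (32, 32),
    "CDNA2": (64, 32),
    "RDNA3": (32, 32),
    "CDNA3": (64, 32),
}

MAX_BLOCK_DIM = 1024          # maximum x-, y- or z-dimension of a block
MAX_THREADS_PER_BLOCK = 1024  # maximum number of threads per block


def resident_threads_per_cu(arch):
    """Wavefront size times maximum resident wavefronts per compute unit."""
    wavefront_size, max_wavefronts = WAVEFRONT_LIMITS[arch]
    return wavefront_size * max_wavefronts


def block_fits(block_dims):
    """True if an (x, y, z) block respects the per-dimension and total limits."""
    x, y, z = block_dims
    if min(x, y, z) < 1 or max(x, y, z) > MAX_BLOCK_DIM:
        return False
    return x * y * z <= MAX_THREADS_PER_BLOCK


def wavefronts_per_block(arch, block_dims):
    """Number of wavefronts needed to hold every thread of one block."""
    wavefront_size, _ = WAVEFRONT_LIMITS[arch]
    x, y, z = block_dims
    return math.ceil(x * y * z / wavefront_size)


# Print the derived figures for a 256-thread block on each architecture.
for arch in WAVEFRONT_LIMITS:
    print(arch, resident_threads_per_cu(arch), wavefronts_per_block(arch, (256, 1, 1)))
```

Note that a block of shape (32, 32, 2) is rejected even though every dimension is within 1024, because its total of 2048 threads exceeds the per-block limit. For limits on a specific device, query the runtime instead of relying on hard-coded values such as these.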