AMD CDNA architecture (CDNA-CDNA4)
ROCm Compute Profiler provides an extensive set of metrics for understanding achieved application performance on AMD Instinct™ MI-series GPUs. Supported GPUs include the CDNA™ architecture-based AMD Instinct MI100; the CDNA2 architecture-based MI210, MI250, and MI250X; the CDNA3 architecture-based MI300A, MI300X, and MI325X; and the CDNA4 architecture-based MI350X and MI355X.
Note
For AMD Ryzen™ / RDNA™ APUs (e.g. gfx1151/RDNA3.5), see RDNA3.
For top-level metrics details on CDNA and RDNA architecture, see Performance model.
The following tables summarize key details and feature support for each CDNA architecture:
✅: Supported ❌: Unsupported
Architecture details
| Architecture | CDNA | CDNA2 | CDNA3 | CDNA4 |
|---|---|---|---|---|
| Chip packaging | Single die | Up to two Graphics Compute Dies (GCDs) in a single package | One logical processor with a dozen chiplets, configurable with partition modes | Multi-die chiplet design similar to CDNA3, but with two I/O Dies (IODs) |
| Supported series | MI100 | MI210, MI250, MI250X | MI300A, MI300X, MI325X | MI350X, MI355X |
| Spatial partition mode | ❌ | ❌ | Compute partition mode and Memory partition mode | Compute partition mode and Memory partition mode |
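The supported-series and partition-mode rows above can also be read as a small lookup. The following Python sketch is illustrative only: the names `SUPPORTED_SERIES`, `PARTITIONABLE`, `architecture_of`, and `supports_partition_modes` are hypothetical and are not part of ROCm Compute Profiler.

```python
# Illustrative lookup built from the "Supported series" and
# "Spatial partition mode" rows of the architecture details table.
# All names here are hypothetical, not a ROCm Compute Profiler API.
SUPPORTED_SERIES = {
    "CDNA": ["MI100"],
    "CDNA2": ["MI210", "MI250", "MI250X"],
    "CDNA3": ["MI300A", "MI300X", "MI325X"],
    "CDNA4": ["MI350X", "MI355X"],
}

# Architectures for which the table lists compute and memory partition modes.
PARTITIONABLE = {"CDNA3", "CDNA4"}


def architecture_of(model: str) -> str:
    """Return the CDNA generation for a model name such as 'MI250X'."""
    name = model.strip().upper()
    for arch, models in SUPPORTED_SERIES.items():
        if name in models:
            return arch
    raise KeyError(f"{model!r} is not listed in the table")


def supports_partition_modes(model: str) -> bool:
    """Return True if the model's architecture lists partition modes."""
    return architecture_of(model) in PARTITIONABLE
```

For example, `architecture_of("MI250X")` resolves to `"CDNA2"`, and `supports_partition_modes("MI100")` is false because the table marks CDNA as unsupported.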
Data type support
| Architecture | FP32 | FP64 | FP16 | INT32 ADD/LOGIC/MAD | INT8 DOT | INT4 DOT | FP32 GEMM | FP64 GEMM | FP16 GEMM | BF16 GEMM | INT8 GEMM | Packed FP32 | TF32 GEMM | FP8/BF8 GEMM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CDNA | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| CDNA2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| CDNA3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CDNA4 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
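When deciding which metrics are meaningful for a given GPU, it can help to query the data type support table programmatically. The sketch below transcribes the table into Python; the names `COLUMNS`, `ROWS`, `is_supported`, and `unsupported` are hypothetical helpers for illustration and are not provided by ROCm Compute Profiler.

```python
# Column order matches the data type support table.
COLUMNS = [
    "FP32", "FP64", "FP16", "INT32 ADD/LOGIC/MAD", "INT8 DOT", "INT4 DOT",
    "FP32 GEMM", "FP64 GEMM", "FP16 GEMM", "BF16 GEMM", "INT8 GEMM",
    "Packed FP32", "TF32 GEMM", "FP8/BF8 GEMM",
]

# One flag per column, in table order: "Y" = supported, "N" = unsupported.
# Transcribed from the table; hypothetical encoding, not a profiler format.
ROWS = {
    "CDNA":  "YYYYYYYNNNNNNN",
    "CDNA2": "YYYYYYYYYYYYNN",
    "CDNA3": "YYYYYYYYYYYYYY",
    "CDNA4": "YYYYYYYYYYYYNY",
}


def is_supported(arch: str, dtype: str) -> bool:
    """Return True if the table marks `dtype` as supported on `arch`."""
    return ROWS[arch][COLUMNS.index(dtype)] == "Y"


def unsupported(arch: str) -> list[str]:
    """List the table columns marked unsupported for `arch`."""
    return [col for col, flag in zip(COLUMNS, ROWS[arch]) if flag == "N"]
```

For instance, `unsupported("CDNA4")` yields only `"TF32 GEMM"`, which matches the single ❌ in the CDNA4 row.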
To best use profiling data, it’s important to understand the role of various hardware blocks of AMD Instinct GPUs. Refer to the following top-level GPU architecture diagram to understand the hardware blocks of each architecture.
This section describes each hardware block on the GPU from the perspective of a software developer, to give a deeper understanding of the metrics reported in profiling data. Refer to Profiling by example for practical examples and details on how to use ROCm Compute Profiler to optimize your code.
Note
In this documentation, MI2XX refers interchangeably to any of the CDNA2 architecture-based MI200 series GPUs (AMD Instinct MI250X, MI250, and MI210) in cases where the exact product is not relevant. For product details, see AMD Instinct GPUs.
For a comparison of AMD Instinct GPU specifications, refer to Hardware specifications.
Hardware block chapters
The AMD Instinct performance model is divided into the following blocks: