Performance model

ROCm Compute Profiler provides an extensive list of metrics for understanding achieved application performance on AMD Instinct™ MI-series accelerators. Supported hardware includes Graphics Core Next™ (GCN) GPUs such as the AMD Instinct MI50; CDNA™ accelerators such as the MI100; CDNA2 accelerators such as the AMD Instinct MI250X, MI250, and MI210; CDNA3 accelerators such as the AMD Instinct MI300A, MI300X, and MI325X; and CDNA4 accelerators such as the MI350X and MI355X.

The following tables summarize key details of each architecture and the features it supports:

✅: Supported ❌: Unsupported

Architecture details

| Architecture | CDNA | CDNA 2 | CDNA 3 | CDNA 4 |
|---|---|---|---|---|
| Chip packaging | Single die | Two Graphics Compute Dies (GCDs) in a single package | One logical processor with a dozen chiplets, configurable with partition modes | Similar to CDNA 3: multi-die chiplet design, but with two I/O Dies (IODs) |
| Supported series | MI100 | MI200, MI210, MI250 | MI300A, MI300X, MI325X | MI350X, MI355X |
| Spatial partition mode | | | Compute partition mode and Memory partition mode | Compute partition mode and Memory partition mode |
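The architecture-to-series mapping in the table above can be captured in a small lookup, which is handy when a script needs to decide which architecture's metrics apply to a given accelerator. The following is a minimal sketch only: `ARCHITECTURES` and `architecture_of` are hypothetical names introduced for illustration and are not part of ROCm Compute Profiler. The MI250X entry is taken from the introductory paragraph rather than from the table.

```python
from typing import Optional

# Hypothetical lookup built from the "Architecture details" table.
# MI250X is listed as a CDNA2 accelerator in the introduction, not in the table.
ARCHITECTURES = {
    "CDNA": {"packaging": "single die", "series": ["MI100"]},
    "CDNA 2": {
        "packaging": "two GCDs in a single package",
        "series": ["MI200", "MI210", "MI250", "MI250X"],
    },
    "CDNA 3": {
        "packaging": "one logical processor with a dozen chiplets",
        "series": ["MI300A", "MI300X", "MI325X"],
    },
    "CDNA 4": {
        "packaging": "multi-die chiplet with two IODs",
        "series": ["MI350X", "MI355X"],
    },
}

_PREFIX = "AMD INSTINCT "


def architecture_of(model: str) -> Optional[str]:
    """Return the architecture name for an accelerator model, or None if unknown."""
    key = model.strip().upper()
    # Accept both "MI300X" and "AMD Instinct MI300X".
    if key.startswith(_PREFIX):
        key = key[len(_PREFIX):].strip()
    for arch, info in ARCHITECTURES.items():
        if key in info["series"]:
            return arch
    return None
```

For example, `architecture_of("AMD Instinct MI300X")` returns `"CDNA 3"`, while a model that does not appear in the table, such as the GCN-based MI50, returns `None`.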

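Data type support

| Architecture | FP32 | FP64 | FP16 | INT32 ADD/LOGIC/MAD | INT8 DOT | INT4 DOT | FP32 GEMM | FP64 GEMM | FP16 GEMM | BF16 GEMM | INT8 GEMM | Packed FP32 | TF32 GEMM | FP8/BF8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CDNA | | | | | | | | | | | | | | |
| CDNA2 | | | | | | | | | | | | | | |
| CDNA3 | | | | | | | | | | | | | | |
| CDNA4 | | | | | | | | | | | | | | |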

To make the best use of profiling data, it’s important to understand the role of the various hardware blocks of AMD Instinct accelerators. Refer to the following top-level GPU architecture diagrams to understand the hardware blocks of each architecture.

CDNA top level architecture diagram with zoomed view of Compute unit
CDNA2 top level architecture diagram with zoomed view of Compute unit
CDNA3 top level architecture diagram with zoomed view of Accelerator Complex Dies (XCDs)
CDNA4 top level architecture diagram

This section describes each hardware block on the accelerator from the perspective of a software developer, to give a deeper understanding of the metrics reported in profiling data. Refer to Profiling by example for practical examples and details on how to use ROCm Compute Profiler to optimize your code.

Note

In this documentation, MI2XX refers interchangeably to any of the CDNA2 architecture-based MI200 series accelerators, such as the AMD Instinct MI250X, MI250, and MI210, in cases where the exact product is not relevant. For product details, see AMD Instinct GPUs.
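The MI2XX shorthand described in this note can be mirrored in scripts that group results by product family. The snippet below is a minimal sketch under that assumption: `CDNA2_PRODUCTS` and `doc_alias` are hypothetical names used only for illustration, and the product set contains just the three accelerators named in the note.

```python
# Hypothetical helper: collapse the CDNA2 products named in the note
# into the documentation's "MI2XX" shorthand.
CDNA2_PRODUCTS = {"MI250X", "MI250", "MI210"}


def doc_alias(product: str) -> str:
    """Return "MI2XX" for the listed CDNA2 products, else the normalized name."""
    name = product.strip().upper()
    return "MI2XX" if name in CDNA2_PRODUCTS else name
```

With this helper, `doc_alias("mi250x")` and `doc_alias("MI210")` both yield `"MI2XX"`, while other product names pass through unchanged apart from uppercasing.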

For a comparison of AMD Instinct accelerator specifications, refer to Hardware specifications.

In this chapter, the AMD Instinct performance model used by ROCm Compute Profiler is divided into a handful of key hardware blocks, each detailed in the following sections: