What is ROCm Compute Profiler?

ROCm Compute Profiler is a kernel-level profiling tool for machine learning and high performance computing (HPC) workloads running on AMD Instinct™ accelerators.

AMD Instinct MI-series accelerators are data center-class GPUs designed for compute, with some graphics capabilities disabled or removed. ROCm Compute Profiler primarily targets accelerators in the MI300, MI200, and MI100 families. Support for Radeon™ (RDNA) GPUs is in development.

ROCm Compute Profiler is built on top of ROCProfiler to monitor hardware performance counters.

High-level design

The architecture of ROCm Compute Profiler consists of three major components, shown in the following diagram.

Core ROCm Compute Profiler

The core component acquires raw performance counters via application replay using rocprof. Counters are stored in comma-separated values (CSV) format for further analysis. The core component also runs a set of accelerator-specific micro-benchmarks to acquire hierarchical roofline data. The roofline model is not available on accelerators older than MI200.
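Because the counters are stored as CSV, they can be post-processed with ordinary tooling. The following is a minimal sketch of that idea, not the tool's own analysis code. The column names (`Kernel_Name`, `Counter_A`, `Counter_B`) and the sample values are hypothetical and do not reflect the actual CSV schema written by ROCm Compute Profiler.

```python
import csv
import io
from collections import defaultdict

# Hypothetical counter dump: the column names and values below are
# illustrative only and are NOT the tool's real CSV schema.
SAMPLE_CSV = """\
Kernel_Name,Counter_A,Counter_B
vec_add,100,4
vec_add,120,6
matmul,900,30
"""


def sum_counters_by_kernel(csv_text):
    """Sum every counter column per kernel name across all dispatches."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        kernel = row.pop("Kernel_Name")
        for counter, value in row.items():
            totals[kernel][counter] += int(value)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {kernel: dict(counters) for kernel, counters in totals.items()}


totals = sum_counters_by_kernel(SAMPLE_CSV)
print(totals["vec_add"])
```

For a real workload, the same function could read the contents of a counter file from disk instead of the in-memory sample, after adjusting the column names to match the file.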

Grafana server for ROCm Compute Profiler

  • Grafana database import: All raw performance counters are imported into a backend MongoDB database to support analysis and visualization in the Grafana GUI. Compatibility with data generated by older ROCm Compute Profiler versions is not guaranteed.

  • Grafana analysis dashboard GUI: The Grafana dashboard retrieves the raw counter information from the backend database and displays the relevant performance metrics and visualizations.

ROCm Compute Profiler standalone GUI analyzer

ROCm Compute Profiler provides a standalone GUI to enable basic performance analysis without the need to import data into a database instance. Find setup instructions in Setting up Grafana server for ROCm Compute Profiler.

Architectural design of ROCm Compute Profiler

Features

ROCm Compute Profiler offers comprehensive profiling based on all available hardware counters for the target accelerator. It delivers advanced performance analysis features, such as system Speed-of-Light (SOL) and hardware block-level SOL evaluations. Additionally, ROCm Compute Profiler provides in-depth memory chart analysis, roofline analysis, baseline comparisons, and more, ensuring a thorough understanding of system performance.
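To make the Speed-of-Light and roofline ideas concrete, here is a minimal sketch under stated assumptions. It treats SOL as achieved throughput expressed as a percentage of a peak value, and it uses the standard roofline bound, min(peak compute, arithmetic intensity × bandwidth), evaluated once per memory level. The function names, memory levels, and all numbers are illustrative; they are not taken from the tool or from any real accelerator.

```python
def speed_of_light_pct(achieved, peak):
    """Express an achieved rate as a percentage of its peak."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return 100.0 * achieved / peak


def roofline_ceiling(arithmetic_intensity, peak_gflops, bandwidth_gbps):
    """Standard roofline bound in GFLOP/s.

    arithmetic_intensity is in FLOP per byte moved at the given memory level.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbps)


# Illustrative values only (not a real device).
PEAK_GFLOPS = 1000.0
# Per memory level: (arithmetic intensity in FLOP/byte, bandwidth in GB/s).
LEVELS = {
    "HBM": (2.0, 100.0),
    "L2": (1.0, 400.0),
}

# A hierarchical roofline has one ceiling per level of the memory hierarchy.
ceilings = {
    level: roofline_ceiling(ai, PEAK_GFLOPS, bw)
    for level, (ai, bw) in LEVELS.items()
}

achieved_gflops = 150.0
# How close the kernel runs to its HBM-bound ceiling.
sol_vs_hbm = speed_of_light_pct(achieved_gflops, ceilings["HBM"])
print(ceilings, sol_vs_hbm)
```

In this sketch the kernel is memory-bound at both levels, since each bandwidth ceiling lies below the compute peak. The lowest ceiling (HBM here) is the limiting one.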

ROCm Compute Profiler supports analysis through either the command line or a GUI. The following list describes ROCm Compute Profiler’s features at a high level.