What is ROCm Compute Profiler?#
ROCm Compute Profiler is a kernel-level profiling tool for machine learning and high performance computing (HPC) workloads running on AMD Instinct™ accelerators.
AMD Instinct MI-series accelerators are data center-class GPUs designed for compute and have some graphics capabilities disabled or removed. ROCm Compute Profiler primarily targets use with accelerators in the MI300, MI200, and MI100 families. Development is in progress to support Radeon™ (RDNA) GPUs.
ROCm Compute Profiler is built on top of ROCProfiler to monitor hardware performance counters.
High-level design#
The architecture of ROCm Compute Profiler consists of three major components shown in the following diagram.
Core ROCm Compute Profiler#
The core component acquires raw performance counters via application replay using rocprof. Counters are stored in comma-separated values (CSV) format for further analysis. It also runs a set of accelerator-specific micro-benchmarks to acquire hierarchical roofline data. The roofline model is not available on accelerators older than MI200.
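Because the counters are stored as CSV, they can be inspected with ordinary scripting tools. The following is a minimal sketch of that idea, not the tool's own analysis code. The sample data, the column names (`Dispatch_ID`, `GPU_ID`, `Kernel_Name`), the counter names, and the helper `total_counter_by_kernel` are assumptions made for illustration; the columns in real profiler output may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data. Column and counter names are assumptions
# for illustration and may not match real profiler output.
SAMPLE_CSV = """Dispatch_ID,GPU_ID,Kernel_Name,SQ_WAVES,GRBM_GUI_ACTIVE
0,0,vecadd,1024,5000
1,0,matmul,4096,90000
2,0,vecadd,1024,5200
"""


def total_counter_by_kernel(csv_text, counter):
    """Sum one counter column across all dispatches of each kernel."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Kernel_Name"]] += int(row[counter])
    return dict(totals)


totals = total_counter_by_kernel(SAMPLE_CSV, "GRBM_GUI_ACTIVE")
print(totals)
```

To read an actual counter file instead of the inline sample, pass an open file object to `csv.DictReader` in place of the `io.StringIO` wrapper.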
Grafana server for ROCm Compute Profiler#
Grafana database import: All raw performance counters are imported into a backend MongoDB database to support analysis and visualization in the Grafana GUI. Compatibility with data generated by older ROCm Compute Profiler versions is not guaranteed.
Grafana analysis dashboard GUI: The Grafana dashboard retrieves the raw counter data from the backend database and displays the relevant performance metrics and visualizations.
ROCm Compute Profiler standalone GUI analyzer#
ROCm Compute Profiler provides a standalone GUI to enable basic performance analysis without the need to import data into a database instance. Find setup instructions in Setting up Grafana server for ROCm Compute Profiler.
Features#
ROCm Compute Profiler offers comprehensive profiling based on all available hardware counters for the target accelerator. It delivers advanced performance analysis features, such as system Speed-of-Light (SOL) and hardware block-level SOL evaluations. Additionally, ROCm Compute Profiler provides in-depth memory chart analysis, roofline analysis, baseline comparisons, and more, ensuring a thorough understanding of system performance.
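As a rough illustration of the arithmetic behind roofline and Speed-of-Light style metrics, the sketch below uses the general roofline model (attainable throughput is the lesser of the compute peak and memory bandwidth multiplied by arithmetic intensity) and reports a measured value as a percentage of a peak. The function names and all numbers are hypothetical and do not describe any specific accelerator or how ROCm Compute Profiler computes its metrics internally.

```python
def attainable_gflops(peak_gflops, peak_bandwidth_gbs, arithmetic_intensity):
    """Roofline bound: min(compute peak, bandwidth * FLOPs per byte)."""
    return min(peak_gflops, peak_bandwidth_gbs * arithmetic_intensity)


def percent_of_peak(achieved, peak):
    """Express a measured value as a percentage of a theoretical peak."""
    return 100.0 * achieved / peak


# Illustrative numbers only, not real hardware specifications.
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

# A kernel doing 2 FLOPs per byte moved is limited by memory bandwidth.
memory_bound = attainable_gflops(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS, 2.0)

# A kernel doing 20 FLOPs per byte moved is limited by the compute peak.
compute_bound = attainable_gflops(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS, 20.0)

# A kernel achieving 150 GFLOP/s against the memory-bound ceiling.
utilization = percent_of_peak(150.0, memory_bound)

print(memory_bound, compute_bound, utilization)
```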
ROCm Compute Profiler supports analysis through either the command line or a GUI. The following list describes ROCm Compute Profiler’s features at a high level.
Support for AMD Instinct MI300, MI200, and MI100 accelerators
GUI analyzer via Grafana and MongoDB
Roofline Analysis panel (supported on MI200 only, with Ubuntu 20.04, SLES 15 SP3, or RHEL 8)
L1 Address Processing Unit or Texture Addresser (TA); and L1 Backend Data Processing Unit or Texture Data (TD) panels
Filtering to reduce profiling time
Filtering by dispatch
Filtering by kernel
Filtering by GPU ID
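The three filters above can be thought of as independent selection criteria over dispatch records. The sketch below shows that selection logic only; it is not how ROCm Compute Profiler implements filtering, and the `Dispatch` record, its field names, and the `filter_dispatches` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical record for one kernel dispatch."""
    dispatch_id: int
    gpu_id: int
    kernel_name: str


def filter_dispatches(records, kernels=None, dispatch_ids=None, gpu_ids=None):
    """Keep records matching every filter that is set (None means no filter)."""
    selected = []
    for rec in records:
        if kernels is not None and rec.kernel_name not in kernels:
            continue
        if dispatch_ids is not None and rec.dispatch_id not in dispatch_ids:
            continue
        if gpu_ids is not None and rec.gpu_id not in gpu_ids:
            continue
        selected.append(rec)
    return selected


records = [
    Dispatch(0, 0, "vecadd"),
    Dispatch(1, 0, "matmul"),
    Dispatch(2, 1, "vecadd"),
    Dispatch(3, 1, "matmul"),
]

by_kernel = filter_dispatches(records, kernels={"vecadd"})
by_gpu_and_kernel = filter_dispatches(records, kernels={"matmul"}, gpu_ids={1})
print([r.dispatch_id for r in by_kernel])
print([r.dispatch_id for r in by_gpu_and_kernel])
```

Combining filters narrows the selection, which is why filtering can shorten profiling: fewer dispatches need to be collected and replayed.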