Introduction to ROCProfiler

ROCProfiler is a tool for profiling HIP and ROCm applications on AMD ROCm platforms. Profiling helps you identify performance bottlenecks in an application so that you can optimize them. ROCProfiler provides command-line tools for profiling pre-compiled applications. The ROCProfiler tool is implemented using the ROCProfiler and ROCTracer libraries and provides two primary features for profiling GPU-based applications or kernels:

  • Application Tracing: This basic feature of ROCProfiler traces the execution of an application, recording timestamps for the start and end of each API call and each kernel execution.

  • Performance Counter Collection: This feature collects hardware performance counters for each kernel execution.
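Start and end timestamps are what let application tracing show where time is spent. The Python sketch below illustrates the idea only: it assumes a hypothetical record layout (`TraceRecord` with `name`, `begin_ns`, `end_ns`), which is not ROCProfiler's actual output format, and aggregates the call count and total duration per API call or kernel name so that the most expensive entries come first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TraceRecord:
    """One traced event (API call or kernel execution).

    Hypothetical layout for illustration; not ROCProfiler's output columns.
    """
    name: str
    begin_ns: int  # start timestamp in nanoseconds
    end_ns: int    # end timestamp in nanoseconds

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.begin_ns


def summarize(records):
    """Return (name, calls, total_ns, percent) rows, sorted by total time, descending."""
    calls = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        calls[r.name] += 1
        total[r.name] += r.duration_ns
    grand_total = sum(total.values())
    rows = []
    for name in sorted(total, key=total.get, reverse=True):
        pct = 100.0 * total[name] / grand_total if grand_total else 0.0
        rows.append((name, calls[name], total[name], pct))
    return rows


if __name__ == "__main__":
    # Made-up timestamps for illustration only; not real profiler output.
    # "matrixTranspose" is a hypothetical kernel name.
    trace = [
        TraceRecord("hipMemcpy", 0, 4_000),
        TraceRecord("matrixTranspose", 5_000, 8_000),
        TraceRecord("hipMemcpy", 9_000, 13_000),
    ]
    for name, n, total_ns, pct in summarize(trace):
        print(f"{name:20s} calls={n} total={total_ns} ns ({pct:.1f}%)")
```

In this toy trace, the two memory copies take more time than the kernel itself. This is the kind of bottleneck that tracing data makes visible.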

There are two versions of the ROCProfiler tool: rocprof and rocprofv2. Both provide the same application tracing and kernel profiling features, but they differ in their command-line options, their default output formats, and the AMD GPUs they support.

  • rocprof: The original version of the ROCProfiler tool and library, and the production tool. Refer to Using rocprof version 1 for additional information.

  • rocprofv2: ROCProfiler version 2 is the latest version of the tool and provides hardware-specific, low-level performance analysis for profiling GPU applications. However, rocprofv2 is considered a beta version. It is described in Using rocprofv2.

To demonstrate how to use ROCProfiler with its various options, this document uses the MatrixTranspose application as an example.