Introduction to ROCProfiler
ROCProfiler is a tool for profiling HIP and ROCm applications on AMD ROCm platforms. Profiling helps identify performance bottlenecks in an application so that its performance can be optimized. ROCProfiler provides command-line tools for profiling pre-compiled applications. The ROCProfiler tool is implemented using the ROCProfiler and ROCTracer libraries and provides two primary features for profiling GPU-based applications or kernels:
Application Tracing: This basic feature traces the execution of an application, recording start and end timestamps for each API call and kernel execution.
Performance Counter Collection: This feature collects performance counters for each API call and kernel execution.
There are two versions of the ROCProfiler tool: rocprof and rocprofv2. The two versions are similar and provide the same application tracing and kernel profiling features. However, they differ in their command-line options, in the default outputs they support, and in the AMD GPUs they support.
rocprof: The original version of the ROCProfiler tool and library, and the production tool. Refer to Using rocprof version 1 for additional information.
rocprofv2: The latest version of the tool, which provides hardware-specific, low-level performance analysis for profiling GPU applications. However, rocprofv2 is considered a beta version. It is described in Using rocprofv2.
To demonstrate the usage of ROCProfiler with various options, this document refers to the MatrixTranspose application as an example.
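As a rough orientation before the detailed sections, the two primary features might be invoked with rocprof along the following lines. This is a minimal sketch, not a reference: the binary path ./MatrixTranspose, the input file name input.txt, and the helper functions trace_cmd and counter_cmd are assumptions made for illustration, and the counters available in the input file depend on the GPU.

```shell
#!/usr/bin/env bash
# Sketch only. APP is an assumed path to the pre-compiled MatrixTranspose binary.
APP="${APP:-./MatrixTranspose}"
# Assumed name of a counter input file, e.g. containing the line:
#   pmc: GRBM_COUNT GRBM_GUI_ACTIVE
COUNTERS="${COUNTERS:-input.txt}"

# Application tracing: HIP API calls and kernel executions with timestamps.
trace_cmd()   { echo "rocprof --hip-trace $1"; }

# Performance counter collection: counters are read from an input file.
counter_cmd() { echo "rocprof -i $1 $2"; }

# Print each command; run it only when rocprof, the binary, and the input
# file are all present. Paths are assumed to contain no spaces.
for cmd in "$(trace_cmd "$APP")" "$(counter_cmd "$COUNTERS" "$APP")"; do
    echo "$cmd"
    if command -v rocprof >/dev/null 2>&1 && [ -x "$APP" ] && [ -f "$COUNTERS" ]; then
        $cmd
    fi
done
```

On a machine without ROCm installed, the script only prints the two command lines, which makes it safe to try before a GPU system is available. The rocprofv2 options differ in places, as noted above, so consult the version-specific sections before adapting these commands.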