rocprof
Command reference
To display the command-line help, run:
rocprof -h
The command prints the following help text:
RPL: on '240505_115025' from '/opt/rocm-6.2.0-13748' in '/home/rocm'
ROCm Profiling Library (RPL) run script, a part of ROCprofiler library package.
Full path: /opt/rocm-6.2.0-13748/bin/rocprof
Metrics definition: /opt/rocm-6.2.0-13748/lib/rocprofiler/metrics.xml
Usage:
rocprof [-h] [--list-basic] [--list-derived] [-i <input .txt/.xml file>] [-o <output CSV file>] <app command line>
Options:
-h - this help
--tool-version <1|2> - to use specific version of rocprof tool, by default v1 is used
1 - rocprofiler tool v1
2 - rocprofiler tool v2
--verbose - verbose mode, dumping all base counters used in the input metrics
--list-basic - to print the list of basic HW counters
--list-derived - to print the list of derived metrics with formulas
--cmd-qts <on|off> - quoting profiled cmd line [on]
-i <.txt|.xml file> - input file
Input file .txt format, automatically rerun application for every profiling features line:
# Perf counters group 1
pmc : Wavefronts VALUInsts SALUInsts SFetchInsts FlatVMemInsts LDSInsts FlatLDSInsts GDSInsts VALUUtilization FetchSize
# Perf counters group 2
pmc : WriteSize L2CacheHit
# Filter by dispatches range, GPU index and kernel names
# supported range formats: "3:9", "3:", "3"
range: 1 : 4
gpu: 0 1 2 3
kernel: simple Pass1 simpleConvolutionPass2
Input file .xml format, for single profiling run:
# Metrics list definition, also the form "<block-name>:<event-id>" can be used
# All defined metrics can be found in the 'metrics.xml'
# There are basic metrics for raw HW counters and high-level metrics for derived counters
<metric name=SQ:4,SQ_WAVES,VFetchInsts
></metric>
# Filter by dispatches range, GPU index and kernel names
<metric
# range formats: "3:9", "3:", "3"
range=""
# list of gpu indexes "0,1,2,3"
gpu_index=""
# list of matched sub-strings "Simple1,Conv1,SimpleConvolution"
kernel=""
></metric>
-o <output file> - output CSV file [<input file base>.csv]
-d <data directory> - directory where profiler store profiling data including traces [/tmp]
The data directory is automatically removed if it is matching the default temporary directory.
-t <temporary directory> - to change the temporary directory [/tmp]
By changing the temporary directory you can prevent removing the profiling data from /tmp or enable removing from not '/tmp' directory.
-m <metric file> - file defining custom metrics to use in-place of defaults.
--basenames <on|off> - to turn on/off truncating of the kernel full function names till the base ones [off]
--timestamp <on|off> - to turn on/off the kernel dispatches timestamps, dispatch/begin/end/complete during kernel profiling [off]
--ctx-wait <on|off> - to wait for outstanding contexts on profiler exit [on]
--ctx-limit <max number> - maximum number of outstanding contexts [0 - unlimited]
--heartbeat <rate sec> - to print progress heartbeats [0 - disabled]
--obj-tracking <on|off> - to turn on/off kernels code objects tracking [on]
To support V3 code object
--stats - generating kernel execution stats, file <output name>.stats.csv
--roctx-trace - to enable rocTX application code annotation trace, "Markers and Ranges" JSON trace section.
--hip-trace - to trace HIP, generates API execution stats and JSON file chrome-tracing compatible
--hsa-trace - to trace HSA, generates API execution stats and JSON file chrome-tracing compatible
--sys-trace - to trace HIP/HSA APIs and GPU activity, generates stats and JSON trace chrome-tracing compatible
'--hsa-trace' can be used in addition to select activity tracing from HSA (ROCr runtime) level
Generated files: <output name>.<domain>_stats.txt <output name>.json
Traced API list can be set by input .txt or .xml files.
Input .txt:
hsa: hsa_queue_create hsa_amd_memory_pool_allocate
Input .xml:
<trace name="HSA">
<parameters list="hsa_queue_create, hsa_amd_memory_pool_allocate">
</parameters>
</trace>
--roctx-rename - to rename kernels with their enclosing rocTX range's message.
--trace-start <on|off> - to enable tracing on start [on]
--trace-period <delay:length:rate> - to enable trace with initial delay, with periodic sample length and rate
Supported time formats: <number(m|s|ms|us)>
--flush-rate <rate> - to enable trace flush rate (time period)
Supported time formats: <number(m|s|ms|us)>
--parallel-kernels - to enable concurrent kernels
Configuration file:
You can set your parameters defaults preferences in the configuration file 'rpl_rc.xml'. The search path sequence: .:/home/rocm:<installation directory>
First the configuration file is searched in the current directory, then in the current user's home directory, and then in the installation directory.
Configurable options: 'basenames', 'timestamp', 'ctx-limit', 'heartbeat', 'obj-tracking'.
An example of 'rpl_rc.xml':
<defaults
basenames=off
timestamp=off
ctx-limit=0
heartbeat=0
obj-tracking=off
></defaults>
--merge-traces - Script for aggregating results from multiple rocprofiler out directories.
Usage: if running with rocprof
rocprof --merge-traces -o <outputdir> [<inputdir>...]
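The options above can be combined into a short workflow. The following is a minimal sketch, not an official recipe: it writes a counter input file in the `.txt` format described in the help text, then calls `rocprof` with `-i`, `-o`, `--timestamp`, and `--basenames`, followed by a separate `--hip-trace --stats` run. The application path `./my_app`, the temporary working directory, and the file names `input.txt`, `results.csv`, and `trace.csv` are assumptions made for illustration. The profiling runs are skipped when `rocprof` or the application is not available.

```shell
#!/usr/bin/env bash
# Sketch: prepare a rocprof input file and run two profiling passes.
# WORK_DIR, the file names and APP_CMD are illustrative assumptions.
set -eu

WORK_DIR="$(mktemp -d)"
INPUT_FILE="$WORK_DIR/input.txt"
OUTPUT_FILE="$WORK_DIR/results.csv"
APP_CMD="${APP_CMD:-./my_app}"   # hypothetical application to profile

# Each "pmc" line defines one counter group. Per the help text, rocprof
# reruns the application for every profiling features line.
cat > "$INPUT_FILE" <<'EOF'
# Perf counters group 1
pmc : Wavefronts VALUInsts SALUInsts
# Perf counters group 2
pmc : WriteSize L2CacheHit
# Filter by dispatch range and GPU index
range: 1 : 4
gpu: 0
EOF

PMC_GROUPS="$(grep -c '^pmc' "$INPUT_FILE")"
echo "input file: $INPUT_FILE ($PMC_GROUPS counter groups)"

if command -v rocprof >/dev/null 2>&1 && [ -x "$APP_CMD" ]; then
    # Counter collection with dispatch timestamps and shortened kernel names.
    rocprof -i "$INPUT_FILE" -o "$OUTPUT_FILE" \
        --timestamp on --basenames on "$APP_CMD"

    # HIP API trace plus kernel execution statistics.
    rocprof --hip-trace --stats -o "$WORK_DIR/trace.csv" "$APP_CMD"

    echo "results written under: $WORK_DIR"
else
    echo "rocprof or $APP_CMD not found; skipping the profiling runs"
fi
```

Keeping the outputs in a dedicated directory rather than the default `/tmp` is a design choice of this sketch. It keeps the CSV and trace files of the two runs together in one place.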