rocprofv3 command-line options

The following table lists the commonly used rocprofv3 command-line options, categorized by purpose.

Purpose Option Description
I/O options -i INPUT | --input INPUT Specifies the path to the input file. JSON and YAML formats support configuration of all command-line options for tracing and profiling whereas the text format supports only the specification of HW counters. See collecting traces using input file and counter collection using input file.
-o OUTPUT_FILE | --output-file OUTPUT_FILE Specifies output file name. If nothing is specified, the default path is %hostname%/%pid%. Read more...
-d OUTPUT_DIRECTORY | --output-directory OUTPUT_DIRECTORY Specifies the output path for saving the output files. If nothing is specified, the default path is %hostname%/%pid%. Read more...
-f {csv,json,pftrace,otf2,rocpd} [{csv,json,pftrace,otf2,rocpd} ...] | --output-format {csv,json,pftrace,otf2,rocpd} [{csv,json,pftrace,otf2,rocpd} ...] Specifies output format. Supported formats: CSV, JSON, PFTrace, OTF2 and rocpd. Read more...
--output-config [BOOL] Generates a configuration output file containing the resolved rocprofv3 settings and options used for the profiling session. Read more...
--log-level {fatal,error,warning,info,trace,env} Sets the desired log level.
-E EXTRA_COUNTERS | --extra-counters EXTRA_COUNTERS Specifies the path to a YAML file consisting of extra counter definitions. Read more...
Dynamic process attachment -p PID | --pid PID | --attach PID Attaches to a running process by process ID and profiles it dynamically. This enables profiling of applications that are already running without needing to restart them from the profiler. The profiler will instrument the target process and collect the specified tracing or counter data for the configured duration. Read more...
Aggregate tracing -r [BOOL] | --runtime-trace [BOOL] Collects tracing data for HIP runtime API, marker (ROCTx) API, RCCL API, memory operations (copies, scratch, and allocations), and kernel dispatches. Similar to --sys-trace, but without tracing of the HIP compiler API and the underlying HSA API. Read more...
-s [BOOL] | --sys-trace [BOOL] Collects tracing data for HIP API, HSA API, marker (ROCTx) API, RCCL API, memory operations (copies, scratch, and allocations), and kernel dispatches. Read more...
PC sampling --pc-sampling-beta-enabled [BOOL] Enables PC sampling and sets the ROCPROFILER_PC_SAMPLING_BETA_ENABLED environment variable. Note that PC sampling support is currently in beta. For more details, see PC sampling.
--pc-sampling-unit {instructions,cycles,time} Specifies the unit for the PC sampling type or method. Note that only units of time are supported. For more details, see PC sampling.
--pc-sampling-method {stochastic,host_trap} Specifies the PC sampling type. Note that only the host trap method is supported. For more details, see PC sampling.
--pc-sampling-interval PC_SAMPLING_INTERVAL Specifies the PC sample generation frequency. For more details, see PC sampling.
Basic tracing --hip-trace [BOOL] Combination of --hip-runtime-trace and --hip-compiler-trace. This option enables only HIP API tracing. Unlike previous iterations of rocprof, this option doesn’t enable kernel tracing, memory copy tracing, and so on. Read more...
--marker-trace [BOOL] Collects marker (ROCTx) traces. Similar to the --roctx-trace option in earlier rocprof versions, but with an improved ROCTx library that offers more features. Read more...
--kernel-trace [BOOL] Collects kernel dispatch traces. Read more...
--memory-copy-trace [BOOL] Collects memory copy traces. This was a part of the HIP and HSA traces in previous rocprof versions. Read more...
--memory-allocation-trace [BOOL] Collects memory allocation traces. Displays starting address, allocation size, and the agent where allocation occurs. Read more...
--kfd-trace Collects --kfd-page-migration-trace, --kfd-page-mapping-trace, --kfd-queue-trace, and --kfd-dropped-events-trace. KFD (Kernel Fusion Driver) traces capture low-level driver routines including mapping, unmapping, and migration of data between GPU and system memories, as well as eviction or restoration of GPU queues to facilitate such routines.
--scratch-memory-trace [BOOL] Collects scratch memory operation traces. Helps determine scratch allocations and manage them efficiently. Read more...
--hsa-trace [BOOL] Collects --hsa-core-trace, --hsa-amd-trace, --hsa-image-trace, and --hsa-finalizer-trace. This option only enables the HSA API tracing. Unlike previous iterations of rocprof, this doesn’t enable kernel tracing, memory copy tracing, and so on. Read more...
--rccl-trace [BOOL] Collects traces for RCCL (ROCm Communication Collectives Library), pronounced ‘Rickle’. Read more...
--kokkos-trace [BOOL] Enables built-in Kokkos tools support, which implies enabling --marker-trace collection and --kernel-rename. Read more...
--rocdecode-trace [BOOL] Collects traces for rocDecode APIs. Read more...
Granular tracing --hip-runtime-trace [BOOL] Collects HIP Runtime API traces. For example, public HIP API functions starting with hip such as hipSetDevice.
--hip-compiler-trace [BOOL] Collects HIP compiler generated code traces. For example, HIP API functions starting with __hip such as __hipRegisterFatBinary.
--hsa-core-trace [BOOL] Collects HSA API traces (core API). For example, HSA functions prefixed with only hsa_ such as hsa_init. For more details, see HSA trace.
--hsa-amd-trace [BOOL] Collects HSA API traces (AMD-extension API). For example, HSA functions prefixed with hsa_amd_ such as hsa_amd_coherency_get_type. For more details, see HSA trace.
--hsa-image-trace [BOOL] Collects HSA API traces (image-extension API). For example, HSA functions prefixed with hsa_ext_image_ such as hsa_ext_image_get_capability. For more details, see HSA trace.
--hsa-finalizer-trace [BOOL] Collects HSA API traces (finalizer-extension API). For example, HSA functions prefixed with hsa_ext_program_ such as hsa_ext_program_create. For more details, see HSA trace.
--kfd-page-migration-trace Collects traces of KFD events including migration of pages across device memories.
--kfd-page-mapping-trace Collects traces of KFD events including faulting, mapping, and page validation.
--kfd-queue-trace Collects traces of KFD events including GPU queue eviction and restoration operations.
--kfd-dropped-events-trace Collects traces of KFD events dropped by the KFD device driver.
Counter collection --pmc [PMC …] Specifies performance monitoring counters to be collected. Use comma or space to specify more than one counter. For multi-pass collection, use multiple --pmc flags where each flag defines a separate counter group. The job fails if a counter group can't be collected in a single pass. Read more...
Post-processing tracing --stats [BOOL] Collects statistics of enabled tracing types. Must be combined with one or more tracing options. Unlike previous rocprof versions, doesn’t include kernel stats by default. Read more...
-S [BOOL] | --summary [BOOL] Displays a single summary of tracing data for the enabled tracing type, after conclusion of the profiling session. Read more...
-D [BOOL] | --summary-per-domain [BOOL] Displays a summary of each tracing domain for the enabled tracing type, after conclusion of the profiling session. Read more...
--summary-groups REGULAR_EXPRESSION [REGULAR_EXPRESSION …] Displays a summary for each set of domains matching the specified regular expression. For example, ‘KERNEL_DISPATCH|MEMORY_COPY’ generates a summary of all the tracing data in the KERNEL_DISPATCH and MEMORY_COPY domains. Similarly, ‘.*_API’ generates a summary of all the tracing data in the HIP_API, HSA_API, and MARKER_API domains. Read more...
Summary --summary-output-file SUMMARY_OUTPUT_FILE Outputs summary to a file, stdout, or stderr. By default, outputs to stderr. Read more...
-u {sec,msec,usec,nsec} | --summary-units {sec,msec,usec,nsec} Specifies timing unit for output summary.
Kernel naming -M [BOOL] | --mangled-kernels [BOOL] Overrides the default demangling of kernel names. Read more...
-T [BOOL] | --truncate-kernels [BOOL] Truncates the demangled kernel names for improved readability. In earlier rocprof versions, this was known as --basenames [on/off]. Read more...
--kernel-rename [BOOL] Uses region names defined using roctxRangePush or roctxRangePop to rename the kernels. Was known as --roctx-rename in earlier rocprof versions. Read more...
Filtering --kernel-include-regex REGULAR_EXPRESSION Filters counter-collection and thread-trace data to include the kernels matching the specified regular expression. Non-matching kernels are excluded. For more details, see kernel filtering.
--kernel-exclude-regex REGULAR_EXPRESSION Filters counter-collection and thread-trace data to exclude the kernels matching the specified regular expression. It is applied after the --kernel-include-regex option. For more details, see kernel filtering.
--kernel-iteration-range KERNEL_ITERATION_RANGE [KERNEL_ITERATION_RANGE …] Specifies the iteration range, in the form [start-stop], for each kernel matching the filter. For more details, see kernel filtering.
-P (START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) [(START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) …] | --collection-period (START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) [(START_DELAY_TIME):(COLLECTION_TIME):(REPEAT) …] Specifies one or more collection periods as triplets. START_DELAY_TIME: time before the data collection begins. COLLECTION_TIME: duration of data collection. REPEAT: number of times the data collection cycle is repeated; specify 0 to repeat the cycle indefinitely. The default unit for time is seconds, which can be changed using the --collection-period-unit option. You can specify multiple configurations, each defined by a triplet in the format start_delay_time:collection_time:repeat. For example, -P 10:10:1 5:3:0 specifies two configurations: the first with a start delay of 10 seconds, a collection time of 10 seconds, and a repeat of 1 (the cycle repeats once), and the second with a start delay of 5 seconds, a collection time of 3 seconds, and a repeat of 0 (the cycle repeats indefinitely). Read more...
--collection-period-unit {hour,min,sec,msec,usec,nsec} Specifies the unit of time used by --collection-period (-P). The available units are hour for hours, min for minutes, sec for seconds, msec for milliseconds, usec for microseconds, and nsec for nanoseconds. Read more...
Perfetto-specific --perfetto-backend {inprocess,system} Specifies the backend for Perfetto data collection. When selecting ‘system’ mode, make sure to run the Perfetto traced daemon and then start a Perfetto session. Read more...
--perfetto-buffer-size KB Specifies buffer size for Perfetto output in KB. Default: 1 GB. Read more...
--perfetto-buffer-fill-policy {discard,ring_buffer} Specifies policy for handling new records when Perfetto reaches the buffer limit. Read more...
--perfetto-shmem-size-hint KB Specifies Perfetto shared memory size hint in KB. Default: 64 KB. Read more...
Display -L [BOOL] | --list-avail [BOOL] Lists the PC sampling configurations and metrics available in the counter_defs.yaml file for counter collection. In earlier rocprof versions, this was known as --list-basic, --list-derived, and --list-counters. Read more...
--group-by-queue [BOOL] Displays the HSA queues that kernels and memory copy operations are submitted to in the Perfetto output, rather than the default grouping by HIP streams. Read more...
Others --preload PRELOAD Specifies libraries to prepend to LD_PRELOAD. Useful for sanitizer libraries and custom instrumentation tools. Multiple libraries can be specified. Read more...
--minimum-output-data KB Specifies the minimum output data size threshold in KB. Output files are generated only if the collected profiling data exceeds this threshold. This prevents creation of empty or very small output files. Default is 0 (no threshold). Read more...
--disable-signal-handlers [BOOL] Controls signal handler prioritization. When set to true, disables rocprofv3 signal handler prioritization, allowing application signal handlers to take precedence. Useful for applications with custom crash handling or when integrating with testing frameworks. Default value is false (rocprofv3 handlers have priority). Read more...
--rocm-root PATH Specifies custom ROCm installation directory instead of automatic detection. Useful for multiple ROCm installations, custom builds, or non-standard locations. Read more...
--sdk-soversion SDK_SOVERSION Specifies the shared object version number for ROCProfiler SDK library resolution. Controls which major version of librocprofiler-sdk.so.X to use. Read more...
--sdk-version SDK_VERSION Specifies the exact version number for ROCProfiler SDK library resolution. Controls library selection with full semantic versioning (X.Y.Z format). Read more...
--selected-regions If set, rocprofv3 profiles only regions of code surrounded by roctxMark(name) and roctxMark(0). Read more...
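
The options in the table can be combined on a single command line. The following Python sketch assembles two such invocations as argument lists: one for aggregate tracing with CSV output, and one for multi-pass counter collection with a separate --pmc flag per counter group. The helper build_rocprofv3_command, the application ./my_app, the directory name out, and the counter names are illustrative assumptions, not part of rocprofv3. Placing the application after -- follows the usual rocprofv3 invocation pattern; use --list-avail to check which counters your GPU actually exposes.

```python
import shlex


def build_rocprofv3_command(options, application):
    """Return an argv list: rocprofv3, its options, '--', then the application."""
    return ["rocprofv3", *options, "--", *application]


# Aggregate tracing with CSV output written to a chosen directory.
trace_cmd = build_rocprofv3_command(
    ["--sys-trace", "--output-format", "csv", "--output-directory", "out"],
    ["./my_app"],  # hypothetical application
)

# Multi-pass counter collection: each --pmc flag defines one counter group.
counter_groups = [["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"]]  # example names
pmc_options = []
for group in counter_groups:
    pmc_options += ["--pmc", *group]
pmc_cmd = build_rocprofv3_command(pmc_options, ["./my_app"])

# Print the commands as they would be typed in a shell.
print(shlex.join(trace_cmd))
print(shlex.join(pmc_cmd))
```

On a system where rocprofv3 is installed, either list could be passed to subprocess.run to launch the profiling session.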
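
The default output location %hostname%/%pid% is built from placeholders that stand for the host name and the process ID. A minimal sketch of how such a template could expand, assuming plain string substitution: the function expand_output_placeholders and the sample values are hypothetical, and rocprofv3 may resolve placeholders differently.

```python
import os
import socket


def expand_output_placeholders(template, hostname=None, pid=None):
    """Substitute %hostname% and %pid% in an output path template."""
    # Fall back to the current host and process when no values are given.
    hostname = socket.gethostname() if hostname is None else hostname
    pid = os.getpid() if pid is None else pid
    return template.replace("%hostname%", hostname).replace("%pid%", str(pid))


# Explicit sample values keep the result reproducible.
print(expand_output_placeholders("%hostname%/%pid%", hostname="node01", pid=4242))
```

With one directory per host and process, output files from multi-process or multi-node runs do not overwrite each other.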
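
To see how a --summary-groups expression selects domains, the following sketch matches the two example patterns against the domain names mentioned in the table. It uses Python's re module with full-string matching, which is an assumption about the matching semantics and not a statement about how rocprofv3 evaluates the expression. Python rejects a pattern that begins with a bare *, so the API example is written as .*_API here.

```python
import re

# Domain names mentioned in the --summary-groups description.
DOMAINS = ["HIP_API", "HSA_API", "MARKER_API", "KERNEL_DISPATCH", "MEMORY_COPY"]


def domains_in_group(pattern, domains=DOMAINS):
    """Return the domains whose full name matches the regular expression."""
    regex = re.compile(pattern)
    return [name for name in domains if regex.fullmatch(name)]


print(domains_in_group("KERNEL_DISPATCH|MEMORY_COPY"))
print(domains_in_group(".*_API"))
```

Each expression passed to --summary-groups produces its own summary, so the two patterns above would yield two separate summaries.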
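
The table states that --kernel-exclude-regex is applied after --kernel-include-regex. The sketch below shows that ordering on a list of made-up kernel names. The function select_kernels and the use of substring search (re.search) are assumptions for illustration; the regular expression dialect and matching mode used by rocprofv3 may differ.

```python
import re


def select_kernels(kernel_names, include_regex=None, exclude_regex=None):
    """Keep names matching include_regex, then drop names matching exclude_regex."""
    selected = list(kernel_names)
    if include_regex is not None:
        include = re.compile(include_regex)
        selected = [name for name in selected if include.search(name)]
    if exclude_regex is not None:
        # Exclusion runs on the already-included set.
        exclude = re.compile(exclude_regex)
        selected = [name for name in selected if not exclude.search(name)]
    return selected


kernels = ["matmul_fp16", "matmul_fp32", "transpose", "reduce_sum"]  # hypothetical
print(select_kernels(kernels, include_regex="matmul|reduce", exclude_regex="fp32"))
```

Because exclusion runs second, a kernel that matches both expressions is dropped.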
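
The --collection-period triplets and the effect of --collection-period-unit can be made concrete with a small parser. This is a sketch under stated assumptions, not the profiler's implementation: the names CollectionPeriod and parse_collection_periods are hypothetical, and only the triplet format, the unit names, and the meaning of a repeat value of 0 come from the table.

```python
from dataclasses import dataclass

# Seconds per unit for the --collection-period-unit choices.
UNIT_IN_SECONDS = {
    "hour": 3600.0,
    "min": 60.0,
    "sec": 1.0,
    "msec": 1e-3,
    "usec": 1e-6,
    "nsec": 1e-9,
}


@dataclass
class CollectionPeriod:
    start_delay_s: float  # time before collection begins, in seconds
    collection_s: float   # duration of collection, in seconds
    repeat: int           # number of cycles; 0 means repeat indefinitely

    @property
    def repeats_forever(self):
        return self.repeat == 0


def parse_collection_periods(triplets, unit="sec"):
    """Parse start_delay_time:collection_time:repeat triplets into seconds."""
    scale = UNIT_IN_SECONDS[unit]
    periods = []
    for triplet in triplets:
        delay, duration, repeat = triplet.split(":")
        periods.append(
            CollectionPeriod(float(delay) * scale, float(duration) * scale, int(repeat))
        )
    return periods


# The example from the table: -P 10:10:1 5:3:0
for period in parse_collection_periods(["10:10:1", "5:3:0"]):
    print(period, "indefinite" if period.repeats_forever else "finite")
```

Passing unit="min" to the same call would interpret the first triplet as a 10-minute delay followed by 10 minutes of collection.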

To see the full list of rocprofv3 options, use:

rocprofv3 -h
rocprofv3 --help

To display the version information for rocprofv3, use:

rocprofv3 -v
rocprofv3 --version

The version command reports build and system information, as in the following example output:

$ rocprofv3 -v
             version: 1.0.0
        git_revision: a1b2c3d4e5f6789012345678901234567890abcd
        library_arch: x86_64-linux-gnu
         system_name: Linux
    system_processor: x86_64
      system_version: 6.8.0-57-generic
         compiler_id: GNU
    compiler_version: 11.4.0
        rocm_version: 6.2.0
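
Because the version report consists of key: value lines, it is easy to read from a script, for example to record the ROCm version alongside profiling results. The following sketch parses the sample output shown above; the function parse_version_output is hypothetical, and the exact set of fields may vary between releases.

```python
def parse_version_output(text):
    """Turn 'key: value' lines into a dictionary, skipping other lines."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


# Sample text copied from the example output, including the prompt line.
sample = """\
$ rocprofv3 -v
             version: 1.0.0
        git_revision: a1b2c3d4e5f6789012345678901234567890abcd
        library_arch: x86_64-linux-gnu
         system_name: Linux
    system_processor: x86_64
      system_version: 6.8.0-57-generic
         compiler_id: GNU
    compiler_version: 11.4.0
        rocm_version: 6.2.0
"""

info = parse_version_output(sample)
print(info["version"], info["rocm_version"])
```

To parse live output instead of the sample, the text could be captured with subprocess.run(["rocprofv3", "--version"], capture_output=True, text=True), assuming the report is written to standard output.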