ROCprofiler-SDK quick reference guide#
This quick reference guide gives an overview of the most commonly used rocprofv3 commands, with links to the documentation for each feature. For comprehensive documentation on a feature, follow the respective link.
Getting started#
Export the ROCm binary path:
source /opt/rocm/share/rocprofiler-sdk/setup-env.sh
Check rocprofv3 version and help:
rocprofv3 --version
rocprofv3 --help
Essential commands#
Querying system capabilities#
List available counters and capabilities:
# List all available features
rocprofv3 --list-avail
# Use the dedicated tool for detailed queries
rocprofv3-avail list
rocprofv3-avail info
Documentation: Using rocprofv3-avail
Basic tracing#
Application tracing (HIP API + kernel dispatches + memory operations):
# Runtime tracing (recommended for most use cases)
rocprofv3 --runtime-trace -- ./your_app
# System-level tracing (includes HSA API)
rocprofv3 --sys-trace -- ./your_app
Documentation: Application tracing
Granular tracing options#
# HIP API, kernel dispatches, and memory operations tracing
rocprofv3 --hip-trace --kernel-trace --memory-copy-trace -- ./your_app
Documentation: Application tracing
Performance counter collection#
# List available counters
rocprofv3-avail list --pmc
# Check if counters can be collected together
rocprofv3-avail pmc-check SQ_WAVES SQ_INSTS_VALU
# Collect specific counters
rocprofv3 --pmc SQ_WAVES,SQ_INSTS_VALU -- ./your_app
Documentation: Kernel counter collection
Advanced profiling features#
PC sampling (beta)#
# Check PC sampling support
rocprofv3-avail list --pc-sampling
# Enable PC sampling
rocprofv3 --pc-sampling-beta-enabled --pc-sampling-interval 1000 -- ./your_app
Documentation: Using PC sampling
Thread trace#
# Collect thread trace data
rocprofv3 --att --output-format csv -- ./your_app
Documentation: Using thread trace
Process attachment#
# Attach to a running process by PID
rocprofv3 --pid 12345 --runtime-trace -d ./results
# or
# Attach for a specific duration (10 seconds)
rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec 10000
Documentation: Dynamic process attachment using rocprofv3
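The attach duration is given in milliseconds, as the flag name indicates, so a duration in seconds has to be multiplied by 1000. A minimal sketch of that conversion, assuming a hypothetical helper named secs_to_msec (it is not part of rocprofv3) and the placeholder PID used above:

```shell
# Hypothetical helper (not part of rocprofv3): convert whole seconds to
# the milliseconds expected by --attach-duration-msec.
secs_to_msec() {
    echo $(( $1 * 1000 ))
}

# Print the attach command for a 10-second window; remove "echo" to run it.
echo rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec "$(secs_to_msec 10)"
```

Printing the command first makes it easy to confirm the computed value before attaching to a live process.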
Output formats and post-processing#
rocprofv3 supports multiple output formats for different analytical requirements. The default format is rocpd, which stores data in a structured SQLite3 database.
Working with rocpd database format#
# Generate rocpd database (default format)
rocprofv3 --runtime-trace -- ./your_app
# Creates: hostname/pid_results.db
# Query the database directly with SQL
sqlite3 hostname/12345_results.db "SELECT * FROM regions;"
# Convert rocpd database to other formats
rocpd convert -i *.db -f csv pftrace otf2 --start 20% --end 80%
Documentation: Using rocpd output format
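The database path contains the hostname and the process ID, so it changes from run to run. A minimal sketch for locating the generated files, assuming the hostname/pid_results.db layout shown above and a hypothetical helper named list_rocpd_dbs:

```shell
# Hypothetical helper: print every rocpd database under a directory,
# assuming the <hostname>/<pid>_results.db layout shown above.
# Defaults to the current directory when no argument is given.
list_rocpd_dbs() {
    find "${1:-.}" -type f -name '*_results.db' | sort
}

# Usage, from the directory where rocprofv3 was run:
list_rocpd_dbs .
```

Each printed path can then be passed to sqlite3 or to rocpd convert -i.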
Collection in various formats#
# Multiple output formats in one run
rocprofv3 --runtime-trace --output-format csv json pftrace otf2 -- ./your_app
Documentation: Using rocpd output format
Summary and statistics#
# Overall summary statistics per domain grouped by kernel and memory operations
rocprofv3 --runtime-trace --summary-per-domain --summary-groups "KERNEL_DISPATCH|MEMORY_COPY" -- ./your_app
Documentation: Application tracing and profiling using rocprofv3 (Post-processing tracing section)
Filtering and selection#
Kernel filtering#
# Trace iterations 10-20 of kernels matching "matmul.*", excluding kernels matching ".*copy.*"
rocprofv3 --kernel-trace --kernel-iteration-range 10-20 --kernel-include-regex "matmul.*" --kernel-exclude-regex ".*copy.*" -- ./your_app
Documentation: Kernel filtering
Time-based collection#
# Collect for specific time periods (start_delay:collection_time:repeat)
rocprofv3 --runtime-trace --collection-period 500:2000:0 --collection-period-unit msec -- ./your_app
Documentation: Collection period
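The period argument is three colon-separated fields in the order start_delay:collection_time:repeat. A minimal sketch that assembles it from separate values, assuming a hypothetical helper named collection_period (not part of rocprofv3):

```shell
# Hypothetical helper: build the start_delay:collection_time:repeat
# argument for --collection-period from three values.
collection_period() {
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Same period as the command above: start delay 500, collection time 2000,
# repeat 0, all in msec. Remove "echo" to run the command.
echo rocprofv3 --runtime-trace --collection-period "$(collection_period 500 2000 0)" --collection-period-unit msec -- ./your_app
```

Keeping the three values separate helps when a script sweeps over different delays or collection times.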
Kernel naming and display#
# Keep mangled kernel names
rocprofv3 --kernel-trace --mangled-kernels -- ./your_app
# Truncate kernel names for readability
rocprofv3 --kernel-trace --truncate-kernels -- ./your_app
# Use ROCTx regions to rename kernels
rocprofv3 --kernel-trace --kernel-rename -- ./your_app
Documentation: Kernel naming and filtering using rocprofv3
Code annotation with ROCTx#
# Trace ROCTx markers and ranges
rocprofv3 --marker-trace -- ./your_app
Documentation: Using ROCTx
Parallel and distributed applications#
MPI applications#
# Profile MPI applications
mpirun -n 4 rocprofv3 --runtime-trace --output-format csv -- ./your_mpi_app
Documentation: Using rocprofv3 with MPI
OpenMP applications#
# Profile OpenMP applications
rocprofv3 --runtime-trace --output-format csv -- ./your_openmp_app
Documentation: Using rocprofv3 with OpenMP
Output management#
File organization#
# Specify output directory and output file name
rocprofv3 --runtime-trace --output-directory ./results --output-file my_trace -- ./your_app
# Write the rocprofv3 configuration used for the run to an output file
rocprofv3 --runtime-trace --output-config -- ./your_app
Documentation: rocprofv3 I/O control options
Common use cases#
Basic performance analysis#
Use case: To get a high-level view of application performance:
# Quick performance overview
rocprofv3 --runtime-trace --summary -- ./your_app
Detailed kernel analysis#
Use case: To analyze specific kernel performance bottlenecks:
# Detailed kernel profiling with counters
rocprofv3 --kernel-trace --pmc SQ_WAVES,SQ_INSTS_VALU,TCP_PERF_SEL_TOTAL_CACHE_ACCESSES -- ./your_app
Memory transfer analysis#
Use case: To optimize data movement between CPU and GPU:
# Focus on memory operations
rocprofv3 --memory-copy-trace --memory-allocation-trace -- ./your_app
Timeline visualization#
Use case: To visualize execution timeline in Perfetto or similar tools:
# Generate timeline for visualization tools
rocprofv3 --runtime-trace -- ./your_app
# Convert to Perfetto format
rocpd2pftrace -i hostname/pid_results.db -o perfetto_trace
Installation and setup#
Installation documentation: Build ROCprofiler-SDK from source
API reference: Tool library
Samples and examples: Samples
Quick troubleshooting tips#
Permission issues: Ensure proper access to GPU devices and /dev/kfd.
Counter collection failure: Use rocprofv3-avail pmc-check to verify counter compatibility.
Large output files: Use --minimum-output-data to set file size thresholds.
Signal handling: Use --disable-signal-handlers in case of conflicts with application handlers.
ROCm path issues: Use --rocm-root to specify custom ROCm installation paths.
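The permission tip can be checked from the shell before profiling. A minimal sketch, assuming a hypothetical helper named can_access_device; it tests only /dev/kfd, and other GPU device nodes on the system may need the same check:

```shell
# Hypothetical helper: succeed only if the given device node exists and
# is readable and writable by the current user.
can_access_device() {
    [ -e "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}

if can_access_device /dev/kfd; then
    echo "/dev/kfd: accessible"
else
    echo "/dev/kfd: missing or not accessible to $(id -un)"
fi
```

If the check fails, the usual next step is to review the permissions and group ownership of the device node; the exact fix depends on the system configuration.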