ROCprofiler-SDK Quick Reference Guide#
This quick reference guide provides an overview of the most commonly used rocprofv3 commands and links to detailed documentation sections.
Getting Started#
Export the ROCm binary path:
source /opt/rocm/share/rocprofiler-sdk/setup-env.sh
Check rocprofv3 version and help:
rocprofv3 --version
rocprofv3 --help
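If a command in this guide is reported as not found, it can help to confirm which tools are visible in the current shell after sourcing the environment script. The following is a minimal POSIX shell sketch; `have_tool` is a hypothetical helper name and is not part of ROCprofiler-SDK.

```shell
# have_tool: hypothetical helper, succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the tools used in this guide are visible in this shell.
for tool in rocprofv3 rocprofv3-avail rocpd; do
    if have_tool "$tool"; then
        echo "found: $tool"
    else
        echo "missing: $tool (source setup-env.sh or check the ROCm install)"
    fi
done
```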
Essential Commands#
Querying System Capabilities#
List available counters and capabilities:
# List all available features
rocprofv3 --list-avail
# Using the dedicated tool for detailed queries
rocprofv3-avail list
rocprofv3-avail info
Documentation: Using rocprofv3-avail
Basic Tracing#
Application tracing (HIP API + kernel dispatches + memory operations):
# Runtime tracing (recommended for most use cases)
rocprofv3 --runtime-trace -- ./your_app
# System-level tracing (includes HSA API)
rocprofv3 --sys-trace -- ./your_app
Documentation: Using rocprofv3
Granular Tracing Options#
# Trace HIP API calls, kernel dispatches, and memory copies
rocprofv3 --hip-trace --kernel-trace --memory-copy-trace -- ./your_app
Documentation: Using rocprofv3 (Basic tracing section)
Performance Counter Collection#
# List available counters
rocprofv3-avail list --pmc
# Check if counters can be collected together
rocprofv3-avail pmc-check SQ_WAVES SQ_INSTS_VALU
# Collect specific counters
rocprofv3 --pmc SQ_WAVES,SQ_INSTS_VALU -- ./your_app
Documentation: Using rocprofv3 (Counter collection section)
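The check and the collection step can be chained so that collection only runs when the counters are compatible. This is a sketch under one labeled assumption: that `rocprofv3-avail pmc-check` returns a non-zero exit status when the counters cannot be collected together, which is worth verifying on your installation. `to_pmc_arg` is a hypothetical helper.

```shell
# Counters to collect, space-separated as pmc-check takes them in this guide.
COUNTERS="SQ_WAVES SQ_INSTS_VALU"

# to_pmc_arg: hypothetical helper, turns "A B C" into "A,B,C" for --pmc.
to_pmc_arg() {
    echo "$*" | tr -s ' ' ','
}

# ASSUMPTION: pmc-check exits non-zero when the counters cannot be combined.
# $COUNTERS is left unquoted on purpose so each counter is a separate argument.
if command -v rocprofv3-avail >/dev/null 2>&1; then
    if rocprofv3-avail pmc-check $COUNTERS; then
        rocprofv3 --pmc "$(to_pmc_arg $COUNTERS)" -- ./your_app
    else
        echo "counters cannot be collected together: $COUNTERS" >&2
    fi
else
    echo "rocprofv3-avail not found; would collect: $(to_pmc_arg $COUNTERS)"
fi
```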
Advanced Profiling Features#
PC Sampling (Beta)#
# Check PC sampling support
rocprofv3-avail list --pc-sampling
# Enable PC sampling
rocprofv3 --pc-sampling-beta-enabled --pc-sampling-interval 1000 -- ./your_app
Documentation: Using PC sampling
Thread Trace#
# Collect thread trace data
rocprofv3 --att --output-format csv -- ./your_app
Documentation: Using thread trace
Process Attachment#
# Attach to a running process by PID
rocprofv3 --pid 12345 --runtime-trace -d ./results
# Or attach for a fixed duration (10 seconds = 10000 msec)
rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec 10000
Documentation: Using rocprofv3 (Process attachment section)
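Because `--attach-duration-msec` takes milliseconds, it is easy to be off by a factor of ten or a thousand when thinking in seconds. A small sketch that does the conversion and prints the resulting command without running it; `attach_cmd` is a hypothetical helper, not a rocprofv3 feature.

```shell
# attach_cmd: hypothetical helper that prints (does not run) an attach command.
# $1 = PID of the running process, $2 = duration in whole seconds.
attach_cmd() {
    pid=$1
    secs=$2
    msec=$((secs * 1000))   # the flag expects milliseconds
    echo "rocprofv3 --pid $pid --runtime-trace --attach-duration-msec $msec"
}

attach_cmd 12345 10
# prints: rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec 10000
```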
Output Formats and Post-processing#
rocprofv3 supports multiple output formats for different analysis needs. The default format is rocpd, which stores data in a structured SQLite3 database.
Working with rocpd Database Format#
# Generate rocpd database (default format)
rocprofv3 --runtime-trace -- ./your_app
# Creates: hostname/pid_results.db
# Query the database directly with SQL
sqlite3 hostname/12345_results.db "SELECT * FROM regions;"
# Convert rocpd database to other formats
rocpd convert -i *.db -f csv pftrace otf2 --start 20% --end 80%
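Table and view names in a rocpd database may differ between releases, so listing them first is a reasonable step before writing a query such as the `regions` example. The sketch below uses the `sqlite3` command-line shell's `.tables` command; `list_tables` is a hypothetical helper, and the `*_results.db` pattern is taken from the file naming shown in this section.

```shell
# list_tables: hypothetical helper; prints the tables and views in a database.
list_tables() {
    db=$1
    if [ ! -f "$db" ]; then
        echo "no such database: $db" >&2
        return 1
    fi
    sqlite3 "$db" ".tables"
}

# Inspect every results database under the current directory, if any exist.
find . -name '*_results.db' | while read -r db; do
    echo "== $db"
    list_tables "$db"
done
```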
Collecting and Converting to Other Formats#
# Multiple output formats in one run
rocprofv3 --runtime-trace --output-format csv json pftrace otf2 -- ./your_app
Documentation: Using rocpd Output Format
Summary and Statistics#
# Per-domain summary statistics, plus a combined summary for the KERNEL_DISPATCH and MEMORY_COPY domains
rocprofv3 --runtime-trace --summary-per-domain --summary-groups "KERNEL_DISPATCH|MEMORY_COPY" -- ./your_app
Documentation: Using rocprofv3 (Post-processing tracing section)
Filtering and Selection#
Kernel Filtering#
# Trace iterations 10-20 of kernels matching "matmul.*", excluding kernels matching ".*copy.*"
rocprofv3 --kernel-trace --kernel-iteration-range 10-20 --kernel-include-regex "matmul.*" --kernel-exclude-regex ".*copy.*" -- ./your_app
Documentation: Using rocprofv3 (Filtering section)
Time-based Collection#
# Collect for specific time periods (start_delay:collection_time:repeat)
rocprofv3 --runtime-trace --collection-period 500:2000:0 --collection-period-unit msec -- ./your_app
Documentation: Using rocprofv3 (Filtering section)
Kernel Naming and Display#
# Keep mangled kernel names
rocprofv3 --kernel-trace --mangled-kernels -- ./your_app
# Truncate kernel names for readability
rocprofv3 --kernel-trace --truncate-kernels -- ./your_app
# Use ROCTx regions to rename kernels
rocprofv3 --kernel-trace --kernel-rename -- ./your_app
Documentation: Using rocprofv3 (Kernel naming section)
Code Annotation with ROCTx#
# Trace ROCTx markers and ranges
rocprofv3 --marker-trace -- ./your_app
Documentation: Using ROCTx
Parallel and Distributed Applications#
MPI Applications#
# Profile MPI applications
mpirun -n 4 rocprofv3 --runtime-trace --output-format csv -- ./your_mpi_app
Documentation: Using rocprofv3 with MPI
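rocprofv3 already separates output by host and process ID (the `hostname/pid_results.db` layout shown earlier), so nothing more is required for MPI runs. If one directory per rank is easier to navigate, a wrapper script can derive the directory from the launcher's rank variable. This is a sketch: `rank_dir` and the `./results/rank_N` layout are hypothetical, and it relies on `OMPI_COMM_WORLD_RANK` (Open MPI) or `PMI_RANK` (MPICH-style launchers) being set in each rank's environment.

```shell
# rank_dir: hypothetical helper; picks an output directory name per MPI rank.
# Falls back to rank 0 when no known rank variable is set.
rank_dir() {
    rank=${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}
    echo "./results/rank_$rank"
}

# Print the command each rank would run (replace echo with exec in a wrapper).
echo "rocprofv3 --runtime-trace --output-directory $(rank_dir) -- ./your_mpi_app"
```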
OpenMP Applications#
# Profile OpenMP applications
rocprofv3 --runtime-trace --output-format csv -- ./your_openmp_app
Documentation: Using rocprofv3 with OpenMP
Output Management#
File Organization#
# Specify output directory
rocprofv3 --runtime-trace --output-directory ./results --output-file my_trace -- ./your_app
# Generate configuration file
rocprofv3 --runtime-trace --output-config -- ./your_app
Documentation: Using rocprofv3 (I/O options section)
Common Use Cases#
Basic Performance Analysis#
# Quick performance overview
rocprofv3 --runtime-trace --summary -- ./your_app
Use case: Get a high-level view of application performance
Detailed Kernel Analysis#
# Detailed kernel profiling with counters
rocprofv3 --kernel-trace --pmc SQ_WAVES,SQ_INSTS_VALU,TCP_PERF_SEL_TOTAL_CACHE_ACCESSES -- ./your_app
Use case: Analyze specific kernel performance bottlenecks
Memory Transfer Analysis#
# Focus on memory operations
rocprofv3 --memory-copy-trace --memory-allocation-trace -- ./your_app
Use case: Optimize data movement between CPU and GPU
Timeline Visualization#
# Generate timeline for visualization tools
rocprofv3 --runtime-trace -- ./your_app
# Convert to Perfetto format
rocpd2pftrace -i hostname/pid_results.db -o perfetto_trace
Use case: Visualize execution timeline in Perfetto or similar tools
Installation and Setup#
Installation Documentation: Installing ROCprofiler-SDK
API Reference: Tool library
Samples and Examples: Samples
Troubleshooting Quick Tips#
Permission Issues: Ensure proper access to GPU devices and /dev/kfd
Counter Collection Fails: Use rocprofv3-avail pmc-check to verify counter compatibility
Large Output Files: Use --minimum-output-data to set file size thresholds
Signal Handling: Use --disable-signal-handlers if conflicts with application handlers
ROCm Path Issues: Use --rocm-root to specify custom ROCm installation paths
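For the permission case, a quick check is whether the current user can read and write the device node. A minimal sketch; `check_access` is a hypothetical helper, and the suggested remedy (membership in groups such as `render` and `video`) is the usual ROCm setup and may differ on your system.

```shell
# check_access: hypothetical helper; reports whether the current user can
# read and write the given device node (or any other path).
check_access() {
    if [ -r "$1" ] && [ -w "$1" ]; then
        echo "ok: $1"
    else
        echo "no read/write access: $1"
        return 1
    fi
}

check_access /dev/kfd || echo "check group membership or device permissions for this user"
```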
For comprehensive documentation on each feature, refer to the detailed sections linked throughout this guide.