ROCprofiler-SDK quick reference guide#

This quick reference guide summarizes the most commonly used rocprofv3 commands and points to the documentation for each feature. For comprehensive documentation on a feature, follow the respective link.

Getting started#

Source the setup script to export the ROCm binary path:

source /opt/rocm/share/rocprofiler-sdk/setup-env.sh

Check rocprofv3 version and help:

rocprofv3 --version
rocprofv3 --help
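
In a script, it can be useful to confirm that rocprofv3 is reachable before calling it. The following is a minimal sketch in POSIX shell; `require_tool` is a hypothetical helper introduced here for illustration, not something shipped with ROCprofiler-SDK.

```shell
# require_tool is a hypothetical helper: it succeeds only when the
# named command can be found on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "not found on PATH: $1" >&2
        return 1
    fi
}

# The result depends on the machine, so a miss is reported rather than
# treated as fatal.
require_tool rocprofv3 || echo "source the setup script shown above, then retry"
```

The helper relies only on `command -v`, so the same check applies to rocprofv3-avail or any other tool mentioned in this guide.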

Essential commands#

Querying system capabilities#

List available counters and capabilities:

# List all available features
rocprofv3 --list-avail

# Use the dedicated tool for detailed queries
rocprofv3-avail list
rocprofv3-avail info

Documentation: Using rocprofv3-avail

Basic tracing#

Application tracing (HIP API + kernel dispatches + memory operations):

# Runtime tracing (recommended for most use cases)
rocprofv3 --runtime-trace -- ./your_app

# System-level tracing (includes HSA API)
rocprofv3 --sys-trace -- ./your_app

Documentation: Application tracing

Granular tracing options#

# HIP API, kernel dispatches, and memory operations tracing
rocprofv3 --hip-trace --kernel-trace --memory-copy-trace -- ./your_app

Documentation: Application tracing

Performance counter collection#

# List available counters
rocprofv3-avail list --pmc

# Check if counters can be collected together
rocprofv3-avail pmc-check SQ_WAVES SQ_INSTS_VALU

# Collect specific counters
rocprofv3 --pmc SQ_WAVES,SQ_INSTS_VALU -- ./your_app

Documentation: Kernel counter collection
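
When the counter list lives in a script variable, the space-separated form used by pmc-check and the comma-separated form used by --pmc can be derived from one place. A minimal sketch in POSIX shell follows; `join_counters` is a hypothetical helper, and the commands are printed rather than executed so that the sketch does not require a ROCm installation.

```shell
# join_counters is a hypothetical helper (not part of rocprofv3).
# It joins its arguments with commas, the form passed to --pmc above.
# The body runs in a subshell so the IFS change does not leak out.
join_counters() (
    IFS=,
    echo "$*"
)

counters="SQ_WAVES SQ_INSTS_VALU"

# Print the commands instead of running them.
echo "rocprofv3-avail pmc-check $counters"
# $counters is intentionally unquoted so it splits into separate arguments.
echo "rocprofv3 --pmc $(join_counters $counters) -- ./your_app"
```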

Advanced profiling features#

PC sampling (beta)#

# Check PC sampling support
rocprofv3-avail list --pc-sampling

# Enable PC sampling
rocprofv3 --pc-sampling-beta-enabled --pc-sampling-interval 1000 -- ./your_app

Documentation: Using PC sampling

Thread trace#

# Collect thread trace data
rocprofv3 --att --output-format csv -- ./your_app

Documentation: Using thread trace

Process attachment#

# Attach to a running process by PID
rocprofv3 --pid 12345 --runtime-trace -d ./results

# Or attach for a specific duration (10 seconds)
rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec 10000

Documentation: Dynamic process attachment using rocprofv3
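
Because --attach-duration-msec is expressed in milliseconds, a duration given in seconds has to be multiplied by 1000 (10 seconds is 10000 ms). A small sketch of that conversion, assuming POSIX shell; `seconds_to_msec` is a hypothetical helper and 12345 is the same placeholder PID used above.

```shell
# seconds_to_msec is a hypothetical helper (not part of rocprofv3).
seconds_to_msec() {
    echo $(( $1 * 1000 ))
}

pid=12345        # placeholder PID
duration_sec=10

# Printed rather than executed, so no running target process is needed.
echo "rocprofv3 --pid $pid --runtime-trace --attach-duration-msec $(seconds_to_msec "$duration_sec")"
```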

Output formats and post-processing#

rocprofv3 supports multiple output formats for different analytical requirements. The default format is rocpd, which stores data in a structured SQLite3 database.

Working with rocpd database format#

# Generate rocpd database (default format)
rocprofv3 --runtime-trace -- ./your_app
# Creates: hostname/pid_results.db

# Query the database directly with SQL
sqlite3 hostname/12345_results.db "SELECT * FROM regions;"

# Convert rocpd databases to other formats, limited to the 20%-80% portion of the trace
rocpd convert -i *.db -f csv pftrace otf2 --start 20% --end 80%

Documentation: Using rocpd output format

Collection in various formats#

# Multiple output formats in one run
rocprofv3 --runtime-trace --output-format csv json pftrace otf2 -- ./your_app

Documentation: Using rocpd output format

Summary and statistics#

# Overall summary statistics per domain grouped by kernel and memory operations
rocprofv3 --runtime-trace --summary-per-domain --summary-groups "KERNEL_DISPATCH|MEMORY_COPY" -- ./your_app

Documentation: Application tracing and profiling using rocprofv3 (Post-processing tracing section)

Filtering and selection#

Kernel filtering#

# Include and exclude kernels by regex, limited to iterations 10-20
rocprofv3 --kernel-trace --kernel-iteration-range 10-20 --kernel-include-regex "matmul.*" --kernel-exclude-regex ".*copy.*" -- ./your_app

Documentation: Kernel filtering
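
Include and exclude patterns can be previewed against a list of kernel names before a profiling run. The sketch below uses grep for that purpose; `filter_kernels` and the sample kernel names are hypothetical, and it assumes that grep -E treats these simple patterns the same way the profiler's regex engine does, which is not guaranteed for more complex expressions.

```shell
# filter_kernels is a hypothetical helper for previewing patterns locally.
# It reads kernel names on stdin, keeps those matching the include
# pattern ($1), then drops those matching the exclude pattern ($2).
# Assumption: POSIX extended regex matching approximates the profiler's
# behavior for simple patterns such as these.
filter_kernels() {
    grep -E "$1" | grep -Ev "$2"
}

# Sample names are made up for illustration.
printf '%s\n' matmul_f32 matmul_copy_tile vector_add |
    filter_kernels 'matmul.*' '.*copy.*'
# prints: matmul_f32
```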

Time-based collection#

# Collect for specific time periods (start_delay:collection_time:repeat)
rocprofv3 --runtime-trace --collection-period 500:2000:0 --collection-period-unit msec -- ./your_app

Documentation: Collection period
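
The start_delay:collection_time:repeat triple is easy to mistype when it is assembled from variables. A minimal sketch of a formatter with basic validation, assuming POSIX shell; `collection_period` is a hypothetical helper, not a rocprofv3 feature.

```shell
# collection_period is a hypothetical helper (not part of rocprofv3).
# It formats start_delay:collection_time:repeat from three values and
# rejects anything that is not a non-negative integer.
collection_period() {
    if [ "$#" -ne 3 ]; then
        echo "usage: collection_period DELAY TIME REPEAT" >&2
        return 1
    fi
    for value in "$@"; do
        case $value in
            ''|*[!0-9]*)
                echo "not a non-negative integer: $value" >&2
                return 1 ;;
        esac
    done
    echo "$1:$2:$3"
}

period=$(collection_period 500 2000 0)
# Printed rather than executed.
echo "rocprofv3 --runtime-trace --collection-period $period --collection-period-unit msec -- ./your_app"
```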

Kernel naming and display#

# Keep mangled kernel names
rocprofv3 --kernel-trace --mangled-kernels -- ./your_app

# Truncate kernel names for readability
rocprofv3 --kernel-trace --truncate-kernels -- ./your_app

# Use ROCTx regions to rename kernels
rocprofv3 --kernel-trace --kernel-rename -- ./your_app

Documentation: Kernel naming and filtering using rocprofv3

Code annotation with ROCTx#

# Trace ROCTx markers and ranges
rocprofv3 --marker-trace -- ./your_app

Documentation: Using ROCTx

Parallel and distributed applications#

MPI applications#

# Profile MPI applications
mpirun -n 4 rocprofv3 --runtime-trace --output-format csv -- ./your_mpi_app

Documentation: Using rocprofv3 with MPI
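
Grouping output by MPI rank can make it easier to match result files to ranks. The sketch below derives a per-rank directory from environment variables that common launchers set (OMPI_COMM_WORLD_RANK for Open MPI, PMI_RANK for MPICH-style launchers, SLURM_PROCID for srun). `rank_output_dir` is a hypothetical helper; the idea is to place it in a wrapper script that mpirun starts once per rank.

```shell
# rank_output_dir is a hypothetical helper (not part of rocprofv3).
# It picks the first rank variable that is set, falling back to 0.
rank_output_dir() {
    rank=${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-0}}}
    echo "$1/rank_$rank"
}

# Printed rather than executed; in a real wrapper the echo would be dropped.
echo "rocprofv3 --runtime-trace --output-format csv --output-directory $(rank_output_dir ./results) -- ./your_mpi_app"
```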

OpenMP applications#

# Profile OpenMP applications
rocprofv3 --runtime-trace --output-format csv -- ./your_openmp_app

Documentation: Using rocprofv3 with OpenMP

Output management#

File organization#

# Specify output directory and output file name
rocprofv3 --runtime-trace --output-directory ./results --output-file my_trace -- ./your_app

# Generate configuration file
rocprofv3 --runtime-trace --output-config -- ./your_app

Documentation: rocprofv3 I/O control options

Common use cases#

Basic performance analysis#

Use case: To get a high-level view of application performance:

# Quick performance overview
rocprofv3 --runtime-trace --summary -- ./your_app

Detailed kernel analysis#

Use case: To analyze specific kernel performance bottlenecks:

# Detailed kernel profiling with counters
rocprofv3 --kernel-trace --pmc SQ_WAVES,SQ_INSTS_VALU,TCP_PERF_SEL_TOTAL_CACHE_ACCESSES -- ./your_app

Memory transfer analysis#

Use case: To optimize data movement between CPU and GPU:

# Focus on memory operations
rocprofv3 --memory-copy-trace --memory-allocation-trace -- ./your_app

Timeline visualization#

Use case: To visualize execution timeline in Perfetto or similar tools:

# Generate timeline for visualization tools
rocprofv3 --runtime-trace -- ./your_app

# Convert to Perfetto format
rocpd2pftrace -i hostname/pid_results.db -o perfetto_trace
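
When a run leaves several databases behind (for example, one per process), the conversion step can be repeated per file. A minimal sketch, assuming the hostname/pid_results.db layout shown above; `list_result_dbs` is a hypothetical helper, and the per-database output name passed to -o is an illustrative choice, not a documented convention.

```shell
# list_result_dbs is a hypothetical helper (not part of rocprofv3).
# It prints every file ending in _results.db under the given directory.
list_result_dbs() {
    find "$1" -type f -name '*_results.db' | sort
}

# Print one conversion command per database instead of running it.
list_result_dbs . | while IFS= read -r db; do
    # Illustrative naming: reuse the PID prefix of the database file.
    echo "rocpd2pftrace -i $db -o perfetto_$(basename "$db" _results.db)"
done
```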

Installation and setup#

Installation documentation: Build ROCprofiler-SDK from source

API reference: Tool library

Samples and examples: Samples

Quick troubleshooting tips#

  • Permission issues: Ensure proper access to GPU devices and /dev/kfd.

  • Counter collection failure: Use rocprofv3-avail pmc-check to verify counter compatibility.

  • Output file clutter: Use --minimum-output-data to set a size threshold below which output files are not generated.

  • Signal handling: Use --disable-signal-handlers in case of conflicts with application handlers.

  • ROCm path issues: Use --rocm-root to specify custom ROCm installation paths.
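
The permission tip above can be checked from a shell before profiling. A minimal sketch follows; `check_rw` is a hypothetical helper that only tests whether the current user can read and write a path, and /dev/kfd is the device named in the tip.

```shell
# check_rw is a hypothetical helper: it reports whether the current
# user can read and write the given path.
check_rw() {
    if [ -r "$1" ] && [ -w "$1" ]; then
        echo "ok: $1"
    else
        echo "no read/write access: $1" >&2
        return 1
    fi
}

# The result depends on the machine, so a failure is reported rather
# than treated as fatal.
check_rw /dev/kfd || echo "check device permissions and group membership"
```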