ROCprofiler-SDK Quick Reference Guide#

This quick reference guide provides an overview of the most commonly used rocprofv3 commands and links to detailed documentation sections.

Getting Started#

Source the setup script to export the ROCm binary path:

source /opt/rocm/share/rocprofiler-sdk/setup-env.sh

Check rocprofv3 version and help:

rocprofv3 --version
rocprofv3 --help
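
Before running the commands in this guide, it can help to confirm that the tools are actually on your path. The following is a minimal sketch, not part of ROCprofiler-SDK: `require_tools` is a hypothetical helper name, and it relies only on the standard shell builtin `command -v`.

```shell
# Return 0 if every named tool is on PATH; report each missing one on stderr.
# "require_tools" is a hypothetical helper, not a ROCprofiler-SDK command.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing tool: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A script could then start with `require_tools rocprofv3 rocprofv3-avail || exit 1` so that a missing environment setup fails early with a clear message.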

Essential Commands#

Querying System Capabilities#

List available counters and capabilities:

# List all available features
rocprofv3 --list-avail

# Using the dedicated tool for detailed queries
rocprofv3-avail list
rocprofv3-avail info

Documentation: Using rocprofv3-avail

Basic Tracing#

Application tracing (HIP API + kernel dispatches + memory operations):

# Runtime tracing (recommended for most use cases)
rocprofv3 --runtime-trace -- ./your_app

# System-level tracing (includes HSA API)
rocprofv3 --sys-trace -- ./your_app

Documentation: Using rocprofv3

Granular Tracing Options#

# HIP API, kernel dispatch, and memory copy tracing
rocprofv3 --hip-trace --kernel-trace --memory-copy-trace -- ./your_app

Documentation: Using rocprofv3 (Basic tracing section)

Performance Counter Collection#

# List available counters
rocprofv3-avail list --pmc

# Check if counters can be collected together
rocprofv3-avail pmc-check SQ_WAVES SQ_INSTS_VALU

# Collect specific counters
rocprofv3 --pmc SQ_WAVES,SQ_INSTS_VALU -- ./your_app

Documentation: Using rocprofv3 (Counter collection section)

Advanced Profiling Features#

PC Sampling (Beta)#

# Check PC sampling support
rocprofv3-avail list --pc-sampling

# Enable PC sampling
rocprofv3 --pc-sampling-beta-enabled --pc-sampling-interval 1000 -- ./your_app

Documentation: Using PC sampling

Thread Trace#

# Collect thread trace data
rocprofv3 --att --output-format csv -- ./your_app

Documentation: Using thread trace

Process Attachment#

# Attach to a running process by PID and write results to ./results
rocprofv3 --pid 12345 --runtime-trace -d ./results

# Or attach for a specific duration (10 seconds = 10000 msec)
rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec 10000

Documentation: Using rocprofv3 (Process attachment section)
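
Because `--attach-duration-msec` takes milliseconds, it is easy to pass a value that is off by a factor of ten or a thousand. The sketch below shows one way to build the attach command from a duration in seconds. `attach_cmd` is a hypothetical helper that only prints the command, and it uses just the flags shown above.

```shell
# Print (do not run) an attach command for PID $1 lasting $2 seconds.
# The flag expects milliseconds, so the seconds value is multiplied by 1000.
# "attach_cmd" is a hypothetical helper, not part of rocprofv3.
attach_cmd() {
    pid=$1
    seconds=$2
    printf 'rocprofv3 --pid %s --runtime-trace --attach-duration-msec %s\n' \
        "$pid" "$(( seconds * 1000 ))"
}
```

Printing the command first makes it easy to review the computed duration before running it.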

Output Formats and Post-processing#

rocprofv3 supports multiple output formats for different analysis needs. The default format is rocpd, which stores data in a structured SQLite3 database.

Working with rocpd Database Format#

# Generate rocpd database (default format)
rocprofv3 --runtime-trace -- ./your_app
# Creates: hostname/pid_results.db

# Query the database directly with SQL
sqlite3 hostname/12345_results.db "SELECT * FROM regions;"

# Convert rocpd databases to other formats, limited to the 20%-80% window of the trace
rocpd convert -i *.db -f csv pftrace otf2 --start 20% --end 80%
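
Since the databases are written under a per-host directory, a glob such as `*.db` in the current directory may not match them. The following sketch collects them recursively. `find_rocpd_dbs` is a hypothetical helper, and the `*_results.db` pattern is an assumption based on the default file name shown above.

```shell
# Print every rocpd database found under directory $1 (default: current directory).
# Assumes the default "<pid>_results.db" naming shown above; adjust the pattern
# if a custom output file name was used. "find_rocpd_dbs" is a hypothetical helper.
find_rocpd_dbs() {
    find "${1:-.}" -type f -name '*_results.db' | sort
}
```

The resulting paths can then be passed to `rocpd convert -i`, assuming none of them contains whitespace.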

Collecting and Converting to Other Formats#

# Multiple output formats in one run
rocprofv3 --runtime-trace --output-format csv json pftrace otf2 -- ./your_app

Documentation: Using rocpd Output Format

Summary and Statistics#

# Summary statistics per domain, with kernel dispatches and memory copies also summarized as one group
rocprofv3 --runtime-trace --summary-per-domain --summary-groups "KERNEL_DISPATCH|MEMORY_COPY" -- ./your_app

Documentation: Using rocprofv3 (Post-processing tracing section)

Filtering and Selection#

Kernel Filtering#

# Trace iterations 10-20 of kernels that match the include regex and do not match the exclude regex
rocprofv3 --kernel-trace --kernel-iteration-range 10-20 --kernel-include-regex "matmul.*" --kernel-exclude-regex ".*copy.*" -- ./your_app

Documentation: Using rocprofv3 (Filtering section)
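
It can be useful to preview which kernel names an include/exclude pair would keep before starting a long profiling run. The sketch below approximates the filter with `grep -E` on a list of kernel names read from standard input. This is an assumption-laden approximation: rocprofv3's regex dialect and its matching rules (full match versus substring search) may differ from `grep`, and `preview_kernel_filter` is a hypothetical helper.

```shell
# Read kernel names on stdin, keep those matching include regex $1,
# then drop those matching exclude regex $2.
# Approximation only: grep -E semantics may differ from rocprofv3's matching.
preview_kernel_filter() {
    grep -E -- "$1" | grep -Ev -- "$2"
}
```

For example, feeding it the kernel names from an earlier `--kernel-trace` run shows roughly which ones the patterns above would select.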

Time-based Collection#

# Collect for specific time periods (start_delay:collection_time:repeat)
rocprofv3 --runtime-trace --collection-period 500:2000:0 --collection-period-unit msec -- ./your_app

Documentation: Using rocprofv3 (Filtering section)
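
The `--collection-period` value is a colon-separated triple, so a malformed or missing field is an easy mistake in scripts. Here is a small sketch that validates the three fields and assembles the string. `collection_period` is a hypothetical helper, and the check (non-negative integers only) is an assumption about what the flag accepts.

```shell
# Build a start_delay:collection_time:repeat string from three arguments.
# Rejects empty or non-numeric fields. "collection_period" is a hypothetical
# helper; the integer-only check is an assumption, not a documented rule.
collection_period() {
    for field in "$1" "$2" "$3"; do
        case $field in
            ''|*[!0-9]*)
                printf 'not a non-negative integer: "%s"\n' "$field" >&2
                return 1
                ;;
        esac
    done
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}
```

It could be used as `rocprofv3 --runtime-trace --collection-period "$(collection_period 500 2000 0)" --collection-period-unit msec -- ./your_app`.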

Kernel Naming and Display#

# Keep mangled kernel names
rocprofv3 --kernel-trace --mangled-kernels -- ./your_app

# Truncate kernel names for readability
rocprofv3 --kernel-trace --truncate-kernels -- ./your_app

# Use ROCTx regions to rename kernels
rocprofv3 --kernel-trace --kernel-rename -- ./your_app

Documentation: Using rocprofv3 (Kernel naming section)

Code Annotation with ROCTx#

# Trace ROCTx markers and ranges
rocprofv3 --marker-trace -- ./your_app

Documentation: Using ROCTx

Parallel and Distributed Applications#

MPI Applications#

# Profile MPI applications
mpirun -n 4 rocprofv3 --runtime-trace --output-format csv -- ./your_mpi_app

Documentation: Using rocprofv3 with MPI
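
If you prefer results grouped by MPI rank, one option is a small wrapper script that derives an output directory from the launcher's environment. The sketch below assumes the rank is exposed through `OMPI_COMM_WORLD_RANK` (Open MPI), `PMI_RANK` (MPICH-style launchers), or `SLURM_PROCID` (Slurm). `rank_output_dir` is a hypothetical helper, and the `rank_N` layout is only an illustration.

```shell
# Print a per-rank output directory under base directory $1 (default: results).
# Falls back to rank 0 when no known launcher variable is set.
# "rank_output_dir" is a hypothetical helper, not part of rocprofv3.
rank_output_dir() {
    base=${1:-results}
    rank=${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-0}}}
    printf '%s/rank_%s\n' "$base" "$rank"
}
```

Inside a wrapper script launched by `mpirun`, the profiler line might then read `rocprofv3 --runtime-trace --output-directory "$(rank_output_dir results)" -- "$@"`.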

OpenMP Applications#

# Profile OpenMP applications
rocprofv3 --runtime-trace --output-format csv -- ./your_openmp_app

Documentation: Using rocprofv3 with OpenMP

Output Management#

File Organization#

# Specify output directory and output file name
rocprofv3 --runtime-trace --output-directory ./results --output-file my_trace -- ./your_app

# Generate configuration file
rocprofv3 --runtime-trace --output-config -- ./your_app

Documentation: Using rocprofv3 (I/O options section)

Common Use Cases#

Basic Performance Analysis#

# Quick performance overview
rocprofv3 --runtime-trace --summary -- ./your_app

Use case: Get a high-level view of application performance

Detailed Kernel Analysis#

# Detailed kernel profiling with counters
rocprofv3 --kernel-trace --pmc SQ_WAVES,SQ_INSTS_VALU,TCP_PERF_SEL_TOTAL_CACHE_ACCESSES -- ./your_app

Use case: Analyze specific kernel performance bottlenecks

Memory Transfer Analysis#

# Focus on memory operations
rocprofv3 --memory-copy-trace --memory-allocation-trace -- ./your_app

Use case: Optimize data movement between CPU and GPU

Timeline Visualization#

# Generate timeline for visualization tools
rocprofv3 --runtime-trace -- ./your_app

# Convert to Perfetto format
rocpd2pftrace -i hostname/pid_results.db -o perfetto_trace

Use case: Visualize execution timeline in Perfetto or similar tools

Installation and Setup#

Installation Documentation: Installing ROCprofiler-SDK

API Reference: Tool library

Samples and Examples: Samples

Troubleshooting Quick Tips#

  1. Permission Issues: Ensure proper access to GPU devices and /dev/kfd

  2. Counter Collection Fails: Use rocprofv3-avail pmc-check to verify counter compatibility

  3. Large Output Files: Use --minimum-output-data to set file size thresholds

  4. Signal Handling: Use --disable-signal-handlers if rocprofv3's signal handlers conflict with the application's own handlers

  5. ROCm Path Issues: Use --rocm-root to specify custom ROCm installation paths
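
For the permission issue in item 1, a quick access check can rule out the most common cause before digging further. The following is a minimal sketch: `check_gpu_device` is a hypothetical helper that only tests read/write access to a device node, and it does not cover every permission-related failure.

```shell
# Return 0 if device node $1 (default: /dev/kfd) is readable and writable
# by the current user; otherwise print a hint on stderr and return 1.
# "check_gpu_device" is a hypothetical helper, not a ROCm tool.
check_gpu_device() {
    dev=${1:-/dev/kfd}
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        return 0
    fi
    printf 'no read/write access to %s (check group membership with: id)\n' "$dev" >&2
    return 1
}
```

If the check fails, comparing the output of `id` with the group that owns the device node (`ls -l /dev/kfd`) is a reasonable next step.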

For comprehensive documentation on each feature, refer to the detailed sections linked throughout this guide.