Using rocpd output format#
To accommodate diverse analysis workflows, rocprofv3 provides comprehensive support for multiple output formats:
rocpd (SQLite3 database) - Default format providing structured data storage.
CSV (Comma-Separated Values) - Tabular format for spreadsheet applications and data analysis tools.
JSON (JavaScript Object Notation) - Structured format optimized for programmatic analysis and integration.
PFTrace (Perfetto protocol buffers) - Binary trace format for high-performance visualization using Perfetto.
OTF2 (Open Trace Format 2) - Standardized trace format for interoperability with third-party analysis tools.
The rocpd output format serves as the primary data repository for rocprofv3 profiling sessions. This format leverages SQLite3’s ACID-compliant database engine to provide robust, structured storage of comprehensive profiling datasets. The relational schema enables efficient querying and manipulation of profiling data through standard SQL interfaces, facilitating complex analytical operations and custom reporting workflows.
Features#
Comprehensive data model: Consolidates all profiling artifacts, including execution traces, performance counters, hardware metrics, and contextual metadata, within a single SQLite3 database file (.db extension).
Standards-compliant access: Supports querying through industry-standard SQL interfaces, including command-line tools (sqlite3 CLI), programming language bindings (Python sqlite3 module, C/C++ SQLite API), and database management applications.
Advanced analytics integration: Facilitates sophisticated post-processing workflows through custom analytical scripts, automated reporting systems, and integration with third-party visualization and analysis frameworks that provide SQLite3 connectivity.
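Because the database is plain SQLite3, the standard Python sqlite3 module is one way to exercise this access path. The following sketch is self-contained: it builds a tiny in-memory stand-in for the kernels view (the column names here are illustrative assumptions, not the actual schema), so the aggregate-query pattern can be tried without a real profiling session. Against a real results database, you would connect to the .db file instead.

```python
import sqlite3

# ":memory:" stands in for a real %hostname%/%pid%_results.db file
# so this sketch runs without a profiling session.
con = sqlite3.connect(":memory:")

# Mock stand-in for the documented kernels view; real column names may differ.
con.executescript("""
    CREATE TABLE kernels (name TEXT, duration INTEGER);
    INSERT INTO kernels VALUES
        ('vector_add', 1200), ('vector_add', 1400), ('gemm', 9000);
""")

# A typical aggregation: dispatch counts and average duration per kernel.
rows = con.execute(
    "SELECT name, COUNT(*) AS calls, AVG(duration) AS avg_ns "
    "FROM kernels GROUP BY name ORDER BY avg_ns DESC"
).fetchall()
for name, calls, avg_ns in rows:
    print(f"{name}: {calls} dispatches, avg {avg_ns:.0f} ns")
```

The same query text works unchanged from the sqlite3 CLI or any other SQLite client.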
Generating rocpd output#
To generate profiling data in the default rocpd format, use:
rocprofv3 --hip-trace -- <application>
Or, explicitly specify the rocpd output format using the --output-format parameter:
rocprofv3 --hip-trace --output-format rocpd -- <application>
The profiling session generates output files following the naming convention %hostname%/%pid%_results.db, where:
%hostname%: The system hostname.
%pid%: The process identifier of the profiled application.
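Given this naming convention, result databases can be located programmatically. The sketch below simulates a run's output tree in a temporary directory (the 12345 PID is a mock stand-in for a real process ID) and then globs for the *_results.db files:

```python
import glob
import os
import socket
import tempfile

# Simulate a rocprofv3 output tree: %hostname%/%pid%_results.db
# (the directory and PID here are mock stand-ins for a real run).
root = tempfile.mkdtemp()
hostname = socket.gethostname()
os.makedirs(os.path.join(root, hostname), exist_ok=True)
open(os.path.join(root, hostname, "12345_results.db"), "w").close()

# Locate every results database produced on this host.
found = sorted(glob.glob(os.path.join(root, hostname, "*_results.db")))
print(found)
```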
Converting rocpd to alternative formats#
The rocpd database format supports conversion to alternative output formats for specialized analysis and visualization workflows.
The rocpd conversion utility is distributed as part of the ROCm installation package, located in /opt/rocm-<version>/bin, and provides both executable and Python module interfaces for programmatic integration.
To transform database files into target formats, run the rocpd convert command with appropriate parameters.
CSV format conversion
/opt/rocm/bin/rocpd convert -i <input-file>.db --output-format csv
The converted CSV files are generated as rocpd-output-data/out_hip_api_trace.csv, where the rocpd-output-data directory is relative to the current working directory.
OTF2 format conversion
/opt/rocm/bin/rocpd convert -i <input-file>.db --output-format otf2
Perfetto trace format conversion
/opt/rocm/bin/rocpd convert -i <input-file>.db --output-format pftrace
Python interpreter compatibility:
On encountering Python interpreter version conflicts, specify the appropriate Python executable explicitly:
python3.10 $(which rocpd) convert -f csv -i <input-file>.db
rocpd convert command-line options#
The command-line options as displayed using rocpd convert --help are listed here:
usage: rocpd convert [-h] -i INPUT [INPUT ...] -f {csv,pftrace,otf2} [{csv,pftrace,otf2} ...]
[-o OUTPUT_FILE] [-d OUTPUT_PATH] [--kernel-rename]
[--agent-index-value {absolute,relative,type-relative}]
[--perfetto-backend {inprocess,system}]
[--perfetto-buffer-fill-policy {discard,ring_buffer}]
[--perfetto-buffer-size KB] [--perfetto-shmem-size-hint KB]
[--group-by-queue]
[--start START | --start-marker START_MARKER]
[--end END | --end-marker END_MARKER]
[--inclusive INCLUSIVE]
The following table provides a detailed listing of the rocpd convert command-line options:
| Category | Option | Description |
|---|---|---|
| Required arguments | -i INPUT [INPUT ...], --input INPUT [INPUT ...] | Specifies input database file paths. Accepts multiple SQLite3 database files separated by whitespace for batch processing operations. |
| Required arguments | -f {csv,pftrace,otf2} [{csv,pftrace,otf2} ...], --output-format {csv,pftrace,otf2} [{csv,pftrace,otf2} ...] | Defines target output formats. Supports concurrent conversion to multiple formats such as CSV, PFTrace, and OTF2. |
| I/O configuration | -o OUTPUT_FILE, --output-file OUTPUT_FILE | Configures the base filename for generated output files (default: out). |
| I/O configuration | -d OUTPUT_PATH, --output-path OUTPUT_PATH | Specifies the target directory for output file generation (default: ./rocpd-output-data). |
| Kernel identification | --kernel-rename | Substitutes kernel function names with the corresponding ROCTx marker annotations for enhanced semantic context. |
| Device identification | --agent-index-value {absolute,relative,type-relative} | Controls the device identification methodology in the converted output. Accepted values: absolute, relative, and type-relative. |
| Perfetto trace configuration | --perfetto-backend {inprocess,system} | Configures the Perfetto data collection architecture. The value system requires active traced and perfetto daemon processes, while inprocess operates autonomously. The default value is inprocess. |
| Perfetto trace configuration | --perfetto-buffer-fill-policy {discard,ring_buffer} | Defines the buffer overflow handling strategy. The value discard drops new records when capacity is exceeded, while ring_buffer overwrites the oldest records. The default value is discard. |
| Perfetto trace configuration | --perfetto-buffer-size KB | Sets the trace buffer capacity (in kilobytes) for Perfetto output generation. The default value is 1,048,576 KB (1 GB). |
| Perfetto trace configuration | --perfetto-shmem-size-hint KB | Specifies the shared memory allocation hint (in kilobytes) for Perfetto interprocess communication. The default value is 64 KB. |
| Perfetto trace configuration | --group-by-queue | Organizes trace data by HIP stream abstractions rather than low-level HSA queue identifiers, providing higher-level application context for kernel and memory transfer operations. |
| Temporal filtering | --start START | Defines the trace window start boundary using percentage notation such as 50% or an absolute nanosecond timestamp such as 781470909013049. |
| Temporal filtering | --start-marker START_MARKER | Specifies a named marker event identifier to establish the trace window start boundary. |
| Temporal filtering | --end END | Defines the trace window end boundary using percentage notation such as 75% or an absolute nanosecond timestamp such as 3543724246381057. |
| Temporal filtering | --end-marker END_MARKER | Specifies a named marker event identifier to establish the trace window end boundary. |
| Temporal filtering | --inclusive INCLUSIVE | Controls event inclusion criteria. The value True includes events with either start or end timestamps within the specified window, while False requires both timestamps within the window. The default value is True. |
| Command-line help | -h, --help | Displays comprehensive command syntax, parameter descriptions, and usage examples. |
Types of conversion#
Here are the types of conversion supported by rocpd:
Single database conversion to Perfetto format
/opt/rocm/bin/rocpd convert -i db1.db --output-format pftrace
Multi-database conversion with temporal filtering
The following example converts multiple databases to Perfetto format while specifying custom output directory and filename with temporal window constraint set to the final 70% of the trace duration:
/opt/rocm/bin/rocpd convert -i db1.db db2.db --output-format pftrace -d "./output/" -o "twoFileTraces" --start 30% --end 100%
Batch conversion into multiple formats
The following example processes six database files simultaneously, generating both CSV and Perfetto trace outputs with custom output configuration:
/opt/rocm/bin/rocpd convert -i db{0..5}.db --output-format csv pftrace -d ~/output_folder/ -o "sixFileTraces"
Comprehensive format conversion
The following example converts multiple databases into all supported formats (CSV, OTF2, and Perfetto trace) in a single operation:
/opt/rocm/bin/rocpd convert -i db{3,4}.db --output-format csv otf2 pftrace
Dedicated conversion tools#
ROCprofiler-SDK provides specialized conversion utilities for efficient format-specific operations. The following tools offer streamlined interfaces for single-format conversions, which are particularly useful in automated workflows and scripts.
rocpd2csv - CSV export tool#
Purpose: To convert rocpd SQLite3 databases to CSV format for spreadsheet analysis and data processing workflows.
Location: /opt/rocm/bin/rocpd2csv
Syntax:
rocpd2csv -i INPUT [INPUT ...] [OPTIONS]
Key features:
Structured data export: Converts hierarchical database content to tabular CSV format.
Multidatabase support: Aggregates data from multiple database files into a unified CSV output.
Time window filtering: Applies temporal filters to limit exported data range.
Configurable output: Facilitates customized output file naming and directory structure.
Usage examples:
# Basic CSV conversion
rocpd2csv -i profile_data.db
# Convert multiple databases with custom output path
rocpd2csv -i db1.db db2.db db3.db -d ~/analysis_output/ -o combined_profile
# Apply time window filtering (export middle 70% of execution)
rocpd2csv -i large_profile.db --start 15% --end 85%
Common output files:
out_hip_api_trace.csv: HIP API call trace data.
out_kernel_trace.csv: GPU kernel execution information.
out_counter_collection.csv: Hardware performance counter data.
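For downstream processing, these CSV files can be parsed with Python's standard csv module. The snippet below works on mock file contents, and the column names (Function, Duration_ns) are assumptions for illustration only; check the header row of your actual out_hip_api_trace.csv before relying on specific names.

```python
import csv
import io

# Mock content standing in for rocpd-output-data/out_hip_api_trace.csv.
# Real column names may differ; inspect the file's header row first.
mock_csv = """Function,Duration_ns
hipMemcpy,52000
hipLaunchKernel,1800
hipMemcpy,61000
"""

# Sum the duration column per API function.
totals = {}
for row in csv.DictReader(io.StringIO(mock_csv)):
    totals[row["Function"]] = totals.get(row["Function"], 0) + int(row["Duration_ns"])
print(totals)
```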
rocpd2otf2 - OTF2 export tool#
Purpose: To generate OTF2 files for high-performance trace analysis using tools such as Vampir, Tau, and Score-P viewers.
Location: /opt/rocm/bin/rocpd2otf2
Syntax:
rocpd2otf2 -i INPUT [INPUT ...] [OPTIONS]
Key features:
HPC-standard format: Produces traces compatible with scientific computing analysis tools.
Hierarchical timeline: Preserves process, thread, or queue relationships in trace structure.
Scalable storage: Efficient binary format for large-scale profiling data.
Agent indexing: Configurable GPU agent indexing strategies (absolute, relative, or type-relative).
Usage examples:
# Generate OTF2 trace archive
rocpd2otf2 -i gpu_workload.db
# Multi-process trace with custom indexing
rocpd2otf2 -i mpi_rank_*.db --agent-index-value type-relative -o mpi_trace
# Time-windowed trace export
rocpd2otf2 -i long_execution.db --start-marker "computation_begin" --end-marker "computation_end"
Output structure:
trace.otf2: Main trace archive containing timeline data.
trace.def: Trace definition file with metadata.
Supporting files for multistream trace data.
rocpd2pftrace - Perfetto trace export#
Purpose: To convert rocpd databases to Perfetto protocol buffer format for interactive visualization using the Perfetto UI.
Location: /opt/rocm/bin/rocpd2pftrace
Syntax:
rocpd2pftrace -i INPUT [INPUT ...] [OPTIONS]
Key features:
Interactive visualization: Optimized for modern web-based trace viewers.
Real-time analysis: Supports streaming analysis workflows.
GPU timeline integration: Specialized visualization of GPU execution patterns.
Configurable backend: Supports both in-process and system-wide tracing backends.
Backend configuration options:
# In-process backend (default)
rocpd2pftrace -i profile.db --perfetto-backend inprocess
# System-wide tracing backend
rocpd2pftrace -i system_profile.db --perfetto-backend system \
--perfetto-buffer-size 64MB --perfetto-shmem-size-hint 32MB
Buffer management options:
# Ring buffer mode (overwrites old data)
rocpd2pftrace -i continuous_profile.db --perfetto-buffer-fill-policy ring_buffer
# Discard mode (stops recording when full)
rocpd2pftrace -i bounded_profile.db --perfetto-buffer-fill-policy discard
Usage examples:
# Basic Perfetto trace generation
rocpd2pftrace -i application.db
# High-throughput configuration
rocpd2pftrace -i heavy_workload.db --perfetto-buffer-size 128MB \
--perfetto-buffer-fill-policy ring_buffer
# Multi-queue analysis
rocpd2pftrace -i multi_stream.db --group-by-queue -o queue_analysis
Visualization workflow:
Follow these steps to visualize traces on Perfetto:
1. Generate the .perfetto-trace file using rocpd2pftrace.
2. Open https://ui.perfetto.dev in a web browser.
3. Load the generated trace file for interactive analysis.
rocpd2summary - Statistical analysis tool#
Purpose: To generate comprehensive statistical summaries and performance analysis reports from rocpd profiling data.
Location: /opt/rocm/bin/rocpd2summary
Syntax:
rocpd2summary -i INPUT [INPUT ...] [OPTIONS]
Key features:
Multiformat output: Supports console, CSV, HTML, JSON, Markdown, and PDF report generation.
Comprehensive statistics: Statistics include kernel execution times, API call frequencies, and memory transfer analysis.
Domain-specific analysis: Generates separate summaries for HIP, ROCr, Markers, and other trace domains. For examples, see the Analysis categories section below.
Rank-based analysis: Per-process and per-rank performance breakdowns for MPI applications.
Configurable scope: Selective inclusion or exclusion of analysis categories.
Output format options:
# Console output (default)
rocpd2summary -i profile.db
# CSV format for data analysis
rocpd2summary -i profile.db --format csv -o performance_metrics
# HTML report with visualization
rocpd2summary -i profile.db --format html -d ~/reports/
# Multiple output formats
rocpd2summary -i profile.db --format csv html json
Analysis categories:
Here is how you can generate summaries for specific trace domains:
# Include all available domains
rocpd2summary -i profile.db --region-categories HIP HSA MARKERS KERNEL
# Focus on GPU kernel analysis only
rocpd2summary -i profile.db --region-categories KERNEL
# Exclude markers to speed up processing
rocpd2summary -i profile.db --region-categories HIP HSA KERNEL
Advanced analysis options:
The following commands demonstrate the usage of advanced analysis options such as domain-summary, summary-by-rank, and start/end:
# Include domain-specific statistics
rocpd2summary -i multi_gpu.db --domain-summary
# Per-rank analysis for MPI applications
rocpd2summary -i mpi_profile_*.db --summary-by-rank --format html
# Time-windowed summary analysis
rocpd2summary -i long_run.db --start 25% --end 75% --format csv
Report content:
Kernel statistics: Execution time distributions, call frequencies, and grid or block sizes.
API timing: HIP or HSA function call durations and frequencies.
Memory analysis: Transfer patterns, bandwidth utilization, and allocation statistics.
Device utilization: GPU occupancy patterns and idle time analysis.
Synchronization overhead: Barrier and synchronization point analysis.
Output files:
kernels_summary.{format} - GPU kernel execution summary.
hip_summary.{format} - HIP API call statistics.
hsa_summary.{format} - HSA runtime API analysis.
memory_summary.{format} - Memory operation statistics.
markers_summary.{format} - Marker event analysis.
Generating performance summary using rocpd#
The rocpd summary command provides statistical analysis and performance summaries, similar to the summary functionality available in rocprofv3. This command generates comprehensive reports from rocpd database files, offering the same analytical capabilities previously available through rocprofv3 --summary, but operating on the structured database format.
Purpose: To generate statistical summaries and performance reports from rocpd database files, providing similar functionality as rocprofv3 builtin summary capabilities.
Location: /opt/rocm/bin/rocpd summary
Syntax:
rocpd summary -i INPUT [INPUT ...] [OPTIONS]
Key features:
Compatible analysis: Provides the same summary statistics and reports as rocprofv3 --summary.
Database-driven: Operates on structured rocpd database files for consistent and reproducible analysis.
Multidatabase aggregation: Combines and analyzes data from multiple profiling sessions, ranks, or nodes in a single operation. This is further explained in Multidatabase summary analysis.
Comparative analysis: Uses --summary-by-rank to compare performance across different ranks, nodes, or execution contexts.
Flexible output: Generates summaries in multiple formats such as console, CSV, HTML, and JSON.
Selective reporting: Allows generating reports based on specific performance domains and categories.
Multidatabase summary analysis#
The rocpd summary command excels at aggregating multiple database files, providing capabilities not available with single-session analysis.
Here are the benefits of using rocpd summary for multidatabase summary:
Unified summary reports:
# Aggregate multiple databases into single comprehensive summary
rocpd summary -i session1.db session2.db session3.db --format html -o unified_summary
# Combine all MPI rank databases for overall application analysis
rocpd summary -i rank_*.db --format csv -o mpi_application_summary
# Time-series aggregation across multiple profiling runs
rocpd summary -i daily_profile_*.db --format json -o weekly_performance_trends
Rankwise comparative analysis:
The --summary-by-rank option provides detailed comparative analysis, helping you to identify performance variations, load balancing issues, and optimization opportunities across different execution contexts:
# Compare performance across MPI ranks
rocpd summary -i rank_0.db rank_1.db rank_2.db rank_3.db --summary-by-rank --format html -o rank_comparison
# Analyze multi-node performance characteristics
rocpd summary -i node_*.db --summary-by-rank --format csv -o node_performance_analysis
# Compare GPU device performance in multi-GPU applications
rocpd summary -i gpu_0.db gpu_1.db gpu_2.db gpu_3.db --summary-by-rank --format json -o gpu_scaling_analysis
Use cases for multidatabase summary analysis#
MPI application performance analysis:
# Profile distributed MPI application
mpirun -np 8 rocprofv3 --hip-trace --output-format rocpd -- mpi_simulation
# Generate unified summary for overall application performance
rocpd summary -i results_rank_*.db --format html -o application_overview
# Identify load balancing issues with rank-by-rank comparison
rocpd summary -i results_rank_*.db --summary-by-rank --format csv -o load_balance_analysis
Multi-GPU scaling studies:
# Profile scaling from 1 to 4 GPUs
for gpus in 1 2 4; do
  rocprofv3 --hip-trace --device 0:$((gpus-1)) --output-format rocpd -o "scaling_${gpus}gpu.db" -- gpu_benchmark
done
# Aggregate scaling analysis
rocpd summary -i scaling_*gpu.db --format html -o gpu_scaling_summary
# Compare efficiency across different GPU counts
rocpd summary -i scaling_*gpu.db --summary-by-rank --format json -o scaling_efficiency
Performance regression testing:
# Profile baseline and optimized versions
rocprofv3 --hip-trace --output-format rocpd -o baseline.db -- application_v1
rocprofv3 --hip-trace --output-format rocpd -o optimized.db -- application_v2
# Generate unified performance comparison
rocpd summary -i baseline.db optimized.db --summary-by-rank --format html -o regression_analysis
Cross-platform performance comparison:
# Profile on different hardware platforms
rocprofv3 --hip-trace --output-format rocpd -o platform_A.db -- benchmark
rocprofv3 --hip-trace --output-format rocpd -o platform_B.db -- benchmark
# Compare platform performance characteristics
rocpd summary -i platform_*.db --summary-by-rank --format csv -o platform_comparison
Domain-specific reporting:
# Cross-rank summary for MPI applications with domain focus
rocpd summary -i rank_*.db --summary-by-rank --region-categories KERNEL HIP --format html
Time-windowed analysis:
rocpd summary -i profile_*.db --start 25% --end 75% --summary-by-rank
Domain-specific comparative analysis:
rocpd summary -i node_*.db --domain-summary --summary-by-rank --region-categories HIP ROCR
In summary, here is how the output generated using rocpd summary enhances performance analysis:
Unified summaries: Provides aggregate statistics across all input databases, showing combined performance metrics.
Rankwise summaries: Generates separate statistical reports for each input database, enabling direct comparison of performance characteristics.
Comparative metrics: Highlights performance variations, identifies outliers, and reveals load balancing opportunities.
Integration with rocprofv3 workflow#
The rocpd summary command maintains full compatibility with rocprofv3 summary analysis, while extending capabilities to multidatabase scenarios. Users familiar with rocprofv3 --summary will find identical statistical outputs and report formats when using rocpd summary on database files, with the added benefit of cross-session analysis capabilities.
Aggregating multiprofiling data using rocpd#
One of the key advantages of the rocpd format is its ability to aggregate and analyze data from multiple profiling sessions, ranks, or nodes within a unified framework. This capability enables comprehensive analysis workflows that were not possible with earlier output formats.
Here are the use cases of data aggregation using rocpd:
Multidatabase analysis capabilities.
Unlike the Perfetto output format used in earlier versions, rocpd databases can be seamlessly combined for cross-session analysis:
# Aggregate analysis across multiple profiling sessions
rocpd query -i session1.db session2.db session3.db \
  --query "SELECT name, AVG(duration) FROM kernels GROUP BY name"
# Cross-rank performance comparison for MPI applications
rocpd summary -i rank_0.db rank_1.db rank_2.db rank_3.db --summary-by-rank
# Multi-node scaling analysis
rocpd query -i node_*.db \
  --query "SELECT COUNT(*) as total_kernels, SUM(duration) as total_time FROM kernels"
Distributed computing workflows.
rocpd can effectively aggregate data in a distributed computing environment. Here are examples for MPI and multi-GPU analysis:
MPI application analysis:
# Profile MPI application across multiple ranks
mpirun -np 4 rocprofv3 --hip-trace --output-format rocpd -- mpi_application
# Generate aggregated performance summary
rocpd summary -i results_rank_*.db --summary-by-rank --format html -o mpi_performance_report
# Analyze load balancing across ranks
rocpd query -i results_rank_*.db \
  --query "SELECT pid, COUNT(*) as kernel_count, AVG(duration) as avg_duration FROM kernels GROUP BY pid"
Multi-GPU scaling analysis:
# Profile application with multiple GPU devices
rocprofv3 --hip-trace --device 0,1,2,3 --output-format rocpd -- multi_gpu_app
# Aggregate device utilization analysis
rocpd query -i multi_gpu_results.db \
  --query "SELECT agent_abs_index as device_id, COUNT(*) as operations, SUM(duration) as total_time FROM kernels GROUP BY device_id"
# Cross-device performance comparison
rocpd summary -i multi_gpu_results.db --domain-summary
Temporal aggregation.
Time-series analysis:
# Collect profiles over time for performance monitoring
for hour in {1..24}; do
  rocprofv3 --hip-trace --output-format rocpd -o "profile_hour_$hour.db" -- application
done
# Analyze performance trends over time
rocpd query -i profile_hour_*.db \
  --query "SELECT AVG(duration) as avg_kernel_time, COUNT(*) as kernel_count FROM kernels" \
  --format csv -o performance_trends
Comparative analysis:
# Compare baseline vs optimized performance
rocpd query -i baseline.db optimized.db \
  --query "SELECT name, AVG(duration) as avg_time FROM kernels GROUP BY name ORDER BY avg_time DESC"
# Generate comparative summary reports
rocpd summary -i baseline.db optimized.db --format html -o comparison_report
Here are the benefits of data aggregation using rocpd:
Unified analysis: Combines data from different execution contexts, hardware configurations, and time periods.
Scalability insights: Analyzes performance scaling across multiple nodes, ranks, or GPU devices.
Trend analysis: Tracks performance evolution over time or across different software versions.
Load balancing: Identifies performance bottlenecks and load distribution issues in distributed applications.
Cross-platform comparison: Compares performance across different hardware platforms using unified database schema.
The aggregation capabilities of rocpd format enable sophisticated analysis workflows that provide deeper insight into application performance characteristics across diverse computing environments.
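The cross-database aggregation described above rests on standard SQLite mechanics, and one way to reproduce it by hand is SQLite's ATTACH statement, which joins several database files into a single logical connection. The sketch below demonstrates the mechanism on two mock per-rank databases whose kernels tables stand in for real rocpd files (table and column names are illustrative assumptions):

```python
import os
import sqlite3
import tempfile

# Create two mock per-rank databases standing in for real rocpd files.
tmp = tempfile.mkdtemp()
paths = []
for rank, durations in enumerate([(100, 300), (200, 600)]):
    path = os.path.join(tmp, f"rank_{rank}.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE kernels (name TEXT, duration INTEGER)")
    con.executemany("INSERT INTO kernels VALUES ('gemm', ?)",
                    [(d,) for d in durations])
    con.commit()
    con.close()
    paths.append(path)

# ATTACH the second file to one connection and aggregate across both,
# combining sessions by hand the way a multidatabase analysis would.
con = sqlite3.connect(paths[0])
con.execute("ATTACH DATABASE ? AS rank1", (paths[1],))
total = con.execute("""
    SELECT SUM(duration) FROM (
        SELECT duration FROM main.kernels
        UNION ALL
        SELECT duration FROM rank1.kernels)
""").fetchone()[0]
print("aggregate kernel time:", total)
```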
Tool integration and workflow examples#
The following examples demonstrate how to use the rocpd conversion tools for automated performance monitoring and multiformat analysis.
Multiformat analysis pipeline:
# Generate all analysis formats for comprehensive review
rocpd2csv -i profile.db -o analysis_data
rocpd2summary -i profile.db --format html -o performance_report
rocpd2pftrace -i profile.db -o interactive_trace
Automated performance monitoring:
#!/bin/bash
# Performance analysis automation script
PROFILE_DB="$1"
OUTPUT_DIR="analysis_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTPUT_DIR"
# Generate CSV data for automated analysis
rocpd2csv -i "$PROFILE_DB" -d "$OUTPUT_DIR" -o raw_data
# Create summary reports
rocpd2summary -i "$PROFILE_DB" --format csv html \
  -d "$OUTPUT_DIR" -o performance_summary
# Generate interactive trace for detailed investigation
rocpd2pftrace -i "$PROFILE_DB" -d "$OUTPUT_DIR" -o interactive_trace
Executing SQL queries with rocpd#
The rocpd query command provides comprehensive SQL-based analysis capabilities for exploring and extracting data from rocpd databases. This command enables custom analysis workflows, automated reporting for GPU profiling data, and integration with external analysis pipelines.
Purpose: To execute custom SQL queries against rocpd databases with support for multiple output formats, automated reporting, and Email delivery.
Location: /opt/rocm/bin/rocpd query
Syntax:
rocpd query -i INPUT [INPUT ...] --query "SQL_STATEMENT" [OPTIONS]
Key features:
Standard SQL support: Supports full SQLite3 SQL syntax including JOINs, aggregate functions, and complex WHERE clauses.
Multidatabase aggregation: Facilitates queries across multiple database files as a unified virtual database.
Multiple output formats: Supports output formats such as console, CSV, HTML, JSON, Markdown, PDF, and interactive dashboards.
Script execution: Can execute complex SQL scripts with view definitions and custom functions.
Automated reporting: Supports Email delivery with SMTP configuration and attachment management.
Time window integration: Allows temporal filtering before query execution.
Database schema and views#
rocpd databases provide the following comprehensive views for analysis. It is generally recommended to build queries using these data views:
Core data views:
-- System and hardware information
SELECT * FROM rocpd_info_agents;
SELECT * FROM rocpd_info_node;
-- Kernel execution data
SELECT * FROM kernels;
SELECT * FROM top_kernels;
-- API trace information
SELECT * FROM regions_and_samples WHERE category LIKE 'HIP_%';
SELECT * FROM regions_and_samples WHERE category LIKE 'RCCL_%';
-- Performance counters
SELECT * FROM counters_collection;
-- Memory operations
SELECT * FROM memory_copies;
SELECT * FROM memory_allocations;
-- Process and thread information
SELECT * FROM processes;
SELECT * FROM threads;
-- Marker and region data
SELECT * FROM regions;
SELECT * FROM regions_and_samples WHERE category LIKE 'MARKERS_%';
Summary and analysis views:
-- Top performing kernels by execution time
SELECT * FROM top_kernels LIMIT 10;
-- Top analysis
SELECT * FROM top;
-- Busy analysis
SELECT * FROM busy;
rocpd query examples#
The following examples demonstrate the usage of rocpd query for performing some useful operations:
Simple data exploration:
# List available GPU agents
rocpd query -i profile.db --query "SELECT * FROM rocpd_info_agents"
# Show top 10 longest-running kernels
rocpd query -i profile.db --query "SELECT name, duration FROM kernels ORDER BY duration DESC LIMIT 10"
# Count total number of kernel dispatches
rocpd query -i profile.db --query "SELECT COUNT(*) as total_kernels FROM kernels"
Multidatabase aggregation:
# Combine data from multiple profiling sessions
rocpd query -i session1.db session2.db session3.db \
  --query "SELECT pid, COUNT(*) as kernel_count FROM kernels GROUP BY pid"
# Cross-session performance comparison
rocpd query -i baseline.db optimized.db \
  --query "SELECT name as kernel_name, AVG(duration) as avg_duration FROM kernels GROUP BY kernel_name"
Advanced analytics:
# Kernel performance analysis with statistics
rocpd query -i profile.db --query "
  SELECT
    name as kernel_name,
    COUNT(*) as dispatch_count,
    MIN(duration) as min_duration,
    AVG(duration) as avg_duration,
    MAX(duration) as max_duration,
    SUM(duration) as total_duration
  FROM kernels
  GROUP BY kernel_name
  ORDER BY total_duration DESC"
Memory transfer analysis:
# Memory copy analysis by direction
rocpd query -i profile.db --query "
  SELECT
    name,
    src_agent_type,
    src_agent_abs_index,
    dst_agent_type,
    dst_agent_abs_index,
    COUNT(*) as transfer_count,
    SUM(size) as total_bytes,
    SUM(duration) as total_duration
  FROM memory_copies
  GROUP BY name, src_agent_type, src_agent_abs_index, dst_agent_type, dst_agent_abs_index
  ORDER BY total_bytes DESC"
Output format options#
rocpd query supports the following output formats:
Console output (default):
# Display results in terminal
rocpd query -i profile.db --query "SELECT * FROM top_kernels LIMIT 5"
CSV export for data analysis:
# Export to CSV file
rocpd query -i profile.db --query "SELECT * FROM kernels" --format csv -o kernel_analysis
# Specify custom output directory
rocpd query -i profile.db --query "SELECT * FROM kernels" --format csv -d ~/analysis/ -o kernel_data
HTML reports:
# Generate HTML table
rocpd query -i profile.db --query "SELECT * FROM top_kernels" --format html -o performance_report
Interactive dashboard:
# Create interactive HTML dashboard
rocpd query -i profile.db --query "SELECT * FROM device_utilization" --format dashboard -o utilization_dashboard
# Use custom dashboard template
rocpd query -i profile.db --query "SELECT * FROM kernels" --format dashboard \
  --template-path ~/templates/custom_dashboard.html -o custom_report
JSON for programmatic integration:
# Export structured JSON data
rocpd query -i profile.db --query "SELECT * FROM counters_collection" --format json -o counter_data
PDF reports:
# Generate PDF report with monospace formatting
rocpd query -i profile.db --query "SELECT name, duration FROM top_kernels" --format pdf -o kernel_report
Script-based analysis#
You can use rocpd query to execute complex SQL scripts with view definitions and custom analysis logic.
Here is a sample SQL script (analysis.sql):
-- Create temporary views for complex analysis
CREATE TEMP VIEW kernel_stats AS
SELECT
name as kernel_name,
COUNT(*) as dispatch_count,
AVG(duration) as avg_duration,
STDDEV(duration) as duration_stddev
FROM kernels
GROUP BY kernel_name;
CREATE TEMP VIEW performance_outliers AS
SELECT k.*, ks.avg_duration, ks.duration_stddev
FROM kernels k
JOIN kernel_stats ks ON k.name = ks.kernel_name
WHERE ABS(k.duration - ks.avg_duration) > 2 * ks.duration_stddev;
Here is how to execute the preceding script using rocpd query:
# Run script then execute query
rocpd query -i profile.db --script analysis.sql \
--query "SELECT * FROM performance_outliers" --format html -o outlier_analysis
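The two-standard-deviation criterion in the script above can also be sanity-checked host-side. Note that stock SQLite has no built-in STDDEV aggregate, so availability of that function depends on what rocpd query registers; a host-side computation like the following sketch, run here on mock dispatch durations, is one fallback. It applies the same |duration - mean| > 2 * stddev filter:

```python
import statistics

# Mock kernel dispatch durations standing in for rows of the kernels view.
durations = [100, 105, 95, 102, 98, 400]  # 400 is the planted outlier

mean = statistics.mean(durations)
stddev = statistics.pstdev(durations)

# Same criterion as the performance_outliers view:
# |duration - mean| > 2 * stddev
outliers = [d for d in durations if abs(d - mean) > 2 * stddev]
print(outliers)
```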
Time window integration#
Here is how to apply temporal filtering before query execution:
# Query only middle 50% of execution timeline
rocpd query -i profile.db --start 25% --end 75% \
--query "SELECT COUNT(*) as kernel_count FROM kernels"
# Use marker-based time windows
rocpd query -i profile.db --start-marker "computation_begin" --end-marker "computation_end" \
--query "SELECT * FROM kernels ORDER BY start_time"
# Absolute timestamp filtering
rocpd query -i profile.db --start 1000000000 --end 2000000000 \
--query "SELECT * FROM kernels WHERE start_time BETWEEN 1000000000 AND 2000000000"
Automated Email reporting#
rocpd query facilitates sending reports through Email. Here is how you can send reports in different formats and use various Email configuration options:
Basic Email delivery:
# Send CSV report via email
rocpd query -i profile.db --query "SELECT * FROM top_kernels" --format csv \
  --email-to analyst@company.com --email-from profiler@company.com \
  --email-subject "Weekly Performance Report"
Advanced Email configuration:
# Multiple recipients with SMTP authentication
rocpd query -i profile.db --query "SELECT * FROM device_utilization" --format html \
  --email-to "team@company.com,manager@company.com" \
  --email-from profiler@company.com \
  --email-subject "GPU Utilization Analysis" \
  --smtp-server smtp.company.com --smtp-port 587 \
  --smtp-user profiler@company.com --smtp-password $(cat ~/.smtp_pass) \
  --inline-preview --zip-attachments
Dashboard Email reports:
# Send interactive dashboard via email
rocpd query -i profile.db --query "SELECT * FROM kernels" --format dashboard \
  --template-path ~/templates/executive_summary.html \
  --email-to executives@company.com --email-from profiler@company.com \
  --email-subject "Executive Performance Dashboard" \
  --inline-preview
Integration workflows#
Automated analysis pipeline:
#!/bin/bash
# Automated reporting script
DB_FILE="$1"
REPORT_DATE=$(date +%Y-%m-%d)

# Generate multiple analysis reports
rocpd query -i "$DB_FILE" --query "SELECT * FROM top_kernels LIMIT 20" \
  --format html -o "top_kernels_$REPORT_DATE"

rocpd query -i "$DB_FILE" --query "SELECT * FROM memory_copy_summary" \
  --format csv -o "memory_analysis_$REPORT_DATE"

rocpd query -i "$DB_FILE" --query "SELECT * FROM device_utilization" \
  --format dashboard -o "utilization_dashboard_$REPORT_DATE" \
  --email-to team@company.com --email-from automation@company.com
Performance regression detection:
# Compare current performance against baseline
rocpd query -i baseline.db current.db --script performance_comparison.sql \
  --query "SELECT * FROM performance_regression_analysis" \
  --format html -o regression_report \
  --email-to devteam@company.com --email-from ci@company.com \
  --email-subject "Performance Regression Analysis"
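When rocpd query is given multiple input databases it merges them into a unified view. The same baseline-versus-current comparison can be prototyped in plain Python with SQLite's `ATTACH DATABASE`. This is a hedged sketch using a hypothetical, simplified `kernels` table and an arbitrary 20% slowdown threshold; the real rocpd schema and regression script will differ.

```python
import os
import sqlite3
import tempfile

# Build two throwaway databases with a hypothetical, simplified schema.
tmp = tempfile.mkdtemp()
paths = {"baseline": os.path.join(tmp, "baseline.db"),
         "current": os.path.join(tmp, "current.db")}
samples = {"baseline": [("gemm", 10.0), ("copy", 4.0)],
           "current": [("gemm", 14.0), ("copy", 4.1)]}
for label, path in paths.items():
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE kernels (name TEXT, duration REAL)")
    db.executemany("INSERT INTO kernels VALUES (?, ?)", samples[label])
    db.commit()
    db.close()

# Attach both files to one connection and compare average durations.
con = sqlite3.connect(":memory:")
con.execute(f"ATTACH DATABASE '{paths['baseline']}' AS base")
con.execute(f"ATTACH DATABASE '{paths['current']}' AS cur")
regressions = con.execute("""
    SELECT b.name, b.avg_dur, c.avg_dur
    FROM (SELECT name, AVG(duration) AS avg_dur
          FROM base.kernels GROUP BY name) b
    JOIN (SELECT name, AVG(duration) AS avg_dur
          FROM cur.kernels GROUP BY name) c
      ON b.name = c.name
    WHERE c.avg_dur > 1.2 * b.avg_dur   -- flag >20% slowdowns
""").fetchall()
print(regressions)  # → [('gemm', 10.0, 14.0)]
```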
Custom analysis functions:
rocpd databases support custom SQL functions for advanced analysis:
# Use built-in rocpd functions
rocpd query -i profile.db --query "
  SELECT
    name,
    rocpd_get_string(name_id, 0, nid, pid) as full_kernel_name,
    duration
  FROM kernels
  WHERE rocpd_get_string(name_id, 0, nid, pid) LIKE '%gemm%'"
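Helpers such as `rocpd_get_string` are registered by the rocpd tooling itself. When reading the database from plain Python, analogous custom SQL functions can be registered with `sqlite3.Connection.create_function`. This sketch uses a toy table and a hypothetical `short_name` stub in place of a real demangler (which might shell out to `c++filt` instead).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kernels (name TEXT, duration REAL)")
con.executemany("INSERT INTO kernels VALUES (?, ?)",
                [("_Z4gemmv", 12.0), ("_Z4copyv", 3.0)])

# Hypothetical stand-in for a demangler; a real one might invoke
# c++filt or a dedicated demangling library.
def short_name(mangled: str) -> str:
    return mangled.removeprefix("_Z4").rstrip("v")

# Register the Python callable as a single-argument SQL function.
con.create_function("short_name", 1, short_name)

rows = con.execute(
    "SELECT short_name(name), duration FROM kernels "
    "WHERE short_name(name) LIKE '%gemm%'"
).fetchall()
print(rows)  # → [('gemm', 12.0)]
```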
rocpd query command-line reference#
The command-line options as displayed using rocpd query --help are listed here:
usage: rocpd query [-h] -i INPUT [INPUT ...] --query QUERY [--script SCRIPT]
[--format {console,csv,html,json,md,pdf,dashboard,clipboard}]
[-o OUTPUT_FILE] [-d OUTPUT_PATH]
[--email-to EMAIL_TO] [--email-from EMAIL_FROM]
[--email-subject EMAIL_SUBJECT] [--smtp-server SMTP_SERVER]
[--smtp-port SMTP_PORT] [--smtp-user SMTP_USER]
[--smtp-password SMTP_PASSWORD] [--zip-attachments]
[--inline-preview] [--template-path TEMPLATE_PATH]
[--start START | --start-marker START_MARKER]
[--end END | --end-marker END_MARKER]
The following table provides a detailed listing of the rocpd query command-line options:
| Category | Option | Description |
|---|---|---|
| Required arguments | `-i INPUT [INPUT ...], --input INPUT [INPUT ...]` | The input database file paths. Multiple databases are merged into a unified view. |
| | `--query QUERY` | The SQL `SELECT` statement to be executed. Enclose complex queries in quotes. |
| Query options | `--script SCRIPT` | The SQL script file to be executed before running the main query. Useful for creating views and functions. |
| | `--format {console,csv,html,json,md,pdf,dashboard,clipboard}` | The output format. Dashboard format creates interactive HTML reports. The default value is `console`. |
| Output configuration | `-o OUTPUT_FILE, --output-file OUTPUT_FILE` | Base filename for exported files. |
| | `-d OUTPUT_PATH, --output-path OUTPUT_PATH` | Output directory path. |
| | `--template-path TEMPLATE_PATH` | Jinja2 template file for dashboard format customization. |
| Email reporting | `--email-to EMAIL_TO` | Recipient Email addresses (comma-separated for multiple recipients). |
| | `--email-from EMAIL_FROM` | Sender Email address. This is required when using Email delivery. |
| | `--email-subject EMAIL_SUBJECT` | Email subject line. |
| | `--smtp-server SMTP_SERVER, --smtp-port SMTP_PORT` | SMTP server configuration. The default value is `localhost:25`. |
| | `--smtp-user SMTP_USER, --smtp-password SMTP_PASSWORD` | SMTP authentication credentials. |
| | `--zip-attachments` | Bundles all attachments into a single ZIP file. |
| | `--inline-preview` | Embeds HTML reports as Email body content. |
| Time window filtering | `--start START, --end END` | Temporal boundaries using percentages such as `25%` or absolute timestamps. |
| | `--start-marker START_MARKER, --end-marker END_MARKER` | Named marker events defining time window boundaries. |