ROCProfilerV2 API#
ROCProfilerV2 provides an API that allows fine-grained control over the
profiling process. Like the CLI tool rocprofv2, the API supports both application
tracing and kernel profiling. ROCProfilerV2 API allows you to create a session
and invoke application tracing or kernel profiling within the session. The following
sections describe session management, application tracing and kernel profiling
using ROCProfilerV2 API.
Profiling sessions#
A ROCProfilerV2 session maintains the global profiling state for an application. It is a unique identifier for a profiling or tracing task that is specified within the session. A session contains sufficient information about what needs to be collected or traced and it allows you to start/stop profiling/tracing as and when required.
The following demonstrates the use of session management APIs:
// Initialize the tools
   rocprofiler_initialize();
// Create the session with no replay mode
   rocprofiler_session_id_t session_id;
   rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &session_id);
// Start Session
   rocprofiler_start_session(session_id);
// profile a kernel -kernelA
   hipLaunchKernelGGL(kernelA, dim3(1), dim3(1), 0, 0);
// Deactivating session
   rocprofiler_terminate_session(session_id);
// Destroy sessions
   rocprofiler_destroy_session(session_id);
// Destroy all profiling related objects
   rocprofiler_finalize();
The following is a typical session management workflow:
- Initialize ROCProfilerV2 using - rocprofiler_initialize.
- Create a session using - rocprofiler_create_session. The created session keeps track of the global status of the application profiling.- Note - You can only create a session in no-replay mode (ROCPROFILER_NONE_REPLAY_MODE) which allows kernels to be run only once. 
- Create a buffer to hold the results using - rocprofiler_create_buffer.
- Create filters using - rocprofiler_create_filterto specify the profiling task such as application tracing or metrics/counters collection.- Note - If the same filter is applied twice with different values, the latter application of the filter is considered the recent one, which overwrites the former application of the filter. To learn about the types of filters and their utility, see Filters. 
- Start the session with - rocprofiler_start_session.
- Run the specified kernels to collect traces or counters/metrics (as specified in the filter) 
- Terminate the session with - rocprofiler_terminate_sessionand flush the profiling results using- rocprofiler_flush_data.- Note - The session must be terminated after the kernel completes (synchronization required). If a session is stopped before the completion of kernel execution within that session, the instrumentation data is undefined. Additionally, a session can be restarted after terminating. 
- Destroy the session with - rocprofiler_destroy_sessionand finalize profiling with- rocprofiler_finalize.
See working examples demonstrating the use of the ROCProfilerV2 API in Application Tracing and Kernel Profiling.
Filters#
As explained in Profiling Sessions, filters
allow you to specify a profiling task within a session. For different
profiling tasks, different filters are specified as a parameter to
rocprofiler_create_filter.
See the list of filters in the table below:
| Filter | Purpose | 
|---|---|
| 
 | To trace API calls. You must specify the API calls to be traced, in a vector. | 
| 
 | To track all the kernel execution’s start and end times on the GPUs | 
| 
 | To collect counters. You must specify the counters to be collected, in a vector. | 
Application tracing#
The following code demonstrates the usage of ROCProfilerV2 APIs to
trace an application. This example traces HIP APIs, HIP
asynchronous activities, HSA APIs, HSA asynchronous activities, and
ROCTX ranges. Note the use of ROCPROFILER_API_TRACE filter to trace
API calls, and ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION filter to trace the
kernel.
int main(int argc, char*\* argv) {
   int\* gpuMem;
   prepare();
   // Initialize the tools
   CHECK_ROCPROFILER(rocprofiler_initialize());
   // Creating the session with given replay mode
   rocprofiler_session_id_t session_id;
   CHECK_ROCPROFILER(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &session_id));
   // Creating Output Buffer for the data
   rocprofiler_buffer_id_t buffer_id;
   CHECK_ROCPROFILER(rocprofiler_create_buffer(session_id,
      [](const rocprofiler_record_header_t\* record, const
      rocprofiler_record_header_t\* end_record,
      rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t
      buffer_id) {
         WriteBufferRecords(record, end_record, session_id, buffer_id);
      },
      0x9999, &buffer_id));
   // Specifying the APIs to be traced in a vector
   std::vector<rocprofiler_tracer_activity_domain_t> apis_requested;
   apis_requested.emplace_back(ACTIVITY_DOMAIN_HIP_API);
   apis_requested.emplace_back(ACTIVITY_DOMAIN_HIP_OPS);
   apis_requested.emplace_back(ACTIVITY_DOMAIN_HSA_API);
   apis_requested.emplace_back(ACTIVITY_DOMAIN_HSA_OPS);
   apis_requested.emplace_back(ACTIVITY_DOMAIN_ROCTX);
   rocprofiler_filter_id_t api_tracing_filter_id;
   // Creating filter for tracing APIs
   CHECK_ROCPROFILER(rocprofiler_create_filter(
      session_id, ROCPROFILER_API_TRACE,
      rocprofiler_filter_data_t{&apis_requested[0]}, apis_requested.size(),
      &api_tracing_filter_id, rocprofiler_filter_property_t{}));
   CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id,
      api_tracing_filter_id, buffer_id));
   // Kernel Tracing
   rocprofiler_filter_id_t kernel_tracing_filter_id;
   CHECK_ROCPROFILER(rocprofiler_create_filter(session_id,
      ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{},
      0, &kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
   CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id,
      kernel_tracing_filter_id, buffer_id));
   // Normal HIP Calls won't be traced
   hipDeviceProp_t devProp;
   HIP_CALL(hipGetDeviceProperties(&devProp, 0));
   HIP_CALL(hipMalloc((void**)&gpuMem, 1 \* sizeof(int)));
   // KernelA and KernelB won't be traced
   kernelCalls('A');
   kernelCalls('B');
   // Activating Profiling Session to profile whatever kernel launches occur
   // up to the next terminate session
   CHECK_ROCPROFILER(rocprofiler_start_session(session_id));
   // KernelC, KernelD, KernelE and KernelF to be traced as part of the session
   kernelCalls('C');
   kernelCalls('D');
   kernelCalls('E');
   kernelCalls('F');
   // Normal HIP Calls that will be traced
   HIP_CALL(hipFree(gpuMem));
   // Deactivating session
   CHECK_ROCPROFILER(rocprofiler_terminate_session(session_id));
   // Manual Flush user buffer request
   CHECK_ROCPROFILER(rocprofiler_flush_data(session_id, buffer_id));
   // Destroy sessions
   CHECK_ROCPROFILER(rocprofiler_destroy_session(session_id));
   // Destroy all profiling related objects (User buffer, sessions, filters, etc..)
   CHECK_ROCPROFILER(rocprofiler_finalize());
   return 0;
}
Kernel profiling#
The following is a full-application example that utilizes the ROCProfilerV2
API to profile the kernels. The ROCPROFILER_COUNTERS_COLLECTION filter for
counter collection distinguishes this example from the one in Application tracing.
The GRBM_COUNT counter to be collected is specified in a vector of strings as
shown.
#include <hip/hip_runtime.h>
#include <rocprofiler/v2/rocprofiler.h>
int main(int argc, char*\* argv) {
   int\* gpuMem;
   // Initialize the tools
   CHECK_ROCPROFILER(rocprofiler_initialize());
   // Creating the session with given replay mode
   rocprofiler_session_id_t session_id;
   CHECK_ROCPROFILER(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE,
   &session_id));
   // Creating Output Buffer for the data
   rocprofiler_buffer_id_t buffer_id;
   CHECK_ROCPROFILER(rocprofiler_create_buffer(session_id,
      [](const rocprofiler_record_header_t\* record, const
      rocprofiler_record_header_t\* end_record,
      rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t
      buffer_id) {
         WriteBufferRecords(record, end_record, session_id, buffer_id);
      },
      0x9999, &buffer_id));
   // Counter Collection Filter
   std::vector<const char*> counters;
   counters.emplace_back("GRBM_COUNT");
   rocprofiler_filter_id_t filter_id;
   [[maybe_unused]] rocprofiler_filter_property_t property = {};
   CHECK_ROCPROFILER(rocprofiler_create_filter(session_id,
      ROCPROFILER_COUNTERS_COLLECTION,
      rocprofiler_filter_data_t{.counters_names = &counters[0]},
      counters.size(), &filter_id, property));
   CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id,
      buffer_id));
   // Normal HIP Calls
   hipDeviceProp_t devProp;
   HIP_CALL(hipGetDeviceProperties(&devProp, 0));
   HIP_CALL(hipMalloc((void**)&gpuMem, 1 \* sizeof(int)));
   // KernelA and KernelB won't be profiled
   kernelCalls('A');
   kernelCalls('B');
   // Activating Profiling Session to profile whatever kernel launches occur
   // up to the next terminate session
   CHECK_ROCPROFILER(rocprofiler_start_session(session_id));
   // KernelC, KernelD, KernelE and KernelF to be profiled as part of the session
   kernelCalls('C');
   kernelCalls('D');
   kernelCalls('E');
   kernelCalls('F');
   // Normal HIP Calls
   HIP_CALL(hipFree(gpuMem));
   // Deactivating session
   CHECK_ROCPROFILER(rocprofiler_terminate_session(session_id));
   // Manual Flush user buffer request
   CHECK_ROCPROFILER(rocprofiler_flush_data(session_id, buffer_id));
   // Destroy sessions
   CHECK_ROCPROFILER(rocprofiler_destroy_session(session_id));
   // Destroy all profiling related objects (User buffer, sessions, filters, etc..)
   CHECK_ROCPROFILER(rocprofiler_finalize());
   return 0;
}
Environment variables#
- ROCPROFILER_LOG: When set to 1, enables error messages logging to- /tmp/rocprofiler_log.txt.
- ROCPROFILER_TRUNCATE_KERNEL_PATH: When set to 1, enables truncation of kernel names for improved output readability. By default, the kernel names are not truncated.