ROCProfilerV2 API#

ROCProfilerV2 provides an API that allows fine-grained control over the profiling process. Like the CLI tool rocprofv2, the API supports both application tracing and kernel profiling. ROCProfilerV2 API allows you to create a session and invoke application tracing or kernel profiling within the session. The following sections describe session management, application tracing and kernel profiling using ROCProfilerV2 API.

Profiling Sessions#

A ROCProfilerV2 session maintains the global profiling state for an application. It is a unique identifier for a profiling or tracing task that is specified within the session. A session contains sufficient information about what needs to be collected or traced and it allows you to start/stop profiling/tracing as and when required.

The following demonstrates the use of session management APIs:

// Initialize the tools
   rocprofiler_initialize();

// Create the session with no replay mode
   rocprofiler_session_id_t session_id;
   rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &session_id);

// Start Session
   rocprofiler_start_session(session_id);

// profile a kernel -kernelA
   hipLaunchKernelGGL(kernelA, dim3(1), dim3(1), 0, 0);

// Deactivating session
   rocprofiler_terminate_session(session_id);

// Destroy sessions
   rocprofiler_destroy_session(session_id);

// Destroy all profiling related objects
   rocprofiler_finalize();

The following is a typical session management workflow:

  1. Initialize ROCProfilerV2 using rocprofiler_initialize.

  2. Create a session using rocprofiler_create_session. The created session keeps track of the global status of the application profiling.

    Note

    You can only create a session in no-replay mode (ROCPROFILER_NONE_REPLAY_MODE) which allows kernels to be run only once.

  3. Create a buffer to hold the results using rocprofiler_create_buffer.

  4. Create filters using rocprofiler_create_filter to specify the profiling task such as application tracing or metrics/counters collection.

    Note

    If the same filter is applied twice with different values, the latter application of the filter is considered the recent one, which overwrites the former application of the filter. To learn about the types of filters and their utility, see Filters.

  5. Start the session with rocprofiler_start_session.

  6. Run the specified kernels to collect traces or counters/metrics (as specified in the filter)

  7. Terminate the session with rocprofiler_terminate_session and flush the profiling results using rocprofiler_flush_data.

    Note

    The session must be terminated after the kernel completes (synchronization required). If a session is stopped before the completion of kernel execution within that session, the instrumentation data is undefined. Additionally, a session can be restarted after terminating.

  8. Destroy the session with rocprofiler_destroy_session and finalize profiling with rocprofiler_finalize.

See working examples demonstrating the use of the ROCProfilerV2 API in Application Tracing and Kernel Profiling.

Filters#

As explained in Profiling Sessions, filters allow you to specify a profiling task within a session. For different profiling tasks, different filters are specified as a parameter to rocprofiler_create_filter.

See the list of filters in the table below:

Filter

Purpose

ROCPROFILER_API_TRACE

To trace API calls. You must specify the API calls to be traced, in a vector.

ROCPROFILER_DISPATCH _TIMESTAMPS_COLLECTION

To track all the kernel execution’s start and end times on the GPUs

ROCPROFILER_COUNTERS_COLLECTION

To collect counters. You must specify the counters to be collected, in a vector.

Application Tracing#

The following code demonstrates the usage of ROCProfilerV2 APIs to trace an application. This example traces HIP APIs, HIP asynchronous activities, HSA APIs, HSA asynchronous activities, and ROCTX ranges. Note the use of ROCPROFILER_API_TRACE filter to trace API calls, and ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION filter to trace the kernel.

int main(int argc, char*\* argv) {
   int\* gpuMem;
   prepare();
   // Initialize the tools
   CHECK_ROCPROFILER(rocprofiler_initialize());

   // Creating the session with given replay mode
   rocprofiler_session_id_t session_id;
   CHECK_ROCPROFILER(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &session_id));

   // Creating Output Buffer for the data
   rocprofiler_buffer_id_t buffer_id;
   CHECK_ROCPROFILER(rocprofiler_create_buffer(session_id,
      [](const rocprofiler_record_header_t\* record, const
      rocprofiler_record_header_t\* end_record,
      rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t
      buffer_id) {
         WriteBufferRecords(record, end_record, session_id, buffer_id);
      },
      0x9999, &buffer_id));

   // Specifying the APIs to be traced in a vector
   std::vector<rocprofiler_tracer_activity_domain_t> apis_requested;
   apis_requested.emplace_back(ACTIVITY_DOMAIN_HIP_API);
   apis_requested.emplace_back(ACTIVITY_DOMAIN_HIP_OPS);
   apis_requested.emplace_back(ACTIVITY_DOMAIN_HSA_API);
   apis_requested.emplace_back(ACTIVITY_DOMAIN_HSA_OPS);
   apis_requested.emplace_back(ACTIVITY_DOMAIN_ROCTX);
   rocprofiler_filter_id_t api_tracing_filter_id;

   // Creating filter for tracing APIs
   CHECK_ROCPROFILER(rocprofiler_create_filter(
      session_id, ROCPROFILER_API_TRACE,
      rocprofiler_filter_data_t{&apis_requested[0]}, apis_requested.size(),
      &api_tracing_filter_id, rocprofiler_filter_property_t{}));
   CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id,
      api_tracing_filter_id, buffer_id));

   // Kernel Tracing
   rocprofiler_filter_id_t kernel_tracing_filter_id;
   CHECK_ROCPROFILER(rocprofiler_create_filter(session_id,
      ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{},
      0, &kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
   CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id,
      kernel_tracing_filter_id, buffer_id));

   // Normal HIP Calls won't be traced
   hipDeviceProp_t devProp;
   HIP_CALL(hipGetDeviceProperties(&devProp, 0));
   HIP_CALL(hipMalloc((void**)&gpuMem, 1 \* sizeof(int)));
   // KernelA and KernelB won't be traced
   kernelCalls('A');
   kernelCalls('B');

   // Activating Profiling Session to profile whatever kernel launches occur
   // up to the next terminate session
   CHECK_ROCPROFILER(rocprofiler_start_session(session_id));

   // KernelC, KernelD, KernelE and KernelF to be traced as part of the session
   kernelCalls('C');
   kernelCalls('D');
   kernelCalls('E');
   kernelCalls('F');
   // Normal HIP Calls that will be traced
   HIP_CALL(hipFree(gpuMem));

   // Deactivating session
   CHECK_ROCPROFILER(rocprofiler_terminate_session(session_id));

   // Manual Flush user buffer request
   CHECK_ROCPROFILER(rocprofiler_flush_data(session_id, buffer_id));

   // Destroy sessions
   CHECK_ROCPROFILER(rocprofiler_destroy_session(session_id));

   // Destroy all profiling related objects (User buffer, sessions, filters, etc..)
   CHECK_ROCPROFILER(rocprofiler_finalize());

   return 0;
}

Kernel Profiling#

The following is a full-application example that utilizes the ROCProfilerV2 API to profile the kernels. The ROCPROFILER_COUNTERS_COLLECTION filter for counter collection distinguishes this example from the one in Application Tracing. The GRBM_COUNT counter to be collected is specified in a vector of strings as shown.

#include <hip/hip_runtime.h>
#include <rocprofiler/v2/rocprofiler.h>

int main(int argc, char*\* argv) {
   int\* gpuMem;

   // Initialize the tools
   CHECK_ROCPROFILER(rocprofiler_initialize());

   // Creating the session with given replay mode
   rocprofiler_session_id_t session_id;
   CHECK_ROCPROFILER(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE,
   &session_id));

   // Creating Output Buffer for the data
   rocprofiler_buffer_id_t buffer_id;
   CHECK_ROCPROFILER(rocprofiler_create_buffer(session_id,
      [](const rocprofiler_record_header_t\* record, const
      rocprofiler_record_header_t\* end_record,
      rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t
      buffer_id) {
         WriteBufferRecords(record, end_record, session_id, buffer_id);
      },
      0x9999, &buffer_id));

   // Counter Collection Filter
   std::vector<const char*> counters;
   counters.emplace_back("GRBM_COUNT");
   rocprofiler_filter_id_t filter_id;
   [[maybe_unused]] rocprofiler_filter_property_t property = {};
   CHECK_ROCPROFILER(rocprofiler_create_filter(session_id,
      ROCPROFILER_COUNTERS_COLLECTION,
      rocprofiler_filter_data_t{.counters_names = &counters[0]},
      counters.size(), &filter_id, property));
   CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id,
      buffer_id));

   // Normal HIP Calls
   hipDeviceProp_t devProp;
   HIP_CALL(hipGetDeviceProperties(&devProp, 0));
   HIP_CALL(hipMalloc((void**)&gpuMem, 1 \* sizeof(int)));

   // KernelA and KernelB won't be profiled
   kernelCalls('A');
   kernelCalls('B');

   // Activating Profiling Session to profile whatever kernel launches occur
   // up to the next terminate session
   CHECK_ROCPROFILER(rocprofiler_start_session(session_id));

   // KernelC, KernelD, KernelE and KernelF to be profiled as part of the session
   kernelCalls('C');
   kernelCalls('D');
   kernelCalls('E');
   kernelCalls('F');
   // Normal HIP Calls
   HIP_CALL(hipFree(gpuMem));

   // Deactivating session
   CHECK_ROCPROFILER(rocprofiler_terminate_session(session_id));

   // Manual Flush user buffer request
   CHECK_ROCPROFILER(rocprofiler_flush_data(session_id, buffer_id));

   // Destroy sessions
   CHECK_ROCPROFILER(rocprofiler_destroy_session(session_id));

   // Destroy all profiling related objects (User buffer, sessions, filters, etc..)
   CHECK_ROCPROFILER(rocprofiler_finalize());

   return 0;
}