ROCProfilerV2 API#
ROCProfilerV2 provides an API that allows fine-grained control over the
profiling process. Like the CLI tool rocprofv2
, the API supports both application
tracing and kernel profiling. ROCProfilerV2 API allows you to create a session
and invoke application tracing or kernel profiling within the session. The following
sections describe session management, application tracing and kernel profiling
using ROCProfilerV2 API.
Profiling Sessions#
A ROCProfilerV2 session maintains the global profiling state for an application. It is a unique identifier for a profiling or tracing task that is specified within the session. A session contains sufficient information about what needs to be collected or traced and it allows you to start/stop profiling/tracing as and when required.
The following demonstrates the use of session management APIs:
// Initialize the tools
rocprofiler_initialize();
// Create the session with no replay mode
rocprofiler_session_id_t session_id;
rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &session_id);
// Start Session
rocprofiler_start_session(session_id);
// profile a kernel -kernelA
hipLaunchKernelGGL(kernelA, dim3(1), dim3(1), 0, 0);
// Deactivating session
rocprofiler_terminate_session(session_id);
// Destroy sessions
rocprofiler_destroy_session(session_id);
// Destroy all profiling related objects
rocprofiler_finalize();
The following is a typical session management workflow:
Initialize ROCProfilerV2 using
rocprofiler_initialize
.Create a session using
rocprofiler_create_session
. The created session keeps track of the global status of the application profiling.Note
You can only create a session in no-replay mode (ROCPROFILER_NONE_REPLAY_MODE) which allows kernels to be run only once.
Create a buffer to hold the results using
rocprofiler_create_buffer
.Create filters using
rocprofiler_create_filter
to specify the profiling task such as application tracing or metrics/counters collection.Note
If the same filter is applied twice with different values, the latter application of the filter is considered the recent one, which overwrites the former application of the filter. To learn about the types of filters and their utility, see Filters.
Start the session with
rocprofiler_start_session
.Run the specified kernels to collect traces or counters/metrics (as specified in the filter)
Terminate the session with
rocprofiler_terminate_session
and flush the profiling results usingrocprofiler_flush_data
.Note
The session must be terminated after the kernel completes (synchronization required). If a session is stopped before the completion of kernel execution within that session, the instrumentation data is undefined. Additionally, a session can be restarted after terminating.
Destroy the session with
rocprofiler_destroy_session
and finalize profiling withrocprofiler_finalize
.
See working examples demonstrating the use of the ROCProfilerV2 API in Application Tracing and Kernel Profiling.
Filters#
As explained in Profiling Sessions, filters
allow you to specify a profiling task within a session. For different
profiling tasks, different filters are specified as a parameter to
rocprofiler_create_filter
.
See the list of filters in the table below:
Filter |
Purpose |
---|---|
|
To trace API calls. You must specify the API calls to be traced, in a vector. |
|
To track all the kernel execution’s start and end times on the GPUs |
|
To collect counters. You must specify the counters to be collected, in a vector. |
Application Tracing#
The following code demonstrates the usage of ROCProfilerV2 APIs to
trace an application. This example traces HIP APIs, HIP
asynchronous activities, HSA APIs, HSA asynchronous activities, and
ROCTX ranges. Note the use of ROCPROFILER_API_TRACE
filter to trace
API calls, and ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION
filter to trace the
kernel.
int main(int argc, char*\* argv) {
int\* gpuMem;
prepare();
// Initialize the tools
CHECK_ROCPROFILER(rocprofiler_initialize());
// Creating the session with given replay mode
rocprofiler_session_id_t session_id;
CHECK_ROCPROFILER(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &session_id));
// Creating Output Buffer for the data
rocprofiler_buffer_id_t buffer_id;
CHECK_ROCPROFILER(rocprofiler_create_buffer(session_id,
[](const rocprofiler_record_header_t\* record, const
rocprofiler_record_header_t\* end_record,
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t
buffer_id) {
WriteBufferRecords(record, end_record, session_id, buffer_id);
},
0x9999, &buffer_id));
// Specifying the APIs to be traced in a vector
std::vector<rocprofiler_tracer_activity_domain_t> apis_requested;
apis_requested.emplace_back(ACTIVITY_DOMAIN_HIP_API);
apis_requested.emplace_back(ACTIVITY_DOMAIN_HIP_OPS);
apis_requested.emplace_back(ACTIVITY_DOMAIN_HSA_API);
apis_requested.emplace_back(ACTIVITY_DOMAIN_HSA_OPS);
apis_requested.emplace_back(ACTIVITY_DOMAIN_ROCTX);
rocprofiler_filter_id_t api_tracing_filter_id;
// Creating filter for tracing APIs
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_API_TRACE,
rocprofiler_filter_data_t{&apis_requested[0]}, apis_requested.size(),
&api_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id,
api_tracing_filter_id, buffer_id));
// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id,
ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{},
0, &kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id,
kernel_tracing_filter_id, buffer_id));
// Normal HIP Calls won't be traced
hipDeviceProp_t devProp;
HIP_CALL(hipGetDeviceProperties(&devProp, 0));
HIP_CALL(hipMalloc((void**)&gpuMem, 1 \* sizeof(int)));
// KernelA and KernelB won't be traced
kernelCalls('A');
kernelCalls('B');
// Activating Profiling Session to profile whatever kernel launches occur
// up to the next terminate session
CHECK_ROCPROFILER(rocprofiler_start_session(session_id));
// KernelC, KernelD, KernelE and KernelF to be traced as part of the session
kernelCalls('C');
kernelCalls('D');
kernelCalls('E');
kernelCalls('F');
// Normal HIP Calls that will be traced
HIP_CALL(hipFree(gpuMem));
// Deactivating session
CHECK_ROCPROFILER(rocprofiler_terminate_session(session_id));
// Manual Flush user buffer request
CHECK_ROCPROFILER(rocprofiler_flush_data(session_id, buffer_id));
// Destroy sessions
CHECK_ROCPROFILER(rocprofiler_destroy_session(session_id));
// Destroy all profiling related objects (User buffer, sessions, filters, etc..)
CHECK_ROCPROFILER(rocprofiler_finalize());
return 0;
}
Kernel Profiling#
The following is a full-application example that utilizes the ROCProfilerV2
API to profile the kernels. The ROCPROFILER_COUNTERS_COLLECTION
filter for
counter collection distinguishes this example from the one in Application Tracing.
The GRBM_COUNT
counter to be collected is specified in a vector of strings as
shown.
#include <hip/hip_runtime.h>
#include <rocprofiler/v2/rocprofiler.h>
int main(int argc, char*\* argv) {
int\* gpuMem;
// Initialize the tools
CHECK_ROCPROFILER(rocprofiler_initialize());
// Creating the session with given replay mode
rocprofiler_session_id_t session_id;
CHECK_ROCPROFILER(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE,
&session_id));
// Creating Output Buffer for the data
rocprofiler_buffer_id_t buffer_id;
CHECK_ROCPROFILER(rocprofiler_create_buffer(session_id,
[](const rocprofiler_record_header_t\* record, const
rocprofiler_record_header_t\* end_record,
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t
buffer_id) {
WriteBufferRecords(record, end_record, session_id, buffer_id);
},
0x9999, &buffer_id));
// Counter Collection Filter
std::vector<const char*> counters;
counters.emplace_back("GRBM_COUNT");
rocprofiler_filter_id_t filter_id;
[[maybe_unused]] rocprofiler_filter_property_t property = {};
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id,
ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id,
buffer_id));
// Normal HIP Calls
hipDeviceProp_t devProp;
HIP_CALL(hipGetDeviceProperties(&devProp, 0));
HIP_CALL(hipMalloc((void**)&gpuMem, 1 \* sizeof(int)));
// KernelA and KernelB won't be profiled
kernelCalls('A');
kernelCalls('B');
// Activating Profiling Session to profile whatever kernel launches occur
// up to the next terminate session
CHECK_ROCPROFILER(rocprofiler_start_session(session_id));
// KernelC, KernelD, KernelE and KernelF to be profiled as part of the session
kernelCalls('C');
kernelCalls('D');
kernelCalls('E');
kernelCalls('F');
// Normal HIP Calls
HIP_CALL(hipFree(gpuMem));
// Deactivating session
CHECK_ROCPROFILER(rocprofiler_terminate_session(session_id));
// Manual Flush user buffer request
CHECK_ROCPROFILER(rocprofiler_flush_data(session_id, buffer_id));
// Destroy sessions
CHECK_ROCPROFILER(rocprofiler_destroy_session(session_id));
// Destroy all profiling related objects (User buffer, sessions, filters, etc..)
CHECK_ROCPROFILER(rocprofiler_finalize());
return 0;
}