PC Sampling

Rocprofiler SDK Developer API 0.4.0
ROCm Profiling API and tools

Enabling PC (Program Counter) Sampling for GPU Activity.

Data Structures

struct  rocprofiler_pc_sampling_configuration_t
 PC sampling configuration supported by a GPU agent.
 
struct  rocprofiler_pc_sampling_header_v1_t
 The header of the rocprofiler_pc_sampling_record_t, indicating which fields of the rocprofiler_pc_sampling_record_t instance are meaningful for the sample.
 
struct  rocprofiler_pc_sampling_snapshot_v1_t
 For future use.
 
struct  rocprofiler_pc_sampling_record_t
 ROCProfiler PC Sampling Record corresponding to the interrupted wave.
 
struct  rocprofiler_pc_sampling_code_object_load_marker_t
 Marker representing code object loading event.
 
struct  rocprofiler_pc_sampling_code_object_unload_marker_t
 Marker representing code object unloading event.
 

Typedefs

typedef rocprofiler_status_t(* rocprofiler_available_pc_sampling_configurations_cb_t) (const rocprofiler_pc_sampling_configuration_t *configs, unsigned long num_config, void *user_data)
 Callback through which Rocprofiler SDK delivers the list of available PC sampling configurations when rocprofiler_query_pc_sampling_agent_configurations is called.
 

Functions

rocprofiler_status_t rocprofiler_configure_pc_sampling_service (rocprofiler_context_id_t context_id, rocprofiler_agent_id_t agent_id, rocprofiler_pc_sampling_method_t method, rocprofiler_pc_sampling_unit_t unit, uint64_t interval, rocprofiler_buffer_id_t buffer_id)
 Function used to configure the PC sampling service on the GPU agent with agent_id.
 
rocprofiler_status_t rocprofiler_query_pc_sampling_agent_configurations (rocprofiler_agent_id_t agent_id, rocprofiler_available_pc_sampling_configurations_cb_t cb, void *user_data)
 Query PC Sampling Configuration.
 

Detailed Description

Enabling PC (Program Counter) Sampling for GPU Activity.


Data Structure Documentation

◆ rocprofiler_pc_sampling_configuration_t

struct rocprofiler_pc_sampling_configuration_t

PC sampling configuration supported by a GPU agent.

Definition at line 125 of file pc_sampling.h.

Data Fields
uint64_t flags for future use
unsigned long max_interval the lowest possible frequency for generating samples using method
rocprofiler_pc_sampling_method_t method Sampling method supported by the GPU agent. Currently, it can take one of two values: host trap or stochastic sampling.
unsigned long min_interval the highest possible frequency for generating samples using method
uint64_t size Size of this struct.
rocprofiler_pc_sampling_unit_t unit A unit used to specify the interval of the method for samples generation.
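
The relationship between min_interval and max_interval can be illustrated with a minimal sketch. It assumes a stand-in struct, example_pc_sampling_limits_t, that mirrors only the two interval fields documented above; the struct and both helper functions are hypothetical names and are not part of the SDK. A real tool would read the fields from rocprofiler_pc_sampling_configuration_t instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in that mirrors only the interval fields documented
 * above. It is NOT the SDK type. */
typedef struct
{
    unsigned long min_interval; /* smallest interval: highest sampling frequency */
    unsigned long max_interval; /* largest interval: lowest sampling frequency */
} example_pc_sampling_limits_t;

/* Returns true when `interval` lies in [min_interval, max_interval]. */
static bool
example_interval_is_supported(const example_pc_sampling_limits_t* limits, uint64_t interval)
{
    assert(limits != NULL);
    return interval >= limits->min_interval && interval <= limits->max_interval;
}

/* Clamps a requested interval into the supported range. */
static uint64_t
example_clamp_interval(const example_pc_sampling_limits_t* limits, uint64_t requested)
{
    assert(limits != NULL && limits->min_interval <= limits->max_interval);
    if(requested < limits->min_interval) return limits->min_interval;
    if(requested > limits->max_interval) return limits->max_interval;
    return requested;
}
```

Clamping is only one possible policy. A tool could just as well reject an out-of-range request and report the supported range to the user.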

◆ rocprofiler_pc_sampling_header_v1_t

struct rocprofiler_pc_sampling_header_v1_t

The header of the rocprofiler_pc_sampling_record_t, indicating what fields of the rocprofiler_pc_sampling_record_t instance are meaningful for the sample.

Definition at line 202 of file pc_sampling.h.

Data Fields
uint8_t has_stall_reason: 1 whether the sample contains information about the stall reason. If so, the snapshot field is valid; see rocprofiler_pc_sampling_snapshot_v1_t.
uint8_t has_wave_cnt: 1 whether rocprofiler_pc_sampling_record_t::wave_count contains a meaningful value
uint8_t reserved: 1 for future use
uint8_t type: 4 The following values are possible:

  • 0 - reserved
  • 1 - host trap pc sample
  • 2 - stochastic pc sample
  • 3 - perfcounter (unsupported at the moment)
  • other values have no meaning at the moment
uint8_t valid: 1
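
As a rough illustration of how a consumer might interpret these bits, the sketch below decodes the type values listed above. The struct example_pc_sampling_header_t is a hypothetical stand-in: it reuses the documented field names and widths, but the field order is an assumption, and real code should use rocprofiler_pc_sampling_header_v1_t from the SDK header. Both helper functions are likewise illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the documented field names and bit widths.
 * The field ORDER is an assumption made for this sketch only.
 * Note: uint8_t bit-fields are implementation-defined in ISO C, although
 * GCC and Clang accept them. */
typedef struct
{
    uint8_t valid : 1;
    uint8_t type : 4;
    uint8_t has_stall_reason : 1;
    uint8_t has_wave_cnt : 1;
    uint8_t reserved : 1;
} example_pc_sampling_header_t;

/* Maps the documented `type` values to a human-readable name. */
static const char*
example_sample_type_name(const example_pc_sampling_header_t* header)
{
    assert(header != NULL);
    switch(header->type)
    {
        case 0: return "reserved";
        case 1: return "host trap";
        case 2: return "stochastic";
        case 3: return "perfcounter (unsupported)";
        default: return "unknown";
    }
}

/* Returns 1 when the record's wave_count field may be read, 0 otherwise. */
static int
example_has_wave_count(const example_pc_sampling_header_t* header)
{
    assert(header != NULL);
    return header->has_wave_cnt != 0;
}
```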

◆ rocprofiler_pc_sampling_snapshot_v1_t

struct rocprofiler_pc_sampling_snapshot_v1_t

For future use.

Definition at line 233 of file pc_sampling.h.

Data Fields
uint32_t arb_state_issue: 10
uint32_t arb_state_stall: 10
uint32_t dual_issue_valu: 1
uint32_t inst_type: 4
uint32_t reason_not_issued: 7

◆ rocprofiler_pc_sampling_record_t

struct rocprofiler_pc_sampling_record_t

ROCProfiler PC Sampling Record corresponding to the interrupted wave.

Definition at line 247 of file pc_sampling.h.

Data Fields
uint8_t chiplet chiplet index
rocprofiler_correlation_id_t correlation_id correlation id of the API call that initiated the kernel launch. The interrupted wave is executed as part of that kernel.
uint64_t exec_mask shows which SIMD lanes of the wave (and therefore how many) were executing the instruction represented by the pc. Useful for understanding thread divergence within the wave
rocprofiler_pc_sampling_header_v1_t flags indicates which fields of this struct are meaningful for the represented sample. The values depend on what the underlying GPU agent architecture supports.
uint32_t hw_id compute unit identifier
uint64_t pc Program counter of the wave at the moment of interruption.
uint8_t reserved: 7 reserved 7 bits, must be zero
uint32_t reserved2 for future use
uint64_t size Size of this struct.
rocprofiler_pc_sampling_snapshot_v1_t snapshot see rocprofiler_pc_sampling_snapshot_v1_t
uint64_t timestamp timestamp when the sample is generated
uint32_t wave_count number of active waves on the CU at the moment of sample generation
uint8_t wave_id wave identifier within the workgroup
uint8_t wave_issued: 1 indicates whether the wave is issuing the instruction represented by the pc
rocprofiler_dim3_t workgroup_id wave coordinates within the workgroup
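
To make the exec_mask field more concrete, here is a small sketch that counts the active lanes in a mask and turns the count into a utilization ratio. It assumes that each set bit of exec_mask corresponds to one active SIMD lane and that the wavefront width is 32 or 64; both are assumptions of this example rather than statements from the API documentation, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Counts the set bits of exec_mask. Under the assumption that one bit
 * represents one SIMD lane, this is the number of active lanes. */
static unsigned
example_active_lanes(uint64_t exec_mask)
{
    unsigned count = 0;
    while(exec_mask != 0)
    {
        exec_mask &= exec_mask - 1; /* clear the lowest set bit */
        ++count;
    }
    return count;
}

/* Fraction of lanes that were active for a wave of `wave_size` lanes.
 * A value well below 1.0 hints at thread divergence at the sampled pc. */
static double
example_lane_utilization(uint64_t exec_mask, unsigned wave_size)
{
    assert(wave_size == 32 || wave_size == 64);
    return (double) example_active_lanes(exec_mask) / (double) wave_size;
}
```

Aggregating this ratio per pc across many samples would give a rough picture of which instructions run with partially filled waves.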

◆ rocprofiler_pc_sampling_code_object_load_marker_t

struct rocprofiler_pc_sampling_code_object_load_marker_t

Marker representing code object loading event.

See also
rocprofiler_callback_tracing_code_object_load_data_t for more information

Definition at line 287 of file pc_sampling.h.

Data Fields
uint64_t code_object_id
uint64_t size Size of this struct.

◆ rocprofiler_pc_sampling_code_object_unload_marker_t

struct rocprofiler_pc_sampling_code_object_unload_marker_t

Marker representing code object unloading event.

See also
rocprofiler_callback_tracing_code_object_load_data_t for more information

Definition at line 299 of file pc_sampling.h.

Data Fields
uint64_t code_object_id
uint64_t size Size of this struct.

Typedef Documentation

◆ rocprofiler_available_pc_sampling_configurations_cb_t

typedef rocprofiler_status_t(* rocprofiler_available_pc_sampling_configurations_cb_t) (const rocprofiler_pc_sampling_configuration_t *configs, unsigned long num_config, void *user_data)

Callback through which Rocprofiler SDK delivers the list of available PC sampling configurations when rocprofiler_query_pc_sampling_agent_configurations is called.

Parameters
[out] configs - The array of PC sampling configurations supported by the agent at the moment of invoking rocprofiler_query_pc_sampling_agent_configurations.
[out] num_config - The number of configurations contained in the underlying array configs. In case the GPU agent does not support PC sampling, the value is 0.
[in] user_data - client's private data passed via rocprofiler_query_pc_sampling_agent_configurations
Returns
rocprofiler_status_t

Definition at line 164 of file pc_sampling.h.
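
A callback of this shape typically copies what it needs into storage reachable through user_data. The sketch below shows that pattern with stand-in types so that it compiles without the SDK: example_status_t, example_config_t, example_config_list_t, and EXAMPLE_MAX_CONFIGS are all hypothetical, and only the parameter list mirrors the typedef above. The decision to copy the entries rests on the assumption that the configs array may not remain valid after the callback returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch is self-contained. */
typedef int example_status_t; /* stands in for rocprofiler_status_t */
enum { EXAMPLE_STATUS_SUCCESS = 0 };

typedef struct
{
    uint64_t      size;
    int           method; /* stands in for rocprofiler_pc_sampling_method_t */
    int           unit;   /* stands in for rocprofiler_pc_sampling_unit_t */
    unsigned long min_interval;
    unsigned long max_interval;
    uint64_t      flags;
} example_config_t;

#define EXAMPLE_MAX_CONFIGS 8

/* Storage owned by the tool and passed to the query as user_data. */
typedef struct
{
    example_config_t configs[EXAMPLE_MAX_CONFIGS];
    unsigned long    count;
} example_config_list_t;

/* Copies up to EXAMPLE_MAX_CONFIGS entries into the list behind user_data.
 * num_config == 0 (agent without PC sampling support) yields an empty list. */
static example_status_t
example_collect_configs_cb(const example_config_t* configs,
                           unsigned long           num_config,
                           void*                   user_data)
{
    example_config_list_t* out = (example_config_list_t*) user_data;
    assert(out != NULL);
    out->count = 0;
    for(unsigned long i = 0; i < num_config && i < EXAMPLE_MAX_CONFIGS; ++i)
        out->configs[out->count++] = configs[i];
    return EXAMPLE_STATUS_SUCCESS;
}
```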

Function Documentation

◆ rocprofiler_configure_pc_sampling_service()

rocprofiler_status_t rocprofiler_configure_pc_sampling_service ( rocprofiler_context_id_t  context_id,
rocprofiler_agent_id_t  agent_id,
rocprofiler_pc_sampling_method_t  method,
rocprofiler_pc_sampling_unit_t  unit,
uint64_t  interval,
rocprofiler_buffer_id_t  buffer_id 
)

Function used to configure the PC sampling service on the GPU agent with agent_id.

Prerequisites are the following:

  • The client must create a context and supply its context_id. By using this context, the client can start/stop PC sampling on the agent. For more information, see rocprofiler_start_context/rocprofiler_stop_context.
  • The user must create a buffer and supply its buffer_id. Rocprofiler-SDK uses the buffer to deliver the PC samples to the client. For more information about the data delivery, see rocprofiler_create_buffer and rocprofiler_buffer_tracing_cb_t.

Before calling this function, we recommend querying the PC sampling configurations supported by the GPU agent via rocprofiler_query_pc_sampling_agent_configurations. The client chooses the method, unit, and interval to match one of the available configurations. Note that the interval must belong to the range of values [available_config.min_interval, available_config.max_interval], where available_config is the instance of rocprofiler_pc_sampling_configuration_t supported/available at the moment.

Rocprofiler-SDK checks whether the requested configuration is actually supported at the moment of calling this function. If so, it returns ROCPROFILER_STATUS_SUCCESS. Otherwise, it notifies the client about the rejection reason via the returned status code. For more information about the status codes, see rocprofiler_status_t.

There are a few constraints a client's code needs to be aware of.

Constraint1: A GPU agent can be configured to support at most one running PC sampling configuration at any time, which implies some of the consequences described below. After the tool configures the PC sampling with one of the available configurations, rocprofiler-SDK guarantees that this configuration will be valid for the tool's lifetime. The tool can start and stop the configured PC sampling service whenever convenient.

Constraint2: Since the same GPU agent can be used by multiple processes concurrently, Rocprofiler-SDK cannot guarantee exclusive access to the PC sampling capability. The consequence is the following scenario. The tool TA, which belongs to the process PA, calls rocprofiler_query_pc_sampling_agent_configurations, which returns the two configurations CA and CB supported by the agent. Then the tool TB of the process PB configures PC sampling on the same agent by using the configuration CB. Subsequently, TA tries configuring CA on the agent, and it fails. To point out that this case happened, we introduce a special status code, ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE. When TA observes this status code, it queries all available configurations again by calling rocprofiler_query_pc_sampling_agent_configurations, which returns only CB this time. TA can then choose CB, so that both TA and TB use the PC sampling capability in separate processes. TA and TB each receive the samples generated by the kernels launched by their own processes, PA and PB, respectively.
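
The TA/TB scenario can be modeled with a toy simulation. Everything in the sketch below is hypothetical: ex_agent_t, ex_query, and ex_configure merely imitate the behavior described in this constraint (a query lists CA and CB while the agent is idle and only the active configuration afterwards) and do not call the SDK. The function ex_configure_or_fallback shows one way a tool could react to the "not available" status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes and configuration ids for the simulation. */
typedef enum { EX_STATUS_SUCCESS, EX_STATUS_NOT_AVAILABLE } ex_status_t;
enum { EX_CONFIG_NONE = 0, EX_CONFIG_CA = 1, EX_CONFIG_CB = 2 };

/* Simulated GPU agent shared by several processes. */
typedef struct
{
    int active_config; /* EX_CONFIG_NONE until some tool configures sampling */
} ex_agent_t;

/* Simulated query: CA and CB while idle, otherwise only the active one. */
static unsigned long
ex_query(const ex_agent_t* agent, int out[2])
{
    assert(agent != NULL);
    if(agent->active_config == EX_CONFIG_NONE)
    {
        out[0] = EX_CONFIG_CA;
        out[1] = EX_CONFIG_CB;
        return 2;
    }
    out[0] = agent->active_config;
    return 1;
}

/* Simulated configure: rejects a configuration that differs from the active one. */
static ex_status_t
ex_configure(ex_agent_t* agent, int config)
{
    assert(agent != NULL);
    if(agent->active_config != EX_CONFIG_NONE && agent->active_config != config)
        return EX_STATUS_NOT_AVAILABLE;
    agent->active_config = config;
    return EX_STATUS_SUCCESS;
}

/* Tool logic: try the preferred configuration. On "not available",
 * query again and adopt whatever the agent currently offers. */
static ex_status_t
ex_configure_or_fallback(ex_agent_t* agent, int preferred, int* chosen)
{
    ex_status_t status = ex_configure(agent, preferred);
    if(status == EX_STATUS_SUCCESS)
    {
        *chosen = preferred;
        return status;
    }
    int           available[2];
    unsigned long count = ex_query(agent, available);
    if(count == 0) return EX_STATUS_NOT_AVAILABLE;
    status = ex_configure(agent, available[0]);
    if(status == EX_STATUS_SUCCESS) *chosen = available[0];
    return status;
}
```

In this model, if TB configures CB first, a later call by TA with CA as its preference ends up selecting CB, matching the sequence described above.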

Constraint3: Rocprofiler-SDK allows only one context to contain the configured PC sampling service within the process, which implies that at most one of the loaded tools can use PC sampling. One context can contain multiple PC sampling services configured for different GPU agents.

Constraint4: The PC sampling feature is not available within ROCgdb.

Constraint5: The PC sampling service cannot be used simultaneously with the counter collection service.

Parameters
[in] context_id - id of the context used for starting/stopping the PC sampling service
[in] agent_id - id of the agent on which the caller tries to use the PC sampling capability
[in] method - the type of PC sampling the caller tries to use on the agent
[in] unit - the unit appropriate to the PC sampling type/method
[in] interval - interval, expressed in unit, at which PC samples are generated
[in] buffer_id - id of the buffer used for delivering PC samples
Returns
rocprofiler_status_t
Return values
ROCPROFILER_STATUS_SUCCESS - PC sampling service configured successfully
ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE - One of the following scenarios is present:
  1. PC sampling is already configured with a configuration different from the one requested.
  2. PC sampling is requested from a process that runs within ROCgdb.
  3. The HSA runtime does not support PC sampling.
ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL - the amdgpu driver installed on the system does not support the PC sampling feature
ROCPROFILER_STATUS_ERROR - a general error caused by the amdgpu driver
ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT - counter collection service already set up in the context

◆ rocprofiler_query_pc_sampling_agent_configurations()

rocprofiler_status_t rocprofiler_query_pc_sampling_agent_configurations ( rocprofiler_agent_id_t  agent_id,
rocprofiler_available_pc_sampling_configurations_cb_t  cb,
void *  user_data 
)

Query PC Sampling Configuration.

Lists the PC sampling configurations that the GPU agent with agent_id supports at the moment of invoking the function and delivers them via cb. If PC sampling is already configured on the GPU agent, cb delivers information about the active PC sampling configuration. If the GPU agent does not support the PC sampling capability, cb delivers no PC sampling configurations.

Parameters
[in] agent_id - id of the agent for which available configurations will be listed
[in] cb - User callback that delivers the available PC sampling configurations
[in] user_data - passed to the cb
Returns
rocprofiler_status_t
Return values
ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE - One of the following scenarios is present:
  1. PC sampling is requested from a process that runs within ROCgdb.
  2. The HSA runtime does not support PC sampling.
ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL - the amdgpu driver installed on the system does not support the PC sampling feature
ROCPROFILER_STATUS_ERROR - a general error caused by the amdgpu driver
ROCPROFILER_STATUS_SUCCESS - cb successfully finished