PC Sampling
ROCprofiler-SDK developer API 1.1.0
ROCm Profiling API and tools
Enabling PC (Program Counter) Sampling for GPU Activity. More...
Data Structures | |
| struct | rocprofiler_pc_sampling_configuration_t |
| (experimental) PC sampling configuration supported by a GPU agent. More... | |
| struct | rocprofiler_pc_sampling_hw_id_v0_t |
| (experimental) Information about the GPU part where wave was executing at the moment of sampling. More... | |
| struct | rocprofiler_pc_t |
| (experimental) Sampled program counter. More... | |
| struct | rocprofiler_pc_sampling_record_host_trap_v0_t |
| (experimental) ROCProfiler Host-Trap PC Sampling Record. More... | |
| struct | rocprofiler_pc_sampling_record_stochastic_header_t |
| (experimental) The header of the rocprofiler_pc_sampling_record_stochastic_v0_t, indicating what fields of the rocprofiler_pc_sampling_record_stochastic_v0_t instance are meaningful for the sample. More... | |
| struct | rocprofiler_pc_sampling_snapshot_v0_t |
| (experimental) Data provided by stochastic sampling hardware. More... | |
| struct | rocprofiler_pc_sampling_memory_counters_t |
| (experimental) Counters of issued but not yet completed instructions. More... | |
| struct | rocprofiler_pc_sampling_record_stochastic_v0_t |
| (experimental) ROCProfiler Stochastic PC Sampling Record. More... | |
| struct | rocprofiler_pc_sampling_record_invalid_t |
| (experimental) Record representing an invalid PC Sampling Record. More... | |
Typedefs | |
| typedef rocprofiler_status_t(* | rocprofiler_available_pc_sampling_configurations_cb_t) (const rocprofiler_pc_sampling_configuration_t *configs, unsigned long num_config, void *user_data) |
| (experimental) Rocprofiler SDK's callback function to deliver the list of available PC sampling configurations upon the call to the rocprofiler_query_pc_sampling_agent_configurations. | |
Enumerations | |
| enum | rocprofiler_pc_sampling_configuration_flags_t { ROCPROFILER_PC_SAMPLING_CONFIGURATION_FLAGS_NONE = 0 , ROCPROFILER_PC_SAMPLING_CONFIGURATION_FLAGS_INTERVAL_POW2 } |
| (experimental) Enumeration describing values of flags of rocprofiler_pc_sampling_configuration_t. More... | |
| enum | rocprofiler_pc_sampling_instruction_type_t { ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_NONE = 0 , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_VALU , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_MATRIX , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_SCALAR , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_TEX , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_LDS , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_LDS_DIRECT , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_FLAT , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_EXPORT , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_MESSAGE , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BARRIER , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BRANCH_NOT_TAKEN , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BRANCH_TAKEN , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_JUMP , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_OTHER , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_NO_INST , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_DUAL_VALU , ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BRANCH_TAKEN } |
| (experimental) Enumeration describing type of sampled issued instruction. More... | |
| enum | rocprofiler_pc_sampling_instruction_not_issued_reason_t { ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_NONE = 0 , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_NO_INSTRUCTION_AVAILABLE , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ALU_DEPENDENCY , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_WAITCNT , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_INTERNAL_INSTRUCTION , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_BARRIER_WAIT , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ARBITER_NOT_WIN , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ARBITER_WIN_EX_STALL , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_OTHER_WAIT , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_SLEEP_WAIT , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ALU_DEPENDENCY , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_INTERNAL_INSTRUCTION , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ARBITER_NOT_WIN , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ARBITER_WIN_EX_STALL , ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_OTHER_WAIT } |
| (experimental) Enumeration describing reason for not issuing an instruction. More... | |
Detailed Description
Enabling PC (Program Counter) Sampling for GPU Activity.
Data Structure Documentation
◆ rocprofiler_pc_sampling_configuration_t
| struct rocprofiler_pc_sampling_configuration_t |
(experimental) PC sampling configuration supported by a GPU agent.
Definition at line 145 of file pc_sampling.h.
| Data Fields | ||
|---|---|---|
| uint64_t | flags | Takes values from rocprofiler_pc_sampling_configuration_flags_t. |
| unsigned long | max_interval | The lowest possible frequency for generating samples using method. |
| rocprofiler_pc_sampling_method_t | method | Sampling method supported by the GPU agent. Currently, it can take one of two values, corresponding to host-trap and stochastic sampling. |
| unsigned long | min_interval | The highest possible frequency for generating samples using method. |
| uint64_t | size | Size of this struct. |
| rocprofiler_pc_sampling_unit_t | unit | A unit used to specify the interval of the method for samples generation. |
◆ rocprofiler_pc_sampling_hw_id_v0_t
| struct rocprofiler_pc_sampling_hw_id_v0_t |
(experimental) Information about the GPU part where wave was executing at the moment of sampling.
Definition at line 223 of file pc_sampling.h.
◆ rocprofiler_pc_t
| struct rocprofiler_pc_t |
(experimental) Sampled program counter.
Definition at line 245 of file pc_sampling.h.
| Data Fields | ||
|---|---|---|
| uint64_t | code_object_id | id of the loaded code object instance that contains the sampled PC. This field holds the value ROCPROFILER_CODE_OBJECT_ID_NONE if the code object cannot be determined (e.g., the sampled PC belongs to code generated by self-modifying code). |
| uint64_t | code_object_offset | If code_object_id is different than ROCPROFILER_CODE_OBJECT_ID_NONE, then this field contains the offset of the sampled PC relative to the rocprofiler_callback_tracing_code_object_load_data_t.load_base of the code object instance with code_object_id. To calculate the original virtual address of the sampled PC, one can add the value of this field to the rocprofiler_callback_tracing_code_object_load_data_t.load_base. The value of code_object_offset matches the virtual address of the sampled instruction (PC), only if the code_object_id is equal to the ROCPROFILER_CODE_OBJECT_ID_NONE. |
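The address rule described for code_object_offset can be sketched as a small helper. This is a minimal illustration, not SDK code: `pc_t` is a hypothetical stand-in for rocprofiler_pc_t, `CODE_OBJECT_ID_NONE` is a placeholder whose value is assumed (the real value of ROCPROFILER_CODE_OBJECT_ID_NONE is not given here), and `load_base` is assumed to have been looked up by the caller from the code object load data for the matching code_object_id.

```c
#include <stdint.h>

/* Hypothetical placeholder; the real constant is ROCPROFILER_CODE_OBJECT_ID_NONE. */
#define CODE_OBJECT_ID_NONE 0u

/* Hypothetical stand-in mirroring the two documented fields of rocprofiler_pc_t. */
typedef struct
{
    uint64_t code_object_id;
    uint64_t code_object_offset;
} pc_t;

/* Returns the virtual address of the sampled instruction.
 * load_base is the load_base of the code object identified by
 * pc->code_object_id; it is ignored when the code object is unknown. */
uint64_t
pc_virtual_address(const pc_t* pc, uint64_t load_base)
{
    if(pc->code_object_id == CODE_OBJECT_ID_NONE)
    {
        /* No code object: the offset field already holds the virtual address. */
        return pc->code_object_offset;
    }
    /* Known code object: the offset is relative to its load base. */
    return load_base + pc->code_object_offset;
}
```

A tool would typically keep a map from code_object_id to load_base, filled from code object load callbacks, and call a helper like this for each sample.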
◆ rocprofiler_pc_sampling_record_host_trap_v0_t
| struct rocprofiler_pc_sampling_record_host_trap_v0_t |
(experimental) ROCProfiler Host-Trap PC Sampling Record.
Definition at line 270 of file pc_sampling.h.
| Data Fields | ||
|---|---|---|
| rocprofiler_async_correlation_id_t | correlation_id | API launch call id that matches dispatch ID. |
| uint64_t | dispatch_id | originating kernel dispatch ID |
| uint64_t | exec_mask | active SIMD lanes when sampled |
| rocprofiler_pc_sampling_hw_id_v0_t | hw_id | |
| rocprofiler_pc_t | pc | information about sampled program counter |
| uint32_t | reserved0: 24 | reserved for future use |
| uint64_t | size | Size of this struct. |
| uint64_t | timestamp | timestamp when sample is generated |
| uint32_t | wave_in_group: 8 | wave position within the workgroup (0-31) |
| rocprofiler_dim3_t | workgroup_id | wave coordinates within the workgroup |
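Since exec_mask holds one bit per active SIMD lane, the number of active lanes in a sample is the population count of that field. The helper below is a portable sketch; the function name is illustrative and not part of the SDK.

```c
#include <stdint.h>

/* Counts the set bits in a 64-bit execution mask, one bit per active lane. */
unsigned
count_active_lanes(uint64_t exec_mask)
{
    unsigned lanes = 0;
    while(exec_mask != 0)
    {
        exec_mask &= exec_mask - 1; /* clear the lowest set bit */
        ++lanes;
    }
    return lanes;
}
```

Dividing the result by the wavefront width gives a rough per-sample measure of lane utilization at the sampled PC.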
◆ rocprofiler_pc_sampling_record_stochastic_header_t
| struct rocprofiler_pc_sampling_record_stochastic_header_t |
(experimental) The header of the rocprofiler_pc_sampling_record_stochastic_v0_t, indicating what fields of the rocprofiler_pc_sampling_record_stochastic_v0_t instance are meaningful for the sample.
Definition at line 292 of file pc_sampling.h.
| Data Fields | ||
|---|---|---|
| uint8_t | has_memory_counter: 1 | pc sample provides memory counters information via rocprofiler_pc_sampling_memory_counters_t |
| uint8_t | reserved_type: 7 | |
◆ rocprofiler_pc_sampling_snapshot_v0_t
| struct rocprofiler_pc_sampling_snapshot_v0_t |
(experimental) Data provided by stochastic sampling hardware.
Definition at line 364 of file pc_sampling.h.
| Data Fields | ||
|---|---|---|
| uint32_t | arb_state_issue_brmsg: 1 | arbiter issued a branch/message instruction |
| uint32_t | arb_state_issue_exp: 1 | arbiter issued an export instruction |
| uint32_t | arb_state_issue_flat: 1 | arbiter issued a FLAT instruction |
| uint32_t | arb_state_issue_lds: 1 | arbiter issued an LDS instruction |
| uint32_t | arb_state_issue_lds_direct: 1 | arbiter issued an LDS direct instruction |
| uint32_t | arb_state_issue_matrix: 1 | arbiter issued a matrix instruction |
| uint32_t | arb_state_issue_misc: 1 | arbiter issued a miscellaneous instruction |
| uint32_t | arb_state_issue_reserved: 1 | reserved for the future use |
| uint32_t | arb_state_issue_scalar: 1 | arbiter issued a scalar (SALU/SMEM) instruction |
| uint32_t | arb_state_issue_valu: 1 | arbiter issued a VALU instruction |
| uint32_t | arb_state_issue_vmem_tex: 1 | arbiter issued a texture instruction |
| uint32_t | arb_state_stall_brmsg: 1 | branch/message instruction was stalled |
| uint32_t | arb_state_stall_exp: 1 | export instruction was stalled |
| uint32_t | arb_state_stall_flat: 1 | flat instruction was stalled |
| uint32_t | arb_state_stall_lds: 1 | LDS instruction was stalled. |
| uint32_t | arb_state_stall_lds_direct: 1 | LDS direct instruction was stalled. |
| uint32_t | arb_state_stall_matrix: 1 | matrix instruction was stalled |
| uint32_t | arb_state_stall_misc: 1 | miscellaneous instruction was stalled |
| uint32_t | arb_state_stall_scalar: 1 | Scalar (SALU/SMEM) instruction was stalled. |
| uint32_t | arb_state_stall_valu: 1 | VALU instruction was stalled when a sample was generated. |
| uint32_t | arb_state_stall_vmem_tex: 1 | texture instruction was stalled |
| uint32_t | arb_state_state_reserved: 1 | reserved for the future use |
| uint32_t | dual_issue_valu: 1 | Two VALU instructions were issued for coexecution (MI3xx specific) |
| uint32_t | reason_not_issued: 4 | The reason for not issuing an instruction. The field takes one of the values defined in rocprofiler_pc_sampling_instruction_not_issued_reason_t. |
| uint32_t | reserved0: 1 | reserved for future use |
| uint32_t | reserved1: 1 | reserved for the future use |
| uint32_t | reserved2: 3 | reserved for the future use |
◆ rocprofiler_pc_sampling_memory_counters_t
| struct rocprofiler_pc_sampling_memory_counters_t |
(experimental) Counters of issued but not yet completed instructions.
Definition at line 407 of file pc_sampling.h.
◆ rocprofiler_pc_sampling_record_stochastic_v0_t
| struct rocprofiler_pc_sampling_record_stochastic_v0_t |
(experimental) ROCProfiler Stochastic PC Sampling Record.
Definition at line 434 of file pc_sampling.h.
| Data Fields | ||
|---|---|---|
| rocprofiler_async_correlation_id_t | correlation_id | API launch call id that matches dispatch ID. |
| uint64_t | dispatch_id | originating kernel dispatch ID |
| uint64_t | exec_mask | active SIMD lanes at the moment of sampling |
| rocprofiler_pc_sampling_record_stochastic_header_t | flags | Defines what fields are meaningful for the sample. |
| rocprofiler_pc_sampling_hw_id_v0_t | hw_id | |
| uint8_t | inst_type: 5 | instruction type, takes a value defined in rocprofiler_pc_sampling_instruction_type_t |
| rocprofiler_pc_sampling_memory_counters_t | memory_counters | Counters of issued but not yet completed instructions. |
| rocprofiler_pc_t | pc | information about sampled program counter |
| uint8_t | reserved: 2 | reserved 2 bits must be zero |
| uint64_t | size | Size of this struct. |
| rocprofiler_pc_sampling_snapshot_v0_t | snapshot | Data provided by stochastic sampling hardware. |
| uint64_t | timestamp | timestamp when sample is generated |
| uint32_t | wave_count | active waves on the CU at the moment of sampling |
| uint8_t | wave_in_group | wave position within the workgroup (0-15) |
| uint8_t | wave_issued: 1 | wave issued the instruction represented with the PC |
| rocprofiler_dim3_t | workgroup_id | wave coordinates within the workgroup |
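One way a tool might aggregate stochastic samples is to build two histograms: issued samples by inst_type, and non-issued samples by snapshot.reason_not_issued. The sketch below assumes that inst_type is the meaningful field when wave_issued is set and reason_not_issued otherwise; that pairing is an assumption, not something stated in the table above. The struct types are hypothetical stand-ins that mirror only the documented bit widths.

```c
#include <stdint.h>

/* Hypothetical stand-in for rocprofiler_pc_sampling_snapshot_v0_t (one field only). */
typedef struct
{
    uint32_t reason_not_issued : 4;
} snapshot_t;

/* Hypothetical stand-in for the relevant part of the stochastic record. */
typedef struct
{
    uint8_t    wave_issued : 1;
    uint8_t    inst_type : 5;
    uint8_t    reserved : 2;
    snapshot_t snapshot;
} stochastic_sample_t;

/* Array sizes follow the bit widths: 5 bits -> 32 slots, 4 bits -> 16 slots,
 * so every possible field value is a valid index. */
typedef struct
{
    uint64_t issued[32];
    uint64_t stalled[16];
} histogram_t;

/* Adds one sample to the histogram that matches its issue state. */
void
tally_sample(histogram_t* hist, const stochastic_sample_t* sample)
{
    if(sample->wave_issued)
        hist->issued[sample->inst_type]++;
    else
        hist->stalled[sample->snapshot.reason_not_issued]++;
}
```

The histogram indices can then be turned into readable labels with rocprofiler_get_pc_sampling_instruction_type_name and rocprofiler_get_pc_sampling_instruction_not_issued_reason_name, checking for a null return.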
◆ rocprofiler_pc_sampling_record_invalid_t
| struct rocprofiler_pc_sampling_record_invalid_t |
(experimental) Record representing an invalid PC Sampling Record.
Definition at line 491 of file pc_sampling.h.
| Data Fields | ||
|---|---|---|
| uint64_t | size | Size of the struct. |
Typedef Documentation
◆ rocprofiler_available_pc_sampling_configurations_cb_t
| typedef rocprofiler_status_t(* rocprofiler_available_pc_sampling_configurations_cb_t) (const rocprofiler_pc_sampling_configuration_t *configs, unsigned long num_config, void *user_data) |
#include <rocprofiler-sdk/pc_sampling.h>
(experimental) Rocprofiler SDK's callback function to deliver the list of available PC sampling configurations upon the call to the rocprofiler_query_pc_sampling_agent_configurations.
- Parameters
-
[out] configs - The array of PC sampling configurations supported by the agent at the moment of invoking rocprofiler_query_pc_sampling_agent_configurations.
[out] num_config - The number of configurations contained in the underlying array configs. In case the GPU agent does not support PC sampling, the value is 0.
[in] user_data - client's private data passed via rocprofiler_query_pc_sampling_agent_configurations
- Returns
- rocprofiler_status_t
Definition at line 185 of file pc_sampling.h.
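A callback of this shape might simply copy the delivered configurations into caller-owned storage passed through user_data. The sketch below is self-contained and uses hypothetical stand-in types (`status_t`, `pc_sampling_config_t`, `saved_configs_t`) in place of the SDK ones; real code would include rocprofiler-sdk/pc_sampling.h and use rocprofiler_pc_sampling_configuration_t directly. Copying is a conservative choice, since the lifetime of the configs array after the callback returns is not stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch compiles without the SDK headers. */
typedef int status_t;        /* stands in for rocprofiler_status_t */
enum { STATUS_SUCCESS = 0 }; /* stands in for ROCPROFILER_STATUS_SUCCESS */

typedef struct
{
    uint64_t      size;
    int           method; /* stands in for rocprofiler_pc_sampling_method_t */
    int           unit;   /* stands in for rocprofiler_pc_sampling_unit_t */
    unsigned long min_interval;
    unsigned long max_interval;
    uint64_t      flags;
} pc_sampling_config_t;

#define MAX_SAVED_CONFIGS 8

/* Caller-owned storage handed to the callback through user_data. */
typedef struct
{
    pc_sampling_config_t configs[MAX_SAVED_CONFIGS];
    size_t               count;
} saved_configs_t;

/* Same parameter shape as rocprofiler_available_pc_sampling_configurations_cb_t.
 * Copies up to MAX_SAVED_CONFIGS entries; num_config == 0 leaves count at 0. */
status_t
save_configs_cb(const pc_sampling_config_t* configs,
                unsigned long               num_config,
                void*                       user_data)
{
    saved_configs_t* out = (saved_configs_t*) user_data;
    out->count           = 0;
    for(unsigned long i = 0; i < num_config && out->count < MAX_SAVED_CONFIGS; ++i)
        out->configs[out->count++] = configs[i];
    return STATUS_SUCCESS;
}
```

After the query returns, a count of 0 in the saved storage indicates that the agent offers no PC sampling configurations.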
Enumeration Type Documentation
◆ rocprofiler_pc_sampling_configuration_flags_t
#include <rocprofiler-sdk/pc_sampling.h>
(experimental) Enumeration describing values of flags of rocprofiler_pc_sampling_configuration_t.
Definition at line 132 of file pc_sampling.h.
◆ rocprofiler_pc_sampling_instruction_not_issued_reason_t
#include <rocprofiler-sdk/pc_sampling.h>
(experimental) Enumeration describing reason for not issuing an instruction.
Definition at line 332 of file pc_sampling.h.
◆ rocprofiler_pc_sampling_instruction_type_t
#include <rocprofiler-sdk/pc_sampling.h>
(experimental) Enumeration describing type of sampled issued instruction.
Definition at line 302 of file pc_sampling.h.
Function Documentation
◆ rocprofiler_configure_pc_sampling_service()
| rocprofiler_status_t rocprofiler_configure_pc_sampling_service | ( | rocprofiler_context_id_t | context_id, |
| rocprofiler_agent_id_t | agent_id, | ||
| rocprofiler_pc_sampling_method_t | method, | ||
| rocprofiler_pc_sampling_unit_t | unit, | ||
| uint64_t | interval, | ||
| rocprofiler_buffer_id_t | buffer_id, | ||
| int | flags | ||
| ) |
#include <rocprofiler-sdk/pc_sampling.h>
(experimental) Function used to configure the PC sampling service on the GPU agent with agent_id.
Prerequisites are the following:
- The client must create a context and supply its context_id. By using this context, the client can start/stop PC sampling on the agent. For more information, see rocprofiler_start_context/rocprofiler_stop_context.
- The user must create a buffer and supply its buffer_id. Rocprofiler-SDK uses the buffer to deliver the PC samples to the client. For more information about the data delivery, see rocprofiler_create_buffer and rocprofiler_buffer_tracing_cb_t.
- The client must pick a configuration that the agent currently offers; see rocprofiler_query_pc_sampling_agent_configurations. The client chooses the method, unit, and interval to match one of the available configurations. Note that the interval must belong to the range [available_config.min_interval, available_config.max_interval], where available_config is the instance of rocprofiler_pc_sampling_configuration_s supported/available at the moment.

On success, the function returns ROCPROFILER_STATUS_SUCCESS. Otherwise, it notifies the client about the rejection reason via the returned status code. For more information about the status codes, see rocprofiler_status_t.
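The interval requirement from the prerequisites amounts to an inclusive range check against the chosen configuration. A minimal sketch follows, using a hypothetical `interval_range_t` that mirrors only the min_interval and max_interval fields of rocprofiler_pc_sampling_configuration_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring two fields of the configuration struct. */
typedef struct
{
    unsigned long min_interval;
    unsigned long max_interval;
} interval_range_t;

/* True when interval lies in [min_interval, max_interval], bounds included. */
bool
interval_in_range(const interval_range_t* cfg, uint64_t interval)
{
    return interval >= cfg->min_interval && interval <= cfg->max_interval;
}
```

Checking the interval before calling the configure function lets a tool report a precise message instead of a generic invalid-argument status.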
Constraint1: A GPU agent can be configured to support at most one running PC sampling configuration at any time, which has the consequences described below. After the tool configures PC sampling with one of the available configurations, rocprofiler-SDK guarantees that this configuration will be valid for the tool's lifetime. The tool can start and stop the configured PC sampling service whenever convenient.
Constraint2: Since the same GPU agent can be used by multiple processes concurrently, Rocprofiler-SDK cannot guarantee exclusive access to the PC sampling capability. The consequence is the following scenario. The tool TA, which belongs to the process PA, calls rocprofiler_query_pc_sampling_agent_configurations, which returns the two configurations CA and CB supported by the agent. Then the tool TB of the process PB configures PC sampling on the same agent by using the configuration CB. Subsequently, TA tries configuring CA on the agent, and it fails. To point out that this case happened, we introduce a special status code, ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE. When the tool TA observes this status code, it queries all available configurations again by calling rocprofiler_query_pc_sampling_agent_configurations, which returns only CB this time. The tool TA can choose CB, so that both TA and TB use the PC sampling capability in separate processes. TA and TB each receive the samples generated by the kernels launched by their own processes, PA and PB respectively.
Constraint3: Rocprofiler-SDK allows only one context to contain the configured PC sampling service within the process, which implies that at most one of the loaded tools can use PC sampling. One context can contain multiple PC sampling services configured for different GPU agents.
Constraint4: The PC sampling feature is not available within ROCgdb.
Constraint5: The PC sampling service cannot be used simultaneously with the counter collection service.
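The query, configure, re-query flow from Constraint2 can be sketched against a simulated agent. Everything in this block is hypothetical: `fake_query` and `fake_configure` stand in for rocprofiler_query_pc_sampling_agent_configurations and rocprofiler_configure_pc_sampling_service, `ST_NOT_AVAILABLE` stands in for ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE, and the `rival` field models tool TB configuring CB between TA's query and TA's configure call.

```c
#include <stddef.h>

/* Hypothetical stand-ins for rocprofiler_status_t values. */
typedef enum { ST_SUCCESS = 0, ST_NOT_AVAILABLE = 1 } status_t;

/* Hypothetical stand-in for a PC sampling configuration. */
typedef struct { int method; } config_t;
enum { METHOD_CA = 1, METHOD_CB = 2 };

/* Simulated agent. active == 0 means nothing is configured yet;
 * rival != 0 is the method another process configures just before us. */
typedef struct { int active; int rival; } fake_agent_t;

/* Simulated query: reports only the active configuration once one exists. */
size_t
fake_query(const fake_agent_t* agent, config_t out[2])
{
    if(agent->active != 0)
    {
        out[0].method = agent->active;
        return 1;
    }
    out[0].method = METHOD_CA;
    out[1].method = METHOD_CB;
    return 2;
}

/* Simulated configure: rejects a request that differs from the active one. */
status_t
fake_configure(fake_agent_t* agent, const config_t* cfg)
{
    if(agent->rival != 0) /* tool TB gets in first */
    {
        agent->active = agent->rival;
        agent->rival  = 0;
    }
    if(agent->active != 0 && agent->active != cfg->method) return ST_NOT_AVAILABLE;
    agent->active = cfg->method;
    return ST_SUCCESS;
}

/* Tries the first offered configuration and re-queries after a rejection.
 * Returns the configured method, or 0 if every attempt failed. */
int
configure_with_retry(fake_agent_t* agent, int max_attempts)
{
    for(int attempt = 0; attempt < max_attempts; ++attempt)
    {
        config_t configs[2];
        size_t   n = fake_query(agent, configs);
        if(n == 0) return 0; /* agent offers no PC sampling at all */
        if(fake_configure(agent, &configs[0]) == ST_SUCCESS) return configs[0].method;
    }
    return 0;
}
```

With a rival present, the first attempt asks for CA and is rejected, and the second attempt sees only CB and succeeds, which matches the TA/TB scenario. Bounding the number of attempts keeps a tool from spinning if the agent state keeps changing.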
- Parameters
-
[in] context_id - id of the context used for starting/stopping PC sampling service
[in] agent_id - id of the agent on which caller tries using PC sampling capability
[in] method - the type of PC sampling the caller tries to use on the agent.
[in] unit - The unit appropriate to the PC sampling type/method.
[in] interval - frequency at which PC samples are generated
[in] buffer_id - id of the buffer used for delivering PC samples
[in] flags - for future use
- Returns
- rocprofiler_status_t
- Return values
-
ROCPROFILER_STATUS_SUCCESS: PC sampling service configured successfully
ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE: One of the scenarios is present:
- PC sampling is already configured with a configuration different than requested.
- PC sampling is requested from a process that runs within the ROCgdb.
- HSA runtime does not support PC sampling.
- GPU device does not support requested PC sampling method.
ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL: the amdgpu driver installed on the system does not support the PC sampling feature
ROCPROFILER_STATUS_ERROR: a general error caused by the amdgpu driver
ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT: counter collection service already set up in the context
ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT: function invoked with an invalid argument
◆ rocprofiler_get_pc_sampling_instruction_not_issued_reason_name()
| const char * rocprofiler_get_pc_sampling_instruction_not_issued_reason_name | ( | rocprofiler_pc_sampling_instruction_not_issued_reason_t | not_issued_reason | ) |
#include <rocprofiler-sdk/pc_sampling.h>
(experimental) Return the string encoding of rocprofiler_pc_sampling_instruction_not_issued_reason_t value
- Parameters
-
[in] not_issued_reason no issue reason enum value
- Returns
- Will return a nullptr if invalid/unsupported rocprofiler_pc_sampling_instruction_not_issued_reason_t value is provided.
◆ rocprofiler_get_pc_sampling_instruction_type_name()
| const char * rocprofiler_get_pc_sampling_instruction_type_name | ( | rocprofiler_pc_sampling_instruction_type_t | instruction_type | ) |
#include <rocprofiler-sdk/pc_sampling.h>
(experimental) Return the string encoding of rocprofiler_pc_sampling_instruction_type_t value
- Parameters
-
[in] instruction_type instruction type enum value
- Returns
- Will return a nullptr if invalid/unsupported rocprofiler_pc_sampling_instruction_type_t value is provided.
◆ rocprofiler_query_pc_sampling_agent_configurations()
| rocprofiler_status_t rocprofiler_query_pc_sampling_agent_configurations | ( | rocprofiler_agent_id_t | agent_id, |
| rocprofiler_available_pc_sampling_configurations_cb_t | cb, | ||
| void * | user_data | ||
| ) |
#include <rocprofiler-sdk/pc_sampling.h>
(experimental) Query PC Sampling Configuration.
Lists the PC sampling configurations that the GPU agent with agent_id supports at the moment of invoking the function. Delivers configurations via cb. If PC sampling is already configured on the GPU agent, the cb delivers information about the active PC sampling configuration. If the GPU agent does not support PC sampling, the cb delivers no PC sampling configurations.
- Parameters
-
[in] agent_id - id of the agent for which available configurations will be listed
[in] cb - User callback that delivers the available PC sampling configurations
[in] user_data - passed to the cb
- Returns
- rocprofiler_status_t
- Return values
-
ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE: One of the scenarios is present:
- PC sampling is requested from a process that runs within the ROCgdb.
- HSA runtime does not support PC sampling.
ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL: the amdgpu driver installed on the system does not support the PC sampling feature.
ROCPROFILER_STATUS_ERROR: a general error caused by the amdgpu driver
ROCPROFILER_STATUS_SUCCESS: cb successfully finished