rocprofiler-sdk/pc_sampling.h Source File

Rocprofiler SDK Developer API 0.6.0
ROCm Profiling API and tools
// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

#pragma once

#include <rocprofiler-sdk/fwd.h>

ROCPROFILER_EXTERN_C_INIT

/**
 * @defgroup PC_SAMPLING_SERVICE PC Sampling
 * @brief Enabling PC (Program Counter) Sampling for GPU Activity
 * @{
 */

/**
 * @brief Function used to configure the PC sampling service on the GPU agent with @p agent_id.
 *
 * Prerequisites are the following:
 * - The client must create a context and supply its @p context_id. By using this context,
 *   the client can start/stop PC sampling on the agent. For more information,
 *   please @see rocprofiler_start_context/rocprofiler_stop_context.
 * - The user must create a buffer and supply its @p buffer_id. Rocprofiler-SDK uses the buffer
 *   to deliver the PC samples to the client. For more information about the data delivery,
 *   please @see rocprofiler_create_buffer and @see rocprofiler_buffer_tracing_cb_t.
 *
 * Before calling this function, we recommend querying PC sampling configurations
 * supported by the GPU agent via the @see rocprofiler_query_pc_sampling_agent_configurations.
 * The client chooses the @p method, @p unit, and @p interval to match one of the
 * available configurations. Note that the @p interval must belong to the range of values
 * [available_config.min_interval, available_config.max_interval],
 * where available_config is the instance of the @see rocprofiler_pc_sampling_configuration_t
 * supported/available at the moment.
 *
 * Rocprofiler-SDK checks whether the requested configuration is actually supported
 * at the moment of calling this function. If the answer is yes, it returns
 * the @see ROCPROFILER_STATUS_SUCCESS. Otherwise, it notifies the client about the
 * rejection reason via the returned status code. For more information
 * about the status codes, please @see rocprofiler_status_t.
 *
 * There are a few constraints a client's code needs to be aware of.
 *
 * Constraint1: A GPU agent can be configured to support at most one running PC sampling
 * configuration at any time, which implies some of the consequences described below.
 * After the tool configures the PC sampling with one of the available configurations,
 * rocprofiler-SDK guarantees that this configuration will be valid for the tool's
 * lifetime. The tool can start and stop the configured PC sampling service whenever convenient.
 *
 * Constraint2: Since the same GPU agent can be used by multiple processes concurrently,
 * Rocprofiler-SDK cannot guarantee exclusive access to the PC sampling capability.
 * The consequence is the following scenario. The tool TA, which belongs to the process PA,
 * calls @see rocprofiler_query_pc_sampling_agent_configurations, which returns the
 * two configurations CA and CB supported by the agent. Then the tool TB of the process PB
 * configures the PC sampling on the same agent by using the configuration CB.
 * Subsequently, TA tries configuring CA on the agent, and it fails.
 * To point out that this case happened, we introduce a special status code
 * @see ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE.
 * When this status code is observed by the tool TA, it queries all available configurations again
 * by calling @see rocprofiler_query_pc_sampling_agent_configurations,
 * which returns only CB this time. The tool TA can choose CB, so that both
 * TA and TB use the PC sampling capability in the separate processes.
 * Both TA and TB receive samples generated by the kernels launched by the
 * corresponding processes PA and PB, respectively.
 *
 * Constraint3: Rocprofiler-SDK allows only one context to contain the configured PC sampling
 * service within the process, which implies that at most one of the loaded tools can use PC
 * sampling. One context can contain multiple PC sampling services configured for different GPU
 * agents.
 *
 * Constraint4: The PC sampling feature is not available within ROCgdb.
 *
 * Constraint5: The PC sampling service cannot be used simultaneously with
 * the counter collection service.
 *
 * @param [in] context_id - id of the context used for starting/stopping PC sampling service
 * @param [in] agent_id - id of the agent on which caller tries using PC sampling capability
 * @param [in] method - the type of PC sampling the caller tries to use on the agent.
 * @param [in] unit - The unit appropriate to the PC sampling type/method.
 * @param [in] interval - sampling interval, expressed in @p unit, at which PC samples are
 * generated
 * @param [in] buffer_id - id of the buffer used for delivering PC samples
 * @param [in] flags - for future use
 * @return ::rocprofiler_status_t
 * @retval ::ROCPROFILER_STATUS_SUCCESS PC sampling service configured successfully
 * @retval ::ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE One of the scenarios is present:
 * 1. PC sampling is already configured with a configuration different from the one requested.
 * 2. PC sampling is requested from a process that runs within ROCgdb.
 * 3. HSA runtime does not support PC sampling.
 * @retval ::ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL the amdgpu driver installed on the system
 * does not support the PC sampling feature
 * @retval ::ROCPROFILER_STATUS_ERROR a general error caused by the amdgpu driver
 * @retval ::ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT counter collection service already
 * set up in the context
 */
rocprofiler_status_t
rocprofiler_configure_pc_sampling_service(rocprofiler_context_id_t         context_id,
                                          rocprofiler_agent_id_t           agent_id,
                                          rocprofiler_pc_sampling_method_t method,
                                          rocprofiler_pc_sampling_unit_t   unit,
                                          uint64_t                         interval,
                                          rocprofiler_buffer_id_t          buffer_id,
                                          int flags) ROCPROFILER_API;
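
The interval requirement described in the comment above (the value must lie within `[min_interval, max_interval]` of an available configuration) can be sketched as a small helper. This is only an illustration: `example_pc_sampling_config_t` and `example_clamp_interval` are hypothetical names that are not part of the SDK, and the stand-in struct copies just the two interval fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SDK configuration struct (interval fields only). */
typedef struct
{
    size_t min_interval;
    size_t max_interval;
} example_pc_sampling_config_t;

/* Clamp a desired interval into the [min_interval, max_interval] range of cfg. */
static uint64_t
example_clamp_interval(const example_pc_sampling_config_t* cfg, uint64_t desired)
{
    if(desired < cfg->min_interval) return cfg->min_interval;
    if(desired > cfg->max_interval) return cfg->max_interval;
    return desired;
}
```

A tool could pass the clamped value as the `interval` argument. Whether clamping or rejecting an out-of-range request is preferable is a design choice left to the tool.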

/**
 * @brief PC sampling configuration supported by a GPU agent.
 */
typedef struct
{
    uint64_t                         size;  ///< Size of this struct
    rocprofiler_pc_sampling_method_t method;
    rocprofiler_pc_sampling_unit_t   unit;
    size_t                           min_interval;
    size_t                           max_interval;
    uint64_t                         flags;  ///< for future use

    /// @var method
    /// @brief Sampling method supported by the GPU agent.
    /// Currently, it can take one of the following two values:
    /// - ::ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP: a background host kernel thread
    /// periodically interrupts wave execution on the GPU to generate PC samples
    /// - ::ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC: performance monitoring hardware
    /// on the GPU periodically interrupts waves to generate PC samples.
    /// @var unit
    /// @brief A unit used to specify the interval of the @ref method for sample generation.
    /// @var min_interval
    /// @brief the smallest allowed interval, i.e. the highest possible frequency for
    /// generating samples using @ref method.
    /// @var max_interval
    /// @brief the largest allowed interval, i.e. the lowest possible frequency for
    /// generating samples using @ref method.

} rocprofiler_pc_sampling_configuration_t;

/**
 * @brief Rocprofiler SDK's callback function to deliver the list of available PC
 * sampling configurations upon the call to the
 * @ref rocprofiler_query_pc_sampling_agent_configurations.
 *
 * @param[out] configs - The array of PC sampling configurations supported by the agent
 * at the moment of invoking @ref rocprofiler_query_pc_sampling_agent_configurations.
 * @param[out] num_config - The number of configurations contained in the underlying array
 * @p configs.
 * In case the GPU agent does not support PC sampling, the value is 0.
 * @param[in] user_data - client's private data passed via
 * @ref rocprofiler_query_pc_sampling_agent_configurations
 * @return ::rocprofiler_status_t
 */
typedef rocprofiler_status_t (*rocprofiler_available_pc_sampling_configurations_cb_t)(
    const rocprofiler_pc_sampling_configuration_t* configs,
    size_t                                         num_config,
    void*                                          user_data);

/**
 * @brief Query PC Sampling Configuration.
 *
 * Lists PC sampling configurations a GPU agent with @p agent_id supports at the moment
 * of invoking the function. Delivers configurations via @p cb.
 * In case the PC sampling is configured on the GPU agent, the @p cb delivers information
 * about the active PC sampling configuration.
 * In case the GPU agent does not support PC sampling capability,
 * the @p cb delivers no PC sampling configurations.
 *
 * @param [in] agent_id - id of the agent for which available configurations will be listed
 * @param [in] cb - User callback that delivers the available PC sampling configurations
 * @param [in] user_data - passed to the @p cb
 * @return ::rocprofiler_status_t
 * @retval ::ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE One of the scenarios is present:
 * 1. PC sampling is requested from a process that runs within ROCgdb.
 * 2. HSA runtime does not support PC sampling.
 * @retval ::ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL the amdgpu driver installed on the system
 * does not support the PC sampling feature.
 * @retval ::ROCPROFILER_STATUS_ERROR a general error caused by the amdgpu driver
 * @retval ::ROCPROFILER_STATUS_SUCCESS @p cb successfully finished
 */
rocprofiler_status_t
rocprofiler_query_pc_sampling_agent_configurations(
    rocprofiler_agent_id_t                                agent_id,
    rocprofiler_available_pc_sampling_configurations_cb_t cb,
    void* user_data) ROCPROFILER_API ROCPROFILER_NONNULL(2, 3);
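
The callback-plus-`user_data` pattern documented above can be sketched without the SDK as follows. Everything prefixed with `example_` is a hypothetical stand-in: the struct only imitates the fields of the configuration type, `int` replaces the status and enum types, and the fixed capacity of 8 is arbitrary. The sketch copies the delivered array because the header does not state how long `configs` stays valid after the callback returns, so treating it as callback-scoped is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in that imitates the fields of the configuration struct. */
typedef struct
{
    uint64_t size;
    int      method;
    int      unit;
    size_t   min_interval;
    size_t   max_interval;
    uint64_t flags;
} example_config_t;

#define EXAMPLE_MAX_CONFIGS 8 /* arbitrary capacity chosen for this sketch */

/* Private data handed to the callback through user_data. */
typedef struct
{
    example_config_t configs[EXAMPLE_MAX_CONFIGS];
    size_t           count;
} example_config_list_t;

/* Callback shaped like the configurations callback; returns 0 to signal success. */
static int
example_collect_configs(const example_config_t* configs, size_t num_config, void* user_data)
{
    example_config_list_t* out = (example_config_list_t*) user_data;

    /* Keep at most EXAMPLE_MAX_CONFIGS entries; zero means PC sampling is unsupported. */
    out->count = (num_config < EXAMPLE_MAX_CONFIGS) ? num_config : EXAMPLE_MAX_CONFIGS;
    if(out->count > 0) memcpy(out->configs, configs, out->count * sizeof(example_config_t));
    return 0;
}
```

After the query returns, a tool would inspect `count` and pick one of the copied entries before configuring the service.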

/**
 * @brief Information about the GPU part where wave was executing
 * at the moment of sampling.
 */
typedef struct
{
    uint64_t chiplet          : 6;   ///< chiplet index (3 bits allocated by the ROCr runtime)
    uint64_t wave_id          : 7;   ///< wave slot index
    uint64_t simd_id          : 2;   ///< SIMD index
    uint64_t pipe_id          : 4;   ///< pipe index
    uint64_t cu_or_wgp_id     : 4;   ///< Index of compute unit on GFX9 or workgroup processor on
                                     ///< other architectures
    uint64_t shader_array_id  : 1;   ///< Shader array index
    uint64_t shader_engine_id : 5;   ///< Shader engine index
    uint64_t workgroup_id     : 7;   ///< thread_group index on GFX9, and workgroup index on GFX10+
    uint64_t vm_id            : 6;   ///< virtual memory ID
    uint64_t queue_id         : 4;   ///< queue id
    uint64_t microengine_id   : 2;   ///< ACE (microengine) index
    uint64_t reserved0        : 16;  ///< Reserved for future use
} rocprofiler_pc_sampling_hw_id_v0_t;

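
To show how the hardware ID fields might be consumed, the sketch below formats a few of them into a readable location string. `example_hw_id_t` is a hypothetical copy of the bit-field layout above and `example_format_hw_id` is an illustrative helper, neither of which exists in the SDK. The fields are read by name rather than by shifting a raw integer, since bit-field layout is implementation-defined in C.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical copy of the hardware ID layout; the field widths add up to 64 bits. */
typedef struct
{
    uint64_t chiplet          : 6;
    uint64_t wave_id          : 7;
    uint64_t simd_id          : 2;
    uint64_t pipe_id          : 4;
    uint64_t cu_or_wgp_id     : 4;
    uint64_t shader_array_id  : 1;
    uint64_t shader_engine_id : 5;
    uint64_t workgroup_id     : 7;
    uint64_t vm_id            : 6;
    uint64_t queue_id         : 4;
    uint64_t microengine_id   : 2;
    uint64_t reserved0        : 16;
} example_hw_id_t;

/* Write "se<N>.sa<N>.cu<N>.simd<N>.wave<N>" into buf; returns snprintf's result. */
static int
example_format_hw_id(const example_hw_id_t* id, char* buf, size_t len)
{
    return snprintf(buf,
                    len,
                    "se%u.sa%u.cu%u.simd%u.wave%u",
                    (unsigned) id->shader_engine_id,
                    (unsigned) id->shader_array_id,
                    (unsigned) id->cu_or_wgp_id,
                    (unsigned) id->simd_id,
                    (unsigned) id->wave_id);
}
```

Such a string can serve as a grouping key when aggregating samples per SIMD or per compute unit.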
/**
 * @brief Sampled program counter.
 */
typedef struct
{
    uint64_t code_object_id;
    uint64_t code_object_offset;

    /// @var code_object_id
    /// @brief id of the loaded code object instance that contains sampled PC.
    /// This field holds the value ::ROCPROFILER_CODE_OBJECT_ID_NONE
    /// if the code object cannot be determined
    /// (e.g., sampled PC belongs to code generated by self-modifying code).
    /// @var code_object_offset
    /// @brief If @ref code_object_id is different from ::ROCPROFILER_CODE_OBJECT_ID_NONE,
    /// then this field contains the offset of the sampled PC relative to the
    /// ::rocprofiler_callback_tracing_code_object_load_data_t::load_base
    /// of the code object instance with @ref code_object_id.
    /// To calculate the original virtual address of the sampled PC, one can add the value
    /// of this field to the ::rocprofiler_callback_tracing_code_object_load_data_t::load_base.
    /// The value of @ref code_object_offset matches
    /// the virtual address of the sampled instruction (PC) only if the
    /// @ref code_object_id is equal to the ::ROCPROFILER_CODE_OBJECT_ID_NONE.
} rocprofiler_pc_t;

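
The address rule described for `code_object_offset` can be written as a short helper. This is a sketch under stated assumptions: `example_pc_virtual_address` is a hypothetical function, and the "no code object" sentinel is passed in as `id_none` so that the example does not assume the numeric value of `ROCPROFILER_CODE_OBJECT_ID_NONE`.

```c
#include <stdint.h>

/*
 * Recover the virtual address of a sampled PC.
 * - If the code object is known, the offset is relative to its load base.
 * - If the id equals the "none" sentinel, the offset already is the address.
 */
static uint64_t
example_pc_virtual_address(uint64_t code_object_id,
                           uint64_t code_object_offset,
                           uint64_t load_base,
                           uint64_t id_none)
{
    if(code_object_id == id_none) return code_object_offset;
    return load_base + code_object_offset;
}
```

In a real tool, `load_base` would come from the code object load record whose id matches `code_object_id`.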
// TODO: The definition of this struct might change over time.
/**
 * @brief ROCProfiler Host-Trap PC Sampling Record.
 */
{
    uint64_t                           size;         ///< Size of this struct
    rocprofiler_pc_sampling_hw_id_v0_t hw_id;        ///< @see ::rocprofiler_pc_sampling_hw_id_v0_t
    rocprofiler_pc_t                   pc;           ///< information about sampled program counter
    uint64_t                           exec_mask;    ///< active SIMD lanes when sampled
    uint64_t                           timestamp;    ///< timestamp when sample is generated
    uint64_t                           dispatch_id;  ///< originating kernel dispatch ID
    rocprofiler_correlation_id_t correlation_id;  ///< API launch call id that matches dispatch ID
    rocprofiler_dim3_t           workgroup_id;    ///< wave coordinates within the workgroup
    uint32_t                     wave_in_group : 8;  ///< wave position within the workgroup (0-31)
    uint32_t                     reserved0 : 24;     ///< reserved for future use

/** @} */

ROCPROFILER_EXTERN_C_FINI
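
The `exec_mask` field of the host-trap record is documented as the active SIMD lanes at the time of sampling. Assuming it holds one bit per lane, with a set bit meaning the lane was active (the header does not spell out the encoding), the number of active lanes can be counted with a portable bit loop. `example_active_lanes` is a hypothetical helper name.

```c
#include <stdint.h>

/* Count set bits in exec_mask by clearing the lowest set bit on each iteration. */
static unsigned
example_active_lanes(uint64_t exec_mask)
{
    unsigned count = 0;
    while(exec_mask != 0)
    {
        exec_mask &= exec_mask - 1;
        ++count;
    }
    return count;
}
```

Comparing this count against the wave size gives a rough measure of lane utilization for each sample.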