rocprofiler-sdk/pc_sampling.h Source File

Rocprofiler SDK Developer API 0.5.0
ROCm Profiling API and tools

// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

#pragma once

#include <rocprofiler-sdk/fwd.h>

ROCPROFILER_EXTERN_C_INIT

/**
 * @defgroup PC_SAMPLING_SERVICE PC Sampling
 * @brief Enabling PC (Program Counter) Sampling for GPU Activity
 * @{
 */

/**
 * @brief Function used to configure the PC sampling service on the GPU agent with @p agent_id.
 *
 * Prerequisites are the following:
 * - The client must create a context and supply its @p context_id. The client starts and
 *   stops PC sampling on the agent by starting and stopping this context. For more
 *   information, @see rocprofiler_start_context and rocprofiler_stop_context.
 * - The client must create a buffer and supply its @p buffer_id. Rocprofiler-SDK uses the
 *   buffer to deliver PC samples to the client. For more information about data delivery,
 *   @see rocprofiler_create_buffer and rocprofiler_buffer_tracing_cb_t.
 *
 * Before calling this function, we recommend querying the PC sampling configurations
 * supported by the GPU agent via @ref rocprofiler_query_pc_sampling_agent_configurations.
 * The client chooses the @p method, @p unit, and @p interval to match one of the
 * available configurations. Note that @p interval must lie within the range
 * [available_config.min_interval, available_config.max_interval],
 * where available_config is an instance of @ref rocprofiler_pc_sampling_configuration_s
 * that is supported/available at the moment of the call.
 *
 * Rocprofiler-SDK checks whether the requested configuration is actually supported
 * at the moment this function is called. If it is, the function returns
 * ::ROCPROFILER_STATUS_SUCCESS. Otherwise, the returned status code tells the client
 * why the request was rejected. For more information about the status codes,
 * @see rocprofiler_status_t.
 *
 * Client code needs to be aware of the following constraints.
 *
 * Constraint1: A GPU agent can have at most one running PC sampling configuration
 * at any time, which leads to some of the consequences described below.
 * After the tool configures PC sampling with one of the available configurations,
 * Rocprofiler-SDK guarantees that this configuration remains valid for the tool's
 * lifetime. The tool can start and stop the configured PC sampling service whenever convenient.
 *
 * Constraint2: Since the same GPU agent can be used by multiple processes concurrently,
 * Rocprofiler-SDK cannot guarantee exclusive access to the PC sampling capability.
 * Consider the following scenario. Tool TA, which belongs to process PA, calls
 * @ref rocprofiler_query_pc_sampling_agent_configurations, which reports two
 * configurations, CA and CB, supported by the agent. Then tool TB of process PB
 * configures PC sampling on the same agent using configuration CB.
 * Subsequently, TA tries to configure CA on the agent, and the call fails.
 * To signal this case, the function returns the dedicated status code
 * ::ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE.
 * When TA observes this status code, it queries the available configurations again
 * by calling @ref rocprofiler_query_pc_sampling_agent_configurations,
 * which this time reports only CB. TA can then choose CB, so that both
 * TA and TB use the PC sampling capability in their separate processes.
 * TA and TB receive the samples generated by the kernels launched by
 * processes PA and PB, respectively.
 *
 * Constraint3: Rocprofiler-SDK allows only one context within the process to contain a
 * configured PC sampling service, which implies that at most one of the loaded tools can
 * use PC sampling. One context can contain multiple PC sampling services configured for
 * different GPU agents.
 *
 * Constraint4: The PC sampling feature is not available within ROCgdb.
 *
 * Constraint5: The PC sampling service cannot be used simultaneously with the
 * counter collection service.
 *
 * @param [in] context_id - id of the context used for starting/stopping the PC sampling service
 * @param [in] agent_id - id of the agent on which the caller tries to use PC sampling
 * @param [in] method - the type of PC sampling the caller tries to use on the agent
 * @param [in] unit - the unit in which @p interval is expressed, appropriate to @p method
 * @param [in] interval - sampling interval expressed in @p unit; smaller values generate
 *   samples more frequently
 * @param [in] buffer_id - id of the buffer used for delivering PC samples
 * @return ::rocprofiler_status_t
 * @retval ::ROCPROFILER_STATUS_SUCCESS PC sampling service configured successfully
 * @retval ::ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE One of the following scenarios applies:
 *   1. PC sampling is already configured with a configuration different from the requested one.
 *   2. PC sampling is requested from a process that runs within ROCgdb.
 *   3. The HSA runtime does not support PC sampling.
 * @retval ::ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL the amdgpu driver installed on the system
 *   does not support the PC sampling feature
 * @retval ::ROCPROFILER_STATUS_ERROR a general error caused by the amdgpu driver
 * @retval ::ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT the counter collection service is already
 *   set up in the context
 */
rocprofiler_status_t
rocprofiler_configure_pc_sampling_service(rocprofiler_context_id_t         context_id,
                                          rocprofiler_agent_id_t           agent_id,
                                          rocprofiler_pc_sampling_method_t method,
                                          rocprofiler_pc_sampling_unit_t   unit,
                                          uint64_t                         interval,
                                          rocprofiler_buffer_id_t          buffer_id) ROCPROFILER_API;

/**
 * @brief PC sampling configuration supported by a GPU agent.
 */
typedef struct
{
    uint64_t                         size;  ///< Size of this struct
    rocprofiler_pc_sampling_method_t method;
    rocprofiler_pc_sampling_unit_t   unit;
    size_t                           min_interval;
    size_t                           max_interval;
    uint64_t                         flags;  ///< for future use

    /// @var method
    /// @brief Sampling method supported by the GPU agent.
    /// Currently, it can take one of the following two values:
    /// - ::ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP: a background host kernel thread
    ///   periodically interrupts the execution of waves on the GPU to generate PC samples
    /// - ::ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC: performance monitoring hardware
    ///   on the GPU periodically interrupts waves to generate PC samples
    /// @var unit
    /// @brief The unit in which the sampling interval of @ref method is expressed.
    /// @var min_interval
    /// @brief The smallest supported interval, i.e., the highest possible frequency for
    /// generating samples using @ref method.
    /// @var max_interval
    /// @brief The largest supported interval, i.e., the lowest possible frequency for
    /// generating samples using @ref method.
} rocprofiler_pc_sampling_configuration_t;

/**
 * @brief Rocprofiler SDK's callback function to deliver the list of available PC
 * sampling configurations upon the call to
 * @ref rocprofiler_query_pc_sampling_agent_configurations.
 *
 * @param[out] configs - The array of PC sampling configurations supported by the agent
 * at the moment of invoking @ref rocprofiler_query_pc_sampling_agent_configurations.
 * @param[out] num_config - The number of configurations contained in the array
 * @p configs.
 * If the GPU agent does not support PC sampling, the value is 0.
 * @param[in] user_data - client's private data passed via
 * @ref rocprofiler_query_pc_sampling_agent_configurations
 * @return ::rocprofiler_status_t
 */
typedef rocprofiler_status_t (*rocprofiler_available_pc_sampling_configurations_cb_t)(
    const rocprofiler_pc_sampling_configuration_t* configs,
    size_t                                         num_config,
    void*                                          user_data);

/**
 * @brief Query PC Sampling Configuration.
 *
 * Lists the PC sampling configurations that the GPU agent with @p agent_id supports at the
 * moment the function is invoked, and delivers them via @p cb.
 * If PC sampling is already configured on the GPU agent, @p cb delivers information
 * about the active PC sampling configuration.
 * If the GPU agent does not support the PC sampling capability,
 * @p cb delivers no configurations (its num_config argument is 0).
 *
 * @param [in] agent_id - id of the agent for which available configurations will be listed
 * @param [in] cb - user callback that receives the available PC sampling configurations
 * @param [in] user_data - passed to @p cb
 * @return ::rocprofiler_status_t
 * @retval ::ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE One of the following scenarios applies:
 *   1. PC sampling is requested from a process that runs within ROCgdb.
 *   2. The HSA runtime does not support PC sampling.
 * @retval ::ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL the amdgpu driver installed on the system
 *   does not support the PC sampling feature.
 * @retval ::ROCPROFILER_STATUS_ERROR a general error caused by the amdgpu driver
 * @retval ::ROCPROFILER_STATUS_SUCCESS @p cb successfully finished
 */
rocprofiler_status_t
rocprofiler_query_pc_sampling_agent_configurations(
    rocprofiler_agent_id_t                                agent_id,
    rocprofiler_available_pc_sampling_configurations_cb_t cb,
    void*                                                 user_data) ROCPROFILER_API ROCPROFILER_NONNULL(2, 3);

/**
 * @brief The header of @ref rocprofiler_pc_sampling_record_t, indicating
 * which fields of a @ref rocprofiler_pc_sampling_record_t instance are meaningful
 * for the sample.
 */
typedef struct
{
    uint8_t valid            : 1;  ///< ::rocprofiler_pc_sampling_snapshot_v1_t field is valid
    uint8_t type             : 4;
    uint8_t has_stall_reason : 1;
    uint8_t has_wave_cnt     : 1;
    uint8_t reserved         : 1;  ///< for future use

    /// @var type
    /// @brief The following values are possible:
    /// - 0 - reserved
    /// - 1 - host trap PC sample
    /// - 2 - stochastic PC sample
    /// - 3 - perfcounter (currently unsupported)
    /// - other values are currently unused
    /// @var has_stall_reason
    /// @brief Whether the sample contains information about the stall reason.
    /// If so, @see rocprofiler_pc_sampling_snapshot_v1_t.
    /// @var has_wave_cnt
    /// @brief Whether @ref rocprofiler_pc_sampling_record_t::wave_count
    /// contains a meaningful value.
} rocprofiler_pc_sampling_header_v1_t;

/**
 * @brief For future use.
 *
 * @todo: Provide the description
 * @todo: Should we use bitfields because of C ABI portability?
 * @todo: Should we abstract this to be architecture agnostic?
 * @todo: Consider having a query to determine organization of this information.
 */
typedef struct
{
    uint32_t dual_issue_valu   : 1;
    uint32_t inst_type         : 4;
    uint32_t reason_not_issued : 7;
    uint32_t arb_state_issue   : 10;
    uint32_t arb_state_stall   : 10;
} rocprofiler_pc_sampling_snapshot_v1_t;

// TODO: The definition of this structure might change over time
// to reduce the space needed to represent a single sample.
/**
 * @brief ROCProfiler PC Sampling Record corresponding to the interrupted wave.
 */
typedef struct
{
    uint64_t                              size;     ///< Size of this struct
    rocprofiler_pc_sampling_header_v1_t   flags;
    uint8_t                               chiplet;  ///< chiplet index
    uint8_t                               wave_id;  ///< wave identifier within the workgroup
    uint8_t                               wave_issued : 1;
    uint8_t                               reserved    : 7;  ///< reserved 7 bits, must be zero
    uint32_t                              hw_id;    ///< compute unit identifier
    uint64_t                              pc;  ///< Program counter of the wave at the moment of interruption
    uint64_t                              exec_mask;
    rocprofiler_dim3_t                    workgroup_id;  ///< wave coordinates within the workgroup
    uint32_t                              wave_count;
    uint64_t                              timestamp;  ///< timestamp when the sample is generated
    rocprofiler_correlation_id_t          correlation_id;
    rocprofiler_pc_sampling_snapshot_v1_t snapshot;   ///< @see ::rocprofiler_pc_sampling_snapshot_v1_t
    uint32_t                              reserved2;  ///< for future use

    /// @var flags
    /// @brief Indicates which fields of this struct are meaningful for the represented sample.
    /// The values depend on what the underlying GPU agent architecture supports.
    /// @var wave_issued
    /// @brief Indicates whether the wave is issuing the instruction represented by @ref pc
    /// @var exec_mask
    /// @brief Bitmask of the SIMD lanes of the wave that were executing the instruction
    /// represented by @ref pc; the number of set bits is the number of active lanes.
    /// Useful for understanding thread divergence within the wave.
    /// @var wave_count
    /// @brief Number of active waves on the CU at the moment of sample generation
    /// @var correlation_id
    /// @brief Correlation id of the API call that initiated the kernel launch.
    /// The interrupted wave is executed as part of that kernel.
} rocprofiler_pc_sampling_record_t;

/**
 * @brief Marker representing a code object loading event.
 *
 * @see rocprofiler_callback_tracing_code_object_load_data_t
 * for more information
 */
typedef struct
{
    uint64_t size;            ///< Size of this struct
    uint64_t code_object_id;  ///< unique code object identifier

/**
 * @brief Marker representing a code object unloading event.
 *
 * @see rocprofiler_callback_tracing_code_object_load_data_t
 * for more information
 */
typedef struct
{
    uint64_t size;            ///< Size of this struct
    uint64_t code_object_id;  ///< unique code object identifier

/** @} */

ROCPROFILER_EXTERN_C_FINI

ROCPROFILER_CXX_CODE(
    static_assert(sizeof(rocprofiler_pc_sampling_record_t) == 80,
                  "Increasing the size of the pc sampling record is not permitted."));

ROCPROFILER_CXX_CODE(static_assert(offsetof(rocprofiler_pc_sampling_record_t, chiplet) == 9 &&
                                       offsetof(rocprofiler_pc_sampling_record_t, reserved2) == 76,
                                   "PC sampling record layout changed."));
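
The two static_assert checks above pin rocprofiler_pc_sampling_record_t at 80 bytes, with chiplet at offset 9 and reserved2 at offset 76. The sketch below reproduces that arithmetic with hypothetical mirror types (every mirror_* name is illustrative, not part of the SDK). It assumes that rocprofiler_dim3_t is three uint32_t values and that rocprofiler_correlation_id_t is a uint64_t followed by an 8-byte user-data union; both types live in fwd.h and are not shown in this file. It also relies on GCC/Clang packing uint8_t bit-fields into single bytes, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror types; only the field widths come from pc_sampling.h.
 * ASSUMPTION: rocprofiler_dim3_t is {uint32_t x, y, z} and
 * rocprofiler_correlation_id_t is {uint64_t internal; 8-byte union external}. */
typedef struct { uint32_t x, y, z; } mirror_dim3_t;              /* 12 bytes */
typedef union { uint64_t value; void* ptr; } mirror_user_data_t; /*  8 bytes */
typedef struct {
    uint64_t internal;
    mirror_user_data_t external;
} mirror_correlation_id_t;                                      /* 16 bytes */

/* 1 + 4 + 1 + 1 + 1 = 8 bits, so the header occupies a single byte. */
typedef struct {
    uint8_t valid : 1;
    uint8_t type : 4;
    uint8_t has_stall_reason : 1;
    uint8_t has_wave_cnt : 1;
    uint8_t reserved : 1;
} mirror_header_t;

/* 1 + 4 + 7 + 10 + 10 = 32 bits, so the snapshot occupies 4 bytes. */
typedef struct {
    uint32_t dual_issue_valu : 1;
    uint32_t inst_type : 4;
    uint32_t reason_not_issued : 7;
    uint32_t arb_state_issue : 10;
    uint32_t arb_state_stall : 10;
} mirror_snapshot_t;

typedef struct {
    uint64_t size;                          /* bytes  0..7  */
    mirror_header_t flags;                  /* byte   8     */
    uint8_t chiplet;                        /* byte   9     */
    uint8_t wave_id;                        /* byte  10     */
    uint8_t wave_issued : 1;                /* byte  11, shared with reserved */
    uint8_t reserved : 7;
    uint32_t hw_id;                         /* bytes 12..15 */
    uint64_t pc;                            /* bytes 16..23 */
    uint64_t exec_mask;                     /* bytes 24..31 */
    mirror_dim3_t workgroup_id;             /* bytes 32..43 */
    uint32_t wave_count;                    /* bytes 44..47 */
    uint64_t timestamp;                     /* bytes 48..55 */
    mirror_correlation_id_t correlation_id; /* bytes 56..71 */
    mirror_snapshot_t snapshot;             /* bytes 72..75 */
    uint32_t reserved2;                     /* bytes 76..79 */
} mirror_pc_sampling_record_t;

/* The same checks as the header's static_asserts, applied to the mirror. */
static_assert(sizeof(mirror_pc_sampling_record_t) == 80, "record must stay 80 bytes");
static_assert(offsetof(mirror_pc_sampling_record_t, chiplet) == 9, "chiplet offset");
static_assert(offsetof(mirror_pc_sampling_record_t, reserved2) == 76, "reserved2 offset");

size_t mirror_record_size(void) { return sizeof(mirror_pc_sampling_record_t); }
size_t mirror_offset_of_chiplet(void) { return offsetof(mirror_pc_sampling_record_t, chiplet); }
size_t mirror_offset_of_reserved2(void) { return offsetof(mirror_pc_sampling_record_t, reserved2); }
```

Because the header and snapshot bit-fields fill exactly one and four bytes, the only padding-sensitive step is placing hw_id at offset 12, which already satisfies its 4-byte alignment. On a compiler that allocates bit-fields differently, the mirror's static_asserts would fail at compile time rather than silently misreading samples.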
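
The configuration rules documented for rocprofiler_configure_pc_sampling_service (match @p method and @p unit, and keep @p interval inside [min_interval, max_interval]) reduce to a small search over the configurations delivered to the query callback. The sketch below is self-contained and does not call Rocprofiler-SDK: sketch_config_t, the sketch_* enumerators, and pick_config are hypothetical stand-ins that mirror only the documented fields method, unit, min_interval, and max_interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rocprofiler_pc_sampling_method_t and
 * rocprofiler_pc_sampling_unit_t; names and values are illustrative only. */
typedef enum { SKETCH_METHOD_HOST_TRAP = 1, SKETCH_METHOD_STOCHASTIC = 2 } sketch_method_t;
typedef enum { SKETCH_UNIT_A = 1, SKETCH_UNIT_B = 2 } sketch_unit_t;

/* Mirrors the documented fields of rocprofiler_pc_sampling_configuration_t. */
typedef struct {
    sketch_method_t method;
    sketch_unit_t unit;
    size_t min_interval; /* smallest interval, i.e. highest sampling frequency */
    size_t max_interval; /* largest interval, i.e. lowest sampling frequency */
} sketch_config_t;

/* Returns the index of the first configuration whose method and unit match
 * and whose [min_interval, max_interval] range contains interval, or -1 if
 * none does (for example, when num_config is 0 because the agent does not
 * support PC sampling). */
long pick_config(const sketch_config_t* configs, size_t num_config,
                 sketch_method_t method, sketch_unit_t unit, uint64_t interval)
{
    assert(configs != NULL || num_config == 0);
    for (size_t i = 0; i < num_config; ++i) {
        const sketch_config_t* c = &configs[i];
        if (c->method == method && c->unit == unit &&
            interval >= c->min_interval && interval <= c->max_interval) {
            return (long)i;
        }
    }
    return -1;
}
```

In a real tool, a function like this could run inside the rocprofiler_available_pc_sampling_configurations_cb_t callback, with the chosen entry stored through user_data and then passed to rocprofiler_configure_pc_sampling_service. If that call returns ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE, Constraint2 suggests querying again and repeating the search, since another process may have configured the agent in the meantime.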