rocprofiler-sdk/pc_sampling.h Source File

ROCprofiler-SDK developer API 1.0.0
ROCm Profiling API and tools
pc_sampling.h
1// MIT License
2//
3// Copyright (c) 2023-2025 Advanced Micro Devices, Inc. All rights reserved.
4//
5// Permission is hereby granted, free of charge, to any person obtaining a copy
6// of this software and associated documentation files (the "Software"), to deal
7// in the Software without restriction, including without limitation the rights
8// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9// copies of the Software, and to permit persons to whom the Software is
10// furnished to do so, subject to the following conditions:
11//
12// The above copyright notice and this permission notice shall be included in all
13// copies or substantial portions of the Software.
14//
15// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21// SOFTWARE.
22
23#pragma once
24
25#include <rocprofiler-sdk/agent.h>
26#include <rocprofiler-sdk/defines.h>
27#include <rocprofiler-sdk/fwd.h>
28
29ROCPROFILER_EXTERN_C_INIT
30
31/**
32 * @defgroup PC_SAMPLING_SERVICE PC Sampling
33 * @brief Enabling PC (Program Counter) Sampling for GPU Activity
34 * @{
35 */
36
37/**
38 * @brief (experimental) Function used to configure the PC sampling service on the GPU agent with @p
39 * agent_id.
40 *
41 * Prerequisites are the following:
42 * - The client must create a context and supply its @p context_id. By using this context,
43 * the client can start/stop PC sampling on the agent. For more information,
44 * please @see rocprofiler_start_context/rocprofiler_stop_context.
45 * - The user must create a buffer and supply its @p buffer_id. Rocprofiler-SDK uses the buffer
46 * to deliver the PC samples to the client. For more information about the data delivery,
47 * please @see rocprofiler_create_buffer and @see rocprofiler_buffer_tracing_cb_t.
48 *
49 * Before calling this function, we recommend querying PC sampling configurations
50 * supported by the GPU agent via the @see rocprofiler_query_pc_sampling_agent_configurations.
51 * The client chooses the @p method, @p unit, and @p interval to match one of the
52 * available configurations. Note that the @p interval must belong to the range of values
53 * [available_config.min_interval, available_config.max_interval],
54 * where available_config is the instance of the @see rocprofiler_pc_sampling_configuration_s
55 * supported/available at the moment.
56 *
57 * Rocprofiler-SDK checks whether the requested configuration is actually supported
58 * at the moment of calling this function. If the answer is yes, it returns
59 * the @see ROCPROFILER_STATUS_SUCCESS. Otherwise, it notifies the client about the
60 * rejection reason via the returned status code. For more information
61 * about the status codes, please @see rocprofiler_status_t.
62 *
63 * There are a few constraints a client's code needs to be aware of.
64 *
65 * Constraint1: A GPU agent can be configured to support at most one running PC sampling
66 * configuration at any time, which implies some of the consequences described below.
67 * After the tool configures the PC sampling with one of the available configurations,
68 * rocprofiler-SDK guarantees that this configuration will be valid for the tool's
69 * lifetime. The tool can start and stop the configured PC sampling service whenever convenient.
70 *
71 * Constraint2: Since the same GPU agent can be used by multiple processes concurrently,
72 * Rocprofiler-SDK cannot guarantee the exclusive access to the PC sampling capability.
73 * The consequence is the following scenario. The tool TA that belongs to the process PA,
74 * calls the @see rocprofiler_query_pc_sampling_agent_configurations that returns the
75 * two supported configurations CA and CB by the agent. Then the tool TB of the process PB,
76 * configures the PC sampling on the same agent by using the configuration CB.
77 * Subsequently, the TA tries configuring the CA on the agent, and it fails.
78 * To point out that this case happened, we introduce a special status code
79 * @see ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE.
80 * When this status code is observed by the tool TA, it queries all available configurations again
81 * by calling @see rocprofiler_query_pc_sampling_agent_configurations,
82 * that returns only CB this time. The tool TA can choose CB, so that both
83 * TA and TB use the PC sampling capability in the separate processes.
84 * Both TA and TB receive samples generated by the kernels launched by the
85 * corresponding processes PA and PB, respectively.
86 *
87 * Constraint3: Rocprofiler-SDK allows only one context to contain the configured PC sampling
88 * service within the process, which implies that at most one of the loaded tools can use PC
89 * sampling. One context can contain multiple PC sampling services configured for different GPU
90 * agents.
91 *
92 * Constraint4: PC sampling feature is not available within the ROCgdb.
93 *
94 * Constraint5: PC sampling service cannot be used simultaneously with
95 * counter collection service.
96 *
97 * @param [in] context_id - id of the context used for starting/stopping PC sampling service
98 * @param [in] agent_id - id of the agent on which caller tries using PC sampling capability
99 * @param [in] method - the type of PC sampling the caller tries to use on the agent.
100 * @param [in] unit - The unit appropriate to the PC sampling type/method.
101 * @param [in] interval - frequency at which PC samples are generated
102 * @param [in] buffer_id - id of the buffer used for delivering PC samples
103 * @param [in] flags - for future use
104 * @return ::rocprofiler_status_t
105 * @retval ::ROCPROFILER_STATUS_SUCCESS PC sampling service configured successfully
106 * @retval ::ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE One of the scenarios is present:
107 * 1. PC sampling is already configured with configuration different than requested,
108 * 2. PC sampling is requested from a process that runs within the ROCgdb.
109 * 3. HSA runtime does not support PC sampling.
110 * 4. GPU device does not support requested PC sampling method.
111 * @retval ::ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL the amdgpu driver installed on the system
112 * does not support the PC sampling feature
113 * @retval ::ROCPROFILER_STATUS_ERROR a general error caused by the amdgpu driver
114 * @retval ::ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT counter collection service already
115 * set up in the context
116 * @retval ::ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT function invoked with an invalid argument
117 */
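118ROCPROFILER_SDK_EXPERIMENTAL
119rocprofiler_status_t
120rocprofiler_configure_pc_sampling_service(rocprofiler_context_id_t context_id,
121 rocprofiler_agent_id_t agent_id,
122 rocprofiler_pc_sampling_method_t method,
123 rocprofiler_pc_sampling_unit_t unit,
124 uint64_t interval,
125 rocprofiler_buffer_id_t buffer_id,
126 int flags) ROCPROFILER_API;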
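The scenario described under Constraint2 (query, try to configure, query again on a "not available" status) can be sketched as below. This is only an illustration under stated assumptions: every `sketch_` name is hypothetical, the query and configure entry points are replaced by injected function pointers so the flow can run without a GPU, and the stand-in query fills an array, whereas the real query delivers configurations through a callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for SDK types; real code would include the SDK headers. */
typedef enum
{
    SKETCH_SUCCESS = 0,
    SKETCH_ERROR_NOT_AVAILABLE,
    SKETCH_ERROR
} sketch_status_t;

typedef struct
{
    int      method;
    int      unit;
    uint64_t min_interval;
    uint64_t max_interval;
} sketch_config_t;

/* Injected stand-ins for the query and configure entry points. */
typedef sketch_status_t (*sketch_query_fn)(sketch_config_t* out, size_t cap, size_t* count);
typedef sketch_status_t (*sketch_configure_fn)(const sketch_config_t* cfg, uint64_t interval);

/* Query, try the first reported configuration, and query again if another
 * process configured the agent in between. */
static sketch_status_t
sketch_configure_with_retry(sketch_query_fn     query,
                            sketch_configure_fn configure,
                            int                 max_attempts,
                            sketch_config_t*    chosen)
{
    assert(query != NULL && configure != NULL && chosen != NULL);
    for(int attempt = 0; attempt < max_attempts; ++attempt)
    {
        sketch_config_t configs[8];
        size_t          count  = 0;
        sketch_status_t status = query(configs, 8, &count);
        if(status != SKETCH_SUCCESS) return status;
        if(count == 0) return SKETCH_ERROR_NOT_AVAILABLE; /* no PC sampling support */

        status = configure(&configs[0], configs[0].min_interval);
        if(status == SKETCH_SUCCESS)
        {
            *chosen = configs[0];
            return status;
        }
        if(status != SKETCH_ERROR_NOT_AVAILABLE) return status;
        /* Not available: fall through and query the current configurations again. */
    }
    return SKETCH_ERROR_NOT_AVAILABLE;
}

/* Simulated agent: reports CA and CB until "process PB" locks it to CB. */
static int sketch_agent_locked_to_cb = 0;

static sketch_status_t
sketch_fake_query(sketch_config_t* out, size_t cap, size_t* count)
{
    const sketch_config_t ca = {0, 0, 1, 100};
    const sketch_config_t cb = {1, 1, 256, 4096};
    size_t                n  = 0;
    if(!sketch_agent_locked_to_cb && n < cap) out[n++] = ca;
    if(n < cap) out[n++] = cb;
    *count = n;
    sketch_agent_locked_to_cb = 1; /* PB configures CB right after the first query */
    return SKETCH_SUCCESS;
}

static sketch_status_t
sketch_fake_configure(const sketch_config_t* cfg, uint64_t interval)
{
    (void) interval;
    if(sketch_agent_locked_to_cb && cfg->method != 1) return SKETCH_ERROR_NOT_AVAILABLE;
    return SKETCH_SUCCESS;
}
```

With the simulated agent, the first attempt picks CA and is rejected, and the second attempt sees only CB and succeeds. A real tool would probably also bound the number of retries, as `max_attempts` does here.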
127
128/**
129 * @brief (experimental) Enumeration describing values of flags of
130 * ::rocprofiler_pc_sampling_configuration_t.
131 */
132typedef enum ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_configuration_flags_t
133{
134 ROCPROFILER_PC_SAMPLING_CONFIGURATION_FLAGS_NONE = 0,
135 ROCPROFILER_PC_SAMPLING_CONFIGURATION_FLAGS_INTERVAL_POW2,
136 ROCPROFILER_PC_SAMPLING_CONFIGURATION_FLAGS_LAST
137
138 /// @var ROCPROFILER_PC_SAMPLING_CONFIGURATION_FLAGS_INTERVAL_POW2
139 /// @brief The interval value must be a power of 2.
140} rocprofiler_pc_sampling_configuration_flags_t;
141
142/**
143 * @brief (experimental) PC sampling configuration supported by a GPU agent.
144 */
145typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_configuration_t
146{
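147 uint64_t size; ///< Size of this struct
148 rocprofiler_pc_sampling_method_t method;
149 rocprofiler_pc_sampling_unit_t unit;
150 size_t min_interval;
151 size_t max_interval;
152 uint64_t flags; ///< take values from ::rocprofiler_pc_sampling_configuration_flags_t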
153
154 /// @var method
155 /// @brief Sampling method supported by the GPU agent.
156 /// Currently, it can take one of the following two values:
157 /// - ::ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP: a background host kernel thread
158 /// periodically interrupts waves execution on the GPU to generate PC samples
159 /// - ::ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC: performance monitoring hardware
160 /// on the GPU periodically interrupts waves to generate PC samples.
161 /// @var unit
162 /// @brief A unit used to specify the interval of the @ref method for samples generation.
163 /// @var min_interval
164 /// @brief the highest possible frequency for generating samples using @ref method.
165 /// @var max_interval
166 /// @brief the lowest possible frequency for generating samples using @ref method
167
168} rocprofiler_pc_sampling_configuration_t;
169
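As a rough illustration of the interval rules (the interval must lie in [min_interval, max_interval], and must be a power of 2 when the INTERVAL_POW2 flag is reported), a tool might normalize a requested interval as follows. The struct and constants are hypothetical stand-ins that mirror the documented fields, and testing the flag as a bit is an assumption about how `flags` is encoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the documented fields; real code would
 * include <rocprofiler-sdk/pc_sampling.h> instead. */
enum
{
    SKETCH_FLAGS_NONE          = 0,
    SKETCH_FLAGS_INTERVAL_POW2 = 1
};

typedef struct
{
    uint64_t min_interval;
    uint64_t max_interval;
    uint64_t flags;
} sketch_pc_sampling_config_t;

static bool
sketch_is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns an interval inside [min_interval, max_interval] derived from
 * `requested`, or 0 if the configuration admits no valid interval. */
static uint64_t
sketch_pick_interval(const sketch_pc_sampling_config_t* cfg, uint64_t requested)
{
    assert(cfg != NULL);
    if(cfg->min_interval > cfg->max_interval) return 0;

    /* Clamp the request into the supported range. */
    uint64_t v = requested;
    if(v < cfg->min_interval) v = cfg->min_interval;
    if(v > cfg->max_interval) v = cfg->max_interval;

    /* Assumption: the power-of-2 requirement is signalled as a flag bit. */
    if((cfg->flags & SKETCH_FLAGS_INTERVAL_POW2) == 0 || sketch_is_pow2(v)) return v;

    /* Largest power of two that does not exceed v. */
    uint64_t p = 1;
    while(p <= v / 2)
        p *= 2;
    if(p >= cfg->min_interval && p <= cfg->max_interval) return p;

    /* Rounding down left the range; try the next power of two above. */
    if(p <= cfg->max_interval / 2 && p * 2 >= cfg->min_interval) return p * 2;
    return 0;
}
```

Returning 0 for "no valid interval" is a choice made for this sketch only; it says nothing about how the SDK treats an interval of 0.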
170/**
171 * @brief (experimental) Rocprofiler SDK's callback function to deliver the list of available PC
172 * sampling configurations upon the call to the
173 * ::rocprofiler_query_pc_sampling_agent_configurations.
174 *
175 * @param[out] configs - The array of PC sampling configurations supported by the agent
176 * at the moment of invoking ::rocprofiler_query_pc_sampling_agent_configurations.
177 * @param[out] num_config - The number of configurations contained in the underlying array
178 * @p configs.
179 * In case the GPU agent does not support PC sampling, the value is 0.
180 * @param[in] user_data - client's private data passed via
181 * ::rocprofiler_query_pc_sampling_agent_configurations
182 * @return ::rocprofiler_status_t
183 */
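184ROCPROFILER_SDK_EXPERIMENTAL
185typedef rocprofiler_status_t (*rocprofiler_available_pc_sampling_configurations_cb_t)(
186 const rocprofiler_pc_sampling_configuration_t* configs,
187 size_t num_config,
188 void* user_data);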
189
190/**
191 * @brief (experimental) Query PC Sampling Configuration.
192 *
193 * Lists PC sampling configurations a GPU agent with @p agent_id supports at the moment
194 * of invoking the function. Delivers configurations via @p cb.
195 * In case the PC sampling is configured on the GPU agent, the @p cb delivers information
196 * about the active PC sampling configuration.
197 * In case the GPU agent does not support PC sampling capability,
198 * the @p cb delivers no PC sampling configurations.
199 *
200 * @param [in] agent_id - id of the agent for which available configurations will be listed
201 * @param [in] cb - User callback that delivers the available PC sampling configurations
202 * @param [in] user_data - passed to the @p cb
203 * @return ::rocprofiler_status_t
204 * @retval ::ROCPROFILER_STATUS_ERROR_NOT_AVAILABLE One of the scenarios is present:
205 * 1. PC sampling is requested from a process that runs within the ROCgdb.
206 * 2. HSA runtime does not support PC sampling.
207 * @retval ::ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL the amdgpu driver installed on the system
208 * does not support the PC sampling feature.
209 * @retval ::ROCPROFILER_STATUS_ERROR a general error caused by the amdgpu driver
210 * @retval ::ROCPROFILER_STATUS_SUCCESS @p cb successfully finished
211 */
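212ROCPROFILER_SDK_EXPERIMENTAL
213rocprofiler_status_t
214rocprofiler_query_pc_sampling_agent_configurations(
215 rocprofiler_agent_id_t agent_id,
216 rocprofiler_available_pc_sampling_configurations_cb_t cb,
217 void* user_data) ROCPROFILER_API ROCPROFILER_NONNULL(2, 3);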
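A callback passed to the query function typically copies the delivered configurations into storage reached through `user_data`. The sketch below uses hypothetical `sketch_` stand-ins for the SDK types so that it compiles on its own; it also assumes, conservatively, that the `configs` array is only valid for the duration of the callback, which the documentation above does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SDK types; real code includes the SDK header. */
typedef enum
{
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_ERROR
} sketch_status_t;

typedef struct
{
    uint64_t size;
    int      method;
    int      unit;
    size_t   min_interval;
    size_t   max_interval;
    uint64_t flags;
} sketch_config_t;

#define SKETCH_MAX_CONFIGS 8

/* Caller-owned storage handed to the callback through user_data. */
typedef struct
{
    sketch_config_t items[SKETCH_MAX_CONFIGS];
    size_t          count;
} sketch_config_list_t;

/* Same parameter list as the documented callback. Every entry is copied
 * because the array is assumed to be valid only during the call. */
static sketch_status_t
sketch_collect_configs(const sketch_config_t* configs, size_t num_config, void* user_data)
{
    sketch_config_list_t* list = (sketch_config_list_t*) user_data;
    assert(list != NULL);

    list->count = 0;
    for(size_t i = 0; i < num_config && i < SKETCH_MAX_CONFIGS; ++i)
        list->items[list->count++] = configs[i];

    return SKETCH_STATUS_SUCCESS;
}
```

After the query returns, a `count` of 0 in the list corresponds to the documented case of an agent without PC sampling support.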
218
219/**
220 * @brief (experimental) Information about the GPU part where wave was executing
221 * at the moment of sampling.
222 */
223typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_hw_id_v0_t
224{
225 uint64_t chiplet : 6; ///< chiplet index (3 bits allocated by the ROCr runtime)
226 uint64_t wave_id : 7; ///< wave slot index
227 uint64_t simd_id : 2; ///< SIMD index
228 uint64_t pipe_id : 4; ///< pipe index
229 uint64_t cu_or_wgp_id : 4;
230 uint64_t shader_array_id : 1; ///< Shader array index
231 uint64_t shader_engine_id : 5; ///< shader engine index
232 uint64_t workgroup_id : 7; ///< thread_group index on GFX9, and workgroup index on GFX10+
233 uint64_t vm_id : 6; ///< virtual memory ID
234 uint64_t queue_id : 4; ///< queue id
235 uint64_t microengine_id : 2; ///< ACE (microengine) index
236 uint64_t reserved0 : 16; ///< Reserved for the future use
237
238 /// @var cu_or_wgp_id
239 /// @brief Compute unit index on GFX9 or workgroup processor index on GFX10+.
240} rocprofiler_pc_sampling_hw_id_v0_t;
241
242/**
243 * @brief (experimental) Sampled program counter.
244 */
245typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_t
246{
247 uint64_t code_object_id;
248 uint64_t code_object_offset;
249
250 /// @var code_object_id
251 /// @brief id of the loaded code object instance that contains sampled PC.
252 /// This field holds the value ::ROCPROFILER_CODE_OBJECT_ID_NONE
253 /// if the code object cannot be determined
254 /// (e.g., sampled PC belongs to code generated by self-modifying code).
255 /// @var code_object_offset
256 /// @brief If @ref code_object_id is different than ::ROCPROFILER_CODE_OBJECT_ID_NONE,
257 /// then this field contains the offset of the sampled PC relative to the
258 /// ::rocprofiler_callback_tracing_code_object_load_data_t.load_base
259 /// of the code object instance with @ref code_object_id.
260 /// To calculate the original virtual address of the sampled PC, one can add the value
261 /// of this field to the ::rocprofiler_callback_tracing_code_object_load_data_t.load_base.
262 /// The value of @ref code_object_offset matches
263 /// the virtual address of the sampled instruction (PC), only if the
264 /// @ref code_object_id is equal to the ::ROCPROFILER_CODE_OBJECT_ID_NONE.
265} rocprofiler_pc_t;
266
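The address rule documented for `code_object_id` and `code_object_offset` can be sketched as a small helper. All names here are hypothetical: the table of load bases stands for whatever a tool records from code object load callbacks, and the value chosen for the "none" id is an assumption (the real constant comes from the SDK headers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for ::ROCPROFILER_CODE_OBJECT_ID_NONE. */
#define SKETCH_CODE_OBJECT_ID_NONE UINT64_C(0)

/* Hypothetical mirror of the two documented fields of the sampled PC. */
typedef struct
{
    uint64_t code_object_id;
    uint64_t code_object_offset;
} sketch_pc_t;

/* Hypothetical entry a tool might record when a code object is loaded. */
typedef struct
{
    uint64_t code_object_id;
    uint64_t load_base;
} sketch_code_object_t;

/* Writes the virtual address of the sampled PC to *out.
 * Returns 0 on success, -1 if the code object id is not in the table. */
static int
sketch_resolve_pc(const sketch_pc_t*          pc,
                  const sketch_code_object_t* objs,
                  size_t                      num_objs,
                  uint64_t*                   out)
{
    assert(pc != NULL && out != NULL);

    /* Unknown code object: the offset field already holds the virtual address. */
    if(pc->code_object_id == SKETCH_CODE_OBJECT_ID_NONE)
    {
        *out = pc->code_object_offset;
        return 0;
    }

    /* Known code object: virtual address = load_base + offset. */
    for(size_t i = 0; i < num_objs; ++i)
    {
        if(objs[i].code_object_id == pc->code_object_id)
        {
            *out = objs[i].load_base + pc->code_object_offset;
            return 0;
        }
    }
    return -1;
}
```

A linear scan keeps the sketch short; a tool that tracks many code objects would likely use a map keyed by the id.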
267/**
268 * @brief (experimental) ROCProfiler Host-Trap PC Sampling Record.
269 */
270typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_record_host_trap_v0_t
271{
272 uint64_t size; ///< Size of this struct
273 rocprofiler_pc_sampling_hw_id_v0_t hw_id; ///< @see ::rocprofiler_pc_sampling_hw_id_v0_t
274 rocprofiler_pc_t pc; ///< information about sampled program counter
275 uint64_t exec_mask; ///< active SIMD lanes when sampled
276 uint64_t timestamp; ///< timestamp when sample is generated
277 uint64_t dispatch_id; ///< originating kernel dispatch ID
278 rocprofiler_async_correlation_id_t correlation_id;
279 rocprofiler_dim3_t workgroup_id; ///< wave coordinates within the workgroup
280 uint32_t wave_in_group : 8; ///< wave position within the workgroup (0-31)
281 uint32_t reserved0 : 24; ///< reserved for future use
282
283 /// @var correlation_id
284 /// @brief API launch call id that matches dispatch ID
285} rocprofiler_pc_sampling_record_host_trap_v0_t;
286
287/**
288 * @brief (experimental) The header of the ::rocprofiler_pc_sampling_record_stochastic_v0_t,
289 * indicating what fields of the ::rocprofiler_pc_sampling_record_stochastic_v0_t instance are
290 * meaningful for the sample.
291 */
292typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_record_stochastic_header_t
293{
294 uint8_t has_memory_counter : 1; ///< pc sample provides memory counters information
295 ///< via ::rocprofiler_pc_sampling_memory_counters_t
296 uint8_t reserved_type : 7;
297} rocprofiler_pc_sampling_record_stochastic_header_t;
298
299/**
300 * @brief (experimental) Enumeration describing type of sampled issued instruction.
301 */
302typedef enum ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_instruction_type_t
303{
304 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_NONE = 0,
307 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_SCALAR, ///< scalar (memory) instruction
308 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_TEX, ///< texture memory instruction
310 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_LDS_DIRECT, ///< LDS direct memory instruction
315 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BRANCH_NOT_TAKEN,
316 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BRANCH_TAKEN,
318 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_OTHER, ///< other types of instruction
320 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_DUAL_VALU, ///< dual VALU instruction
321 ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_LAST
322
323 /// @var ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BRANCH_NOT_TAKEN
324 /// @brief Instruction representing a branch not being taken.
325 /// @var ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BRANCH_TAKEN
326 /// @brief Instruction representing a taken branch.
327} rocprofiler_pc_sampling_instruction_type_t;
328
329/**
330 * @brief (experimental) Enumeration describing reason for not issuing an instruction.
331 */
332typedef enum ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_instruction_not_issued_reason_t
333{
334 ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_NONE = 0,
335 ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_NO_INSTRUCTION_AVAILABLE,
336 ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ALU_DEPENDENCY,
338 ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_INTERNAL_INSTRUCTION,
340 ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ARBITER_NOT_WIN,
341 ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ARBITER_WIN_EX_STALL,
342 ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_OTHER_WAIT,
344 ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_LAST
345
346 /// @var ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_NO_INSTRUCTION_AVAILABLE
347 /// @brief No instruction available in the instruction cache.
348 /// @var ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ALU_DEPENDENCY
349 /// @brief ALU dependency not resolved.
350 /// @var ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_INTERNAL_INSTRUCTION
351 /// @brief Wave executes an internal instruction.
352 /// @var ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ARBITER_NOT_WIN
353 /// @brief The instruction did not win the arbiter.
354 /// @var ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_ARBITER_WIN_EX_STALL
355 /// @brief Arbiter issued an instruction, but the execution pipe pushed it back from execution.
356 /// @var ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_OTHER_WAIT
357 /// @brief Other types of wait (e.g., wait for XNACK acknowledgment).
358} rocprofiler_pc_sampling_instruction_not_issued_reason_t;
359
360/**
361 * @brief (experimental) Data provided by stochastic sampling hardware.
362 *
363 */
364typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_snapshot_v0_t
365{
366 uint32_t reason_not_issued : 4;
367 uint32_t reserved0 : 1; ///< reserved for future use
368 uint32_t arb_state_issue_valu : 1; ///< arbiter issued a VALU instruction
369 uint32_t arb_state_issue_matrix : 1; ///< arbiter issued a matrix instruction
370 uint32_t arb_state_issue_lds : 1; ///< arbiter issued an LDS instruction
371 uint32_t arb_state_issue_lds_direct : 1; ///< arbiter issued an LDS direct instruction
372 uint32_t arb_state_issue_scalar : 1; ///< arbiter issued a scalar (SALU/SMEM) instruction
373 uint32_t arb_state_issue_vmem_tex : 1; ///< arbiter issued a texture instruction
374 uint32_t arb_state_issue_flat : 1; ///< arbiter issued a FLAT instruction
375 uint32_t arb_state_issue_exp : 1; ///< arbiter issued an export instruction
376 uint32_t arb_state_issue_misc : 1; ///< arbiter issued a miscellaneous instruction
377 uint32_t arb_state_issue_brmsg : 1; ///< arbiter issued a branch/message instruction
378 uint32_t arb_state_issue_reserved : 1; ///< reserved for the future use
379 uint32_t arb_state_stall_valu : 1;
380 uint32_t arb_state_stall_matrix : 1; ///< matrix instruction was stalled
381 uint32_t arb_state_stall_lds : 1; ///< LDS instruction was stalled
382 uint32_t arb_state_stall_lds_direct : 1; ///< LDS direct instruction was stalled
383 uint32_t arb_state_stall_scalar : 1; ///< Scalar (SALU/SMEM) instruction was stalled
384 uint32_t arb_state_stall_vmem_tex : 1; ///< texture instruction was stalled
385 uint32_t arb_state_stall_flat : 1; ///< flat instruction was stalled
386 uint32_t arb_state_stall_exp : 1; ///< export instruction was stalled
387 uint32_t arb_state_stall_misc : 1; ///< miscellaneous instruction was stalled
388 uint32_t arb_state_stall_brmsg : 1; ///< branch/message instruction was stalled
389 uint32_t arb_state_state_reserved : 1; ///< reserved for the future use
390 // We have two reserved bits
391 uint32_t dual_issue_valu : 1;
392 uint32_t reserved1 : 1; ///< reserved for the future use
393 uint32_t reserved2 : 3; ///< reserved for the future use
394
395 /// @var reason_not_issued
396 /// @brief The reason for not issuing an instruction. The field takes one of the values defined
397 /// in ::rocprofiler_pc_sampling_instruction_not_issued_reason_t
398 /// @var arb_state_stall_valu
399 /// @brief VALU instruction was stalled when a sample was generated
400 /// @var dual_issue_valu
401 /// @brief Two VALU instructions were issued for coexecution (MI3xx specific)
402} rocprofiler_pc_sampling_snapshot_v0_t;
403
404/**
405 * @brief (experimental) Counters of issued but not yet completed instructions.
406 */
407typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_memory_counters_t
408{
409 uint32_t load_cnt : 6;
410 uint32_t store_cnt : 6;
411 uint32_t bvh_cnt : 3;
412 uint32_t sample_cnt : 6;
413 uint32_t ds_cnt : 6;
414 uint32_t km_cnt : 5;
415
416 /// @var load_cnt
417 /// @brief Counts the number of VMEM load instructions issued but not yet completed.
418 /// @var store_cnt
419 /// @brief Counts the number of VMEM store instructions issued but not yet completed.
420 /// @var bvh_cnt
421 /// @brief Counts the number of VMEM BVH instructions issued but not yet completed.
422 /// @var sample_cnt
423 /// @brief Counts the number of VMEM sample instructions issued but not yet completed.
424 /// @var ds_cnt
425 /// @brief Counts the number of LDS instructions issued but not yet completed.
426 /// @var km_cnt
427 /// @brief Counts the number of scalar memory reads and memory instructions issued but not yet
428 /// completed.
429} rocprofiler_pc_sampling_memory_counters_t;
430
431/**
432 * @brief (experimental) ROCProfiler Stochastic PC Sampling Record.
433 */
434typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_record_stochastic_v0_t
435{
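436 uint64_t size; ///< Size of this struct
437 rocprofiler_pc_sampling_record_stochastic_header_t flags;
438 uint8_t wave_in_group;
439 uint8_t wave_issued : 1;
440 uint8_t inst_type : 5;
441 uint8_t reserved : 2;
442 rocprofiler_pc_sampling_hw_id_v0_t hw_id;
443 rocprofiler_pc_t pc;
444 uint64_t exec_mask;
445 rocprofiler_dim3_t workgroup_id;
446 uint32_t wave_count;
447 uint64_t timestamp;
448 uint64_t dispatch_id;
449 rocprofiler_async_correlation_id_t correlation_id;
450 rocprofiler_pc_sampling_snapshot_v0_t snapshot;
451 rocprofiler_pc_sampling_memory_counters_t memory_counters;
452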
453 /// @var flags
454 /// @brief Defines what fields are meaningful for the sample.
455 /// @var wave_in_group
456 /// @brief wave position within the workgroup (0-15)
457 /// @var wave_issued
458 /// @brief wave issued the instruction represented with the PC
459 /// @var inst_type
460 /// @brief instruction type, takes a value defined in @ref
461 /// ::rocprofiler_pc_sampling_instruction_type_t
462 /// @var reserved
463 /// @brief reserved 2 bits must be zero
464 /// @var hw_id
465 /// @brief @see ::rocprofiler_pc_sampling_hw_id_v0_t
466 /// @var pc
467 /// @brief information about sampled program counter
468 /// @var exec_mask
469 /// @brief active SIMD lanes at the moment of sampling
470 /// @var workgroup_id
471 /// @brief wave coordinates within the workgroup
472 /// @var wave_count
473 /// @brief active waves on the CU at the moment of sampling
474 /// @var timestamp
475 /// @brief timestamp when sample is generated
476 /// @var dispatch_id
477 /// @brief originating kernel dispatch ID
478 /// @var correlation_id
479 /// @brief API launch call id that matches dispatch ID
480 /// @var snapshot
481 /// @brief Data provided by stochastic sampling hardware. @see
482 /// ::rocprofiler_pc_sampling_snapshot_v0_t
483 /// @var memory_counters
484 /// @brief Counters of issued but not yet completed instructions. @see
485 /// ::rocprofiler_pc_sampling_memory_counters_t
486} rocprofiler_pc_sampling_record_stochastic_v0_t;
487
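Since the record header states which fields are meaningful, a consumer would normally test `has_memory_counter` before reading the memory counters. The sketch below shows that guard with hypothetical, trimmed-down stand-ins for the header, the counters, and the record. Adding the counters into one total is purely illustrative and not something the SDK prescribes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins that mirror the documented fields. */
typedef struct
{
    uint8_t has_memory_counter : 1;
    uint8_t reserved_type : 7;
} sketch_header_t;

typedef struct
{
    uint32_t load_cnt : 6;
    uint32_t store_cnt : 6;
    uint32_t bvh_cnt : 3;
    uint32_t sample_cnt : 6;
    uint32_t ds_cnt : 6;
    uint32_t km_cnt : 5;
} sketch_memory_counters_t;

typedef struct
{
    sketch_header_t          flags;
    sketch_memory_counters_t memory_counters;
} sketch_stochastic_record_t;

/* Adds up the outstanding-instruction counters, but only when the header
 * says they are meaningful. Returns false and leaves *total untouched otherwise. */
static bool
sketch_outstanding_memory_ops(const sketch_stochastic_record_t* rec, uint32_t* total)
{
    assert(rec != NULL && total != NULL);
    if(!rec->flags.has_memory_counter) return false;

    const sketch_memory_counters_t* c = &rec->memory_counters;
    *total = (uint32_t) c->load_cnt + (uint32_t) c->store_cnt + (uint32_t) c->bvh_cnt +
             (uint32_t) c->sample_cnt + (uint32_t) c->ds_cnt + (uint32_t) c->km_cnt;
    return true;
}
```

The same pattern applies to any other field that a future header bit might gate.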
488/**
489 * @brief (experimental) Record representing an invalid PC Sampling Record.
490 */
491typedef struct ROCPROFILER_SDK_EXPERIMENTAL rocprofiler_pc_sampling_record_invalid_t
492{
493 uint64_t size; ///< Size of the struct
494} rocprofiler_pc_sampling_record_invalid_t;
495
496/**
497 * @brief (experimental) Return the string encoding of ::rocprofiler_pc_sampling_instruction_type_t
498 * value
499 * @param [in] instruction_type instruction type enum value
500 * @return Will return a nullptr if invalid/unsupported ::rocprofiler_pc_sampling_instruction_type_t
501 * value is provided.
502 */
503ROCPROFILER_SDK_EXPERIMENTAL
504const char*
505rocprofiler_get_pc_sampling_instruction_type_name(
506 rocprofiler_pc_sampling_instruction_type_t instruction_type) ROCPROFILER_API;
507
508/**
509 * @brief (experimental) Return the string encoding of
510 * ::rocprofiler_pc_sampling_instruction_not_issued_reason_t value
511 * @param [in] not_issued_reason not-issued reason enum value
512 * @return Will return a nullptr if invalid/unsupported
513 * ::rocprofiler_pc_sampling_instruction_not_issued_reason_t value is provided.
514 */
515ROCPROFILER_SDK_EXPERIMENTAL const char*
516rocprofiler_get_pc_sampling_instruction_not_issued_reason_name(
517 rocprofiler_pc_sampling_instruction_not_issued_reason_t not_issued_reason) ROCPROFILER_API;
518
519/** @} */
520
521ROCPROFILER_EXTERN_C_FINI
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_SLEEP_WAIT
wave was sleeping
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_BARRIER_WAIT
waiting on a barrier
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_NOT_ISSUED_REASON_WAITCNT
waitcnt dependency
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_LDS
LDS memory instruction.
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_BARRIER
barrier instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_OTHER
other types of instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_MATRIX
matrix instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_TEX
texture memory instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_NO_INST
no instruction issued
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_MESSAGE
message instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_EXPORT
export instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_SCALAR
scalar (memory) instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_VALU
vector ALU instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_LDS_DIRECT
LDS direct memory instruction.
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_JUMP
jump instruction
@ ROCPROFILER_PC_SAMPLING_INSTRUCTION_TYPE_FLAT
flat memory instruction