Agent Information

Rocprofiler SDK Developer API: Agent Information
Rocprofiler SDK Developer API 0.6.0
ROCm Profiling API and tools

Data structures, types, and functions for querying information about the available agents (CPU, GPU, etc.).

Data Structures

struct  rocprofiler_agent_cache_t
 Cache information for an agent.
 
struct  rocprofiler_agent_mem_bank_t
 Memory bank information for an agent.
 
struct  rocprofiler_agent_v0_t
 Stores the properties of an agent (CPU, GPU, etc.)
 

Typedefs

typedef rocprofiler_agent_v0_t rocprofiler_agent_t
 
typedef rocprofiler_status_t(* rocprofiler_query_available_agents_cb_t) (rocprofiler_agent_version_t version, const void **agents, unsigned long num_agents, void *user_data)
 Callback function type for querying the available agents.
 

Enumerations

enum  rocprofiler_agent_version_t {
  ROCPROFILER_AGENT_INFO_VERSION_NONE = 0,
  ROCPROFILER_AGENT_INFO_VERSION_0 = 1,
  ROCPROFILER_AGENT_INFO_VERSION_LAST
}
 Enumeration ID for version of the rocprofiler_agent_v*_t struct in rocprofiler_i.
 

Functions

rocprofiler_status_t rocprofiler_query_available_agents (rocprofiler_agent_version_t version, rocprofiler_query_available_agents_cb_t callback, unsigned long agent_size, void *user_data)
 Receive a synchronous callback with an array of the agents available at the moment of invocation.
 

Detailed Description

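This group describes the properties of an agent (including its caches, IO links, and memory banks) and provides rocprofiler_query_available_agents, which reports the agents available at the moment it is called.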


Data Structure Documentation

◆ rocprofiler_agent_cache_t

struct rocprofiler_agent_cache_t

Cache information for an agent.

Definition at line 55 of file agent.h.

Data Fields
uint32_t association Cache Associativity.
uint32_t cache_line_size Cache line size in bytes.
uint32_t cache_lines_per_tag Cache lines per Cache Tag.
uint32_t latency Cache latency in ns.
uint32_t level Integer representing level: 1, 2, 3, 4, etc.
uint64_t processor_id_low Identifies the processor number.
uint64_t size Size of the cache.
HsaCacheType type

◆ rocprofiler_agent_io_link_t

struct rocprofiler_agent_io_link_t

IO link information for an agent.

Definition at line 70 of file agent.h.

Data Fields
HSA_LINKPROPERTY flags override flags (may be active for specific platforms)
uint32_t max_bandwidth maximum interface Bandwidth in MB/s
uint32_t max_latency maximum cost of time to transfer (rounded to ns)
uint32_t min_bandwidth minimum interface Bandwidth in MB/s
uint32_t min_latency minimum cost of time to transfer (rounded to ns)
uint32_t node_from See rocprofiler_agent_id_t.
uint32_t node_to See rocprofiler_agent_id_t.
uint32_t recommended_transfer_size recommended transfer size to reach maximum bandwidth in bytes
HSA_IOLINKTYPE type Discoverable IoLink Properties (optional)
uint32_t version_major Bus interface version (optional)
uint32_t version_minor Bus interface version (optional)
uint32_t weight weight factor (derived from CDIT)

◆ rocprofiler_agent_mem_bank_t

struct rocprofiler_agent_mem_bank_t

Memory bank information for an agent.

Definition at line 90 of file agent.h.

Data Fields
HSA_MEMORYPROPERTY flags
HSA_HEAPTYPE heap_type
uint32_t mem_clk_max Clock for the memory; this allows computing the available bandwidth to the memory when needed.
uint64_t size_in_bytes Physical memory size of the memory range in bytes.
uint32_t width Number of parallel bits of the memory interface.

◆ rocprofiler_agent_v0_t

struct rocprofiler_agent_v0_t

Stores the properties of an agent (CPU, GPU, etc.)

The node_id member is the KFD topology node id. It should be considered the "universal" indexing number. It is equivalent to the HSA-runtime HSA_AMD_AGENT_INFO_DRIVER_NODE_ID property of an hsa_agent_t. The const char* fields (name, vendor_name, etc.) are guaranteed to be valid pointers to null-terminated strings during tool finalization. Pointers to the agents obtained via rocprofiler_query_available_agents are constant and will not be deallocated until after tool finalization. Making copies of the agent struct is also valid.

See also
rocprofiler_query_available_agents

Definition at line 110 of file agent.h.

Data Fields
uint32_t array_count Number of SIMD arrays.
const rocprofiler_agent_cache_t * caches
uint32_t caches_count Number of discoverable cache affinity properties on this "H-NUMA" node.
HSA_CAPABILITY capability GPU only.
uint32_t cpu_core_id_base low value of the logical processor ID of the latency (= CPU) cores available on this node
uint32_t cpu_cores_count Number of latency (= CPU) cores present on this HSA node. This value is 0 for an HSA node with no such cores, e.g., a "discrete HSA GPU".
uint32_t cu_count Number of compute units.
uint32_t cu_per_engine computed
uint32_t cu_per_simd_array Number of Compute Units (CU) per SIMD array.
uint16_t device_id GPU device id; 0 on latency (= CPU)-only nodes.
uint32_t domain PCI domain of the GPU.
uint32_t drm_render_minor DRM render device minor device number.
uint32_t family_id Family code.
HSA_ENGINE_ID fw_version GPU only. Identifier (rev) of the GPU uEngine or Firmware, may be 0.
uint32_t gds_size_in_kb Size of Global Data Store in Kilobytes shared across SIMD Wavefronts.
uint32_t gfx_target_version Encoded target version: major_version = (value / 10000) % 100, minor_version = (value / 100) % 100, patch_version = value % 100.
uint64_t gpu_id GPU only. KFD identifier.
rocprofiler_dim3_t grid_max_dim GPU only. Maximum number of work-items of each dimension of a grid.
uint32_t grid_max_size GPU only. Maximum total number of work-items in a grid.
uint64_t hive_id XGMI Hive the GPU node belongs to in the system. It is an opaque and static number hash created by the PSP.
rocprofiler_agent_id_t id Internal opaque identifier.
const rocprofiler_agent_io_link_t * io_links
uint32_t io_links_count Number of discoverable IO link affinity properties of this node connecting to other nodes.
uint32_t lds_size_in_kb Size of Local Data Store in Kilobytes per SIMD Wavefront.
uint64_t local_mem_size GPU only. Local memory size.
uint32_t location_id GPU BDF (Bus/Device/function number) - identifies the device location in the overall system.
int32_t logical_node_id Logical sequence number. This will always be [0..N) where N is the total number of agents.
int32_t logical_node_type_id Logical sequence number with respect to other agents of the same type. This will always be [0..N) where N is the total number of X agents (where X is a rocprofiler_agent_type_t value). This field is intended to help with the environment-variable indexing used to mask GPUs at runtime (i.e., HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES), which starts at zero and only applies to GPUs. For example, the logical_node_type_id value for the first GPU will be 0, the second GPU will have a value of 1, and so on, regardless of how many agents of a different type preceded it (and thus increased the node_id or logical_node_id).

Example: in a system with 2 CPUs and 2 GPUs where the node ids are 0=CPU, 1=GPU, 2=CPU, 3=GPU, the CPU node_ids 0 and 2 would have logical_node_type_id values of 0 and 1, respectively, and the GPU node_ids 1 and 3 would also have logical_node_type_id values of 0 and 1.

uint32_t max_engine_clk_ccompute maximum engine clocks for CPU, including any boost capabilities
uint32_t max_engine_clk_fcompute GPU only. Maximum engine clocks for GPU, including any boost capabilities.
uint32_t max_slots_scratch_cu Number of temp. memory ("scratch") wave slots available to access, may be 0 if HW has no restrictions.
uint32_t max_waves_per_cu computed
uint32_t max_waves_per_simd This identifies the max. number of launched waves per SIMD. If NumFComputeCores is 0, this value is ignored.
const rocprofiler_agent_mem_bank_t * mem_banks
uint32_t mem_banks_count Number of discoverable memory bank affinity properties on this "H-NUMA" node.
const char * model_name GPU only. Will be something like vega20, mi200, etc.
const char * name Name of the agent. Will be identical to product name for CPU.
uint32_t node_id Node sequence number. This will be equivalent to the HSA-runtime HSA_AMD_AGENT_INFO_DRIVER_NODE_ID property.
uint32_t num_cp_queues number of Compute queues
uint32_t num_gws Number of GWS barriers.
uint32_t num_sdma_engines number of PCIe optimized SDMA engines
uint32_t num_sdma_queues_per_engine number of SDMA queue per one engine
uint32_t num_sdma_xgmi_engines number of XGMI optimized SDMA engines
uint32_t num_shader_banks Number of Shader Banks or Shader Engines, typical values are 1 or 2.
uint32_t num_xcc Number of XCC.
const char * product_name Marketing name.
int32_t reserved_padding0 padding logical_node_id to 64 bytes
HSA_ENGINE_VERSION sdma_fw_version GPU only.
uint32_t simd_arrays_per_engine Number of SIMD arrays per engine.
uint32_t simd_count Number of HSA throughput (= GPU) FCompute cores ("SIMD") present in a node. This value is 0 if no FCompute cores are present (e.g., a pure "CPU node").
uint32_t simd_id_base low value of the logical processor ID of the throughput (= GPU) units available on this node
uint32_t simd_per_cu Number of SIMD representing a Compute Unit (CU)
uint64_t size set to sizeof(rocprofiler_agent_t) by rocprofiler. This can be used for versioning and compatibility handling
rocprofiler_agent_type_t type Enumeration for identifying the agent type (CPU, GPU, etc.)
uint16_t vendor_id GPU vendor id; 0 on latency (= CPU)-only nodes.
const char * vendor_name Vendor of agent (will be AMD)
uint32_t wave_front_size Number of SIMD cores per wavefront executed, typically 64, may be 32 or a different value for some HSA based architectures.
rocprofiler_dim3_t workgroup_max_dim GPU only. Maximum number of work-items of each dimension of a work-group.
uint32_t workgroup_max_size GPU only. Maximum total number of work-items in a work-group.
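
Two of the fields above encode a small computation: gfx_target_version packs three version components into one integer, and logical_node_type_id is a per-type running index. The following is a minimal sketch of both, not SDK code. Every `demo_` name is hypothetical, and `demo_agent_type_t` is only a stand-in for rocprofiler_agent_type_t, whose enumerators are not listed in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the three components of gfx_target_version. */
typedef struct
{
    uint32_t major_version;
    uint32_t minor_version;
    uint32_t patch_version;
} demo_gfx_version_t;

/* Applies the formulas given for the gfx_target_version field. */
static demo_gfx_version_t
demo_decode_gfx_target_version(uint32_t value)
{
    demo_gfx_version_t v;
    v.major_version = (value / 10000) % 100;
    v.minor_version = (value / 100) % 100;
    v.patch_version = value % 100;
    return v;
}

/* Hypothetical stand-in for rocprofiler_agent_type_t (assumption). */
typedef enum
{
    DEMO_AGENT_TYPE_CPU = 0,
    DEMO_AGENT_TYPE_GPU,
    DEMO_AGENT_TYPE_COUNT
} demo_agent_type_t;

/* Given agent types listed in node order, writes each agent's index among
 * agents of the same type, matching the description of logical_node_type_id. */
static void
demo_assign_logical_node_type_ids(const demo_agent_type_t* types,
                                  size_t                   count,
                                  int32_t*                 out_ids)
{
    int32_t next_id[DEMO_AGENT_TYPE_COUNT] = {0};

    assert(types != NULL && out_ids != NULL);
    for(size_t i = 0; i < count; ++i)
        out_ids[i] = next_id[types[i]]++;
}
```

With these helpers, a gfx_target_version of 90010 decodes to major 9, minor 0, patch 10, and the node order CPU, GPU, CPU, GPU from the example above yields the per-type indices 0, 0, 1, 1.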

Typedef Documentation

◆ rocprofiler_agent_t

Definition at line 212 of file agent.h.

◆ rocprofiler_query_available_agents_cb_t

typedef rocprofiler_status_t(* rocprofiler_query_available_agents_cb_t) (rocprofiler_agent_version_t version, const void **agents, unsigned long num_agents, void *user_data)

Callback function type for querying the available agents.

If the callback is invoked, rocprofiler_query_available_agents returns the rocprofiler_status_t value returned from the callback.

Parameters
[in] version: Enum specifying the version of agent info
[in] agents: Array of pointers to agents
[in] num_agents: Number of agents in array
[in] user_data: Data pointer passback
Returns
rocprofiler_status_t
Return values
ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_ABI: Size of the agent struct in the application is larger than the agent struct for rocprofiler-sdk
ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT: Invalid rocprofiler_agent_version_t value

Definition at line 228 of file agent.h.
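
As an illustration of the callback shape, the sketch below walks the `agents` array and accumulates GPU totals through `user_data`. It is an assumption-laden example rather than SDK code: all `demo_` types are reduced stand-ins so the snippet compiles without the SDK headers, and `demo_gpu_summary_t` is a hypothetical accumulator. A real tool would use rocprofiler_status_t, rocprofiler_agent_version_t, and rocprofiler_agent_v0_t instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions) mirroring the documented types. */
typedef enum
{
    DEMO_STATUS_SUCCESS = 0,
    DEMO_STATUS_ERROR_INVALID_ARGUMENT
} demo_status_t;

typedef enum
{
    DEMO_AGENT_INFO_VERSION_NONE = 0,
    DEMO_AGENT_INFO_VERSION_0    = 1,
    DEMO_AGENT_INFO_VERSION_LAST
} demo_agent_version_t;

typedef enum
{
    DEMO_AGENT_TYPE_CPU = 0,
    DEMO_AGENT_TYPE_GPU
} demo_agent_type_t;

/* Reduced stand-in for rocprofiler_agent_v0_t: only the fields read below. */
typedef struct
{
    uint64_t          size;
    demo_agent_type_t type;
    uint32_t          node_id;
    uint32_t          cu_count;
} demo_agent_v0_t;

/* Hypothetical accumulator handed to the callback through user_data. */
typedef struct
{
    size_t   gpu_agents;
    uint64_t gpu_compute_units;
} demo_gpu_summary_t;

/* Same parameter list as rocprofiler_query_available_agents_cb_t. */
static demo_status_t
demo_summarize_gpus(demo_agent_version_t version,
                    const void**         agents,
                    unsigned long        num_agents,
                    void*                user_data)
{
    /* The version tells the callback which struct layout agents[i] points to. */
    if(version != DEMO_AGENT_INFO_VERSION_0 || user_data == NULL)
        return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
    if(agents == NULL && num_agents > 0)
        return DEMO_STATUS_ERROR_INVALID_ARGUMENT;

    demo_gpu_summary_t* summary = user_data;
    for(unsigned long i = 0; i < num_agents; ++i)
    {
        const demo_agent_v0_t* agent = agents[i];
        assert(agent != NULL);
        if(agent->type != DEMO_AGENT_TYPE_GPU) continue;
        summary->gpu_agents += 1;
        summary->gpu_compute_units += agent->cu_count;
    }
    return DEMO_STATUS_SUCCESS;
}
```

Checking `version` before casting is the design point here: the `agents` parameter is deliberately untyped (`const void**`) so that the struct layout can change between versions.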

Enumeration Type Documentation

◆ rocprofiler_agent_version_t

Enumeration ID for version of the rocprofiler_agent_v*_t struct in rocprofiler_i.

Enumerator
ROCPROFILER_AGENT_INFO_VERSION_NONE 
ROCPROFILER_AGENT_INFO_VERSION_0 
ROCPROFILER_AGENT_INFO_VERSION_LAST 

Definition at line 45 of file agent.h.


Function Documentation

◆ rocprofiler_query_available_agents()

rocprofiler_status_t rocprofiler_query_available_agents (rocprofiler_agent_version_t version, rocprofiler_query_available_agents_cb_t callback, unsigned long agent_size, void *user_data)

Receive a synchronous callback with an array of the agents available at the moment of invocation.

Parameters
[in] version: Enum value specifying the struct type of the agent info
[in] callback: Callback function accepting list of agents
[in] agent_size: Should be set to sizeof(rocprofiler_agent_t)
[in] user_data: Data pointer provided to callback
Returns
rocprofiler_status_t
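
The calling pattern can be sketched with a small mock, assuming the behavior described in this section: the caller passes the struct version, a callback, and the size of its own agent struct, and receives either an error status or the callback's return value. Everything prefixed `demo_` is a hypothetical stand-in, not the SDK. In particular, `demo_query_available_agents` only models the documented return values over two fixed agents, and rejecting a NULL callback is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    DEMO_STATUS_SUCCESS = 0,
    DEMO_STATUS_ERROR_INVALID_ARGUMENT,
    DEMO_STATUS_ERROR_INCOMPATIBLE_ABI
} demo_status_t;

typedef enum
{
    DEMO_AGENT_INFO_VERSION_NONE = 0,
    DEMO_AGENT_INFO_VERSION_0    = 1,
    DEMO_AGENT_INFO_VERSION_LAST
} demo_agent_version_t;

/* Reduced stand-in for rocprofiler_agent_v0_t / rocprofiler_agent_t. */
typedef struct
{
    uint64_t size;
    uint32_t node_id;
} demo_agent_v0_t;
typedef demo_agent_v0_t demo_agent_t;

typedef demo_status_t (*demo_query_cb_t)(demo_agent_version_t version,
                                         const void**         agents,
                                         unsigned long        num_agents,
                                         void*                user_data);

/* Mock query (assumption): validates the arguments as the documented return
 * values suggest, then invokes the callback synchronously. */
static demo_status_t
demo_query_available_agents(demo_agent_version_t version,
                            demo_query_cb_t      callback,
                            unsigned long        agent_size,
                            void*                user_data)
{
    static const demo_agent_v0_t agents[2]     = {{sizeof(demo_agent_v0_t), 0},
                                                  {sizeof(demo_agent_v0_t), 1}};
    static const void*           agent_ptrs[2] = {&agents[0], &agents[1]};

    if(version != DEMO_AGENT_INFO_VERSION_0 || callback == NULL)
        return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
    /* Caller was built against a larger struct than this library provides. */
    if(agent_size > sizeof(demo_agent_v0_t))
        return DEMO_STATUS_ERROR_INCOMPATIBLE_ABI;
    return callback(version, agent_ptrs, 2, user_data);
}

/* Callback that stores the number of agents in the caller's counter. */
static demo_status_t
demo_count_agents(demo_agent_version_t version,
                  const void**         agents,
                  unsigned long        num_agents,
                  void*                user_data)
{
    (void) version;
    (void) agents;
    *(unsigned long*) user_data = num_agents;
    return DEMO_STATUS_SUCCESS;
}

/* Caller side: version, callback, sizeof the agent struct, user data. */
static unsigned long
demo_count_available_agents(void)
{
    unsigned long count  = 0;
    demo_status_t status = demo_query_available_agents(
        DEMO_AGENT_INFO_VERSION_0, demo_count_agents, sizeof(demo_agent_t), &count);
    assert(status == DEMO_STATUS_SUCCESS);
    (void) status;
    return count;
}
```

Because the callback runs synchronously, `count` is already filled in when the query returns, so a stack variable is sufficient as `user_data`.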