Agent Information

Rocprofiler SDK Developer API 0.6.0
ROCm Profiling API and tools

Data structures, enumerations, and functions for querying the agents (CPUs, GPUs, etc.) available on the system.

Data Structures

struct  rocprofiler_agent_cache_t
 Cache information for an agent.
 
struct  rocprofiler_agent_mem_bank_t
 Memory bank information for an agent.
 
struct  rocprofiler_agent_runtime_visiblity_t
 Provides an estimate about the runtime visibility of an agent based on the environment variables (ROCR_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL, CUDA_VISIBLE_DEVICES). Reference: https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html.
 
struct  rocprofiler_agent_v0_t
 Stores the properties of an agent (CPU, GPU, etc.)
 

Typedefs

typedef rocprofiler_agent_v0_t rocprofiler_agent_t
 
typedef rocprofiler_status_t(* rocprofiler_query_available_agents_cb_t) (rocprofiler_agent_version_t version, const void **agents, unsigned long num_agents, void *user_data)
 Callback function type for querying the available agents.
 

Enumerations

enum  rocprofiler_agent_version_t {
  ROCPROFILER_AGENT_INFO_VERSION_NONE = 0,
  ROCPROFILER_AGENT_INFO_VERSION_0 = 1,
  ROCPROFILER_AGENT_INFO_VERSION_LAST
}
 Enumeration ID for version of the rocprofiler_agent_v*_t struct in rocprofiler_i.
 

Functions

rocprofiler_status_t rocprofiler_query_available_agents (rocprofiler_agent_version_t version, rocprofiler_query_available_agents_cb_t callback, unsigned long agent_size, void *user_data)
 Receive synchronous callback with an array of available agents at moment of invocation.
 

Detailed Description

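Agents are queried with rocprofiler_query_available_agents, which synchronously invokes a user-supplied callback with an array of pointers to agent structs of the requested version.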


Data Structure Documentation

◆ rocprofiler_agent_cache_t

struct rocprofiler_agent_cache_t

Cache information for an agent.

Definition at line 55 of file agent.h.

Data Fields
uint32_t association Cache Associativity.
uint32_t cache_line_size Cache line size in bytes.
uint32_t cache_lines_per_tag Cache lines per Cache Tag.
uint32_t latency Cache latency in ns.
uint32_t level Integer representing level: 1, 2, 3, 4, etc.
uint64_t processor_id_low Identifies the processor number.
uint64_t size Size of the cache.
HsaCacheType type

◆ rocprofiler_agent_io_link_t

struct rocprofiler_agent_io_link_t

IO link information for an agent.

Definition at line 70 of file agent.h.

Data Fields
HSA_LINKPROPERTY flags override flags (may be active for specific platforms)
uint32_t max_bandwidth maximum interface Bandwidth in MB/s
uint32_t max_latency maximum cost of time to transfer (rounded to ns)
uint32_t min_bandwidth minimum interface Bandwidth in MB/s
uint32_t min_latency minimum cost of time to transfer (rounded to ns)
uint32_t node_from See rocprofiler_agent_id_t.
uint32_t node_to See rocprofiler_agent_id_t.
uint32_t recommended_transfer_size recommended transfer size to reach maximum bandwidth in bytes
HSA_IOLINKTYPE type Discoverable IoLink Properties (optional)
uint32_t version_major Bus interface version (optional)
uint32_t version_minor Bus interface version (optional)
uint32_t weight weight factor (derived from CDIT)

◆ rocprofiler_agent_mem_bank_t

struct rocprofiler_agent_mem_bank_t

Memory bank information for an agent.

Definition at line 90 of file agent.h.

Data Fields
HSA_MEMORYPROPERTY flags
HSA_HEAPTYPE heap_type
uint32_t mem_clk_max clock for the memory, this allows computing the available bandwidth to the memory when needed
uint64_t size_in_bytes physical memory size of the memory range in bytes
uint32_t width the number of parallel bits of the memory interface

◆ rocprofiler_agent_runtime_visiblity_t

struct rocprofiler_agent_runtime_visiblity_t

Provides an estimate about the runtime visibility of an agent based on the environment variables (ROCR_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL, CUDA_VISIBLE_DEVICES). Reference: https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html.

Definition at line 105 of file agent.h.

Data Fields
uint32_t hip: 1 Built on HSA.
uint32_t hsa: 1 If not visible to HSA, the agent is not visible to anything built on HSA.
uint32_t rccl: 1 Built on HIP.
uint32_t reserved: 28
uint32_t rocdecode: 1 Built on HIP.
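Because the visibility flags are single-bit fields, they are read by member name. The following is a minimal sketch of that access pattern, not SDK code: demo_runtime_visibility_t and both helper functions are hypothetical stand-ins that only mirror the documented field names, and the member order is an assumption. It relies on the layering that HIP is built on HSA and RCCL is built on HIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the documented bitfields. Real code would
 * use rocprofiler_agent_runtime_visiblity_t from the SDK headers; the member
 * order chosen here is an assumption. */
typedef struct
{
    uint32_t hsa       : 1;
    uint32_t hip       : 1;
    uint32_t rccl      : 1;
    uint32_t rocdecode : 1;
    uint32_t reserved  : 28;
} demo_runtime_visibility_t;

/* On common ABIs the five fields pack into a single 32-bit word. */
static_assert(sizeof(demo_runtime_visibility_t) == sizeof(uint32_t),
              "bitfields are expected to pack into one 32-bit word");

/* Conservative check: HIP sits on top of HSA, so require both bits. */
bool
demo_likely_visible_to_hip(demo_runtime_visibility_t v)
{
    return v.hsa != 0 && v.hip != 0;
}

/* RCCL sits on top of HIP, so require the whole chain. */
bool
demo_likely_visible_to_rccl(demo_runtime_visibility_t v)
{
    return demo_likely_visible_to_hip(v) && v.rccl != 0;
}
```

Requiring the lower layers as well is a deliberately cautious choice for this sketch; whether the SDK already clears the higher-level bits when a lower layer is hidden is not stated in this reference.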

◆ rocprofiler_agent_v0_t

struct rocprofiler_agent_v0_t

Stores the properties of an agent (CPU, GPU, etc.)

The node_id member is the KFD topology node id. It should be considered the "universal" indexing number. It is equivalent to the HSA-runtime HSA_AMD_AGENT_INFO_DRIVER_NODE_ID property of an hsa_agent_t. The const char* fields (name, vendor_name, etc.) are guaranteed to be valid pointers to null-terminated strings during tool finalization. Pointers to the agents obtained via rocprofiler_query_available_agents are constant and will not be deallocated until after tool finalization. Making copies of the agent struct is also valid.

Definition at line 130 of file agent.h.

Data Fields
uint32_t array_count Number of SIMD arrays.
const rocprofiler_agent_cache_t * caches
uint32_t caches_count Number of discoverable cache affinity properties on this "H-NUMA" node.

HSA_CAPABILITY capability GPU only.
uint32_t cpu_core_id_base low value of the logical processor ID of the latency (= CPU) cores available on this node
uint32_t cpu_cores_count Number of latency (= CPU) cores present on this HSA node. This value is 0 for an HSA node with no such cores, e.g. a "discrete HSA GPU".

uint32_t cu_count Number of compute units.
uint32_t cu_per_engine computed
uint32_t cu_per_simd_array Number of Compute Units (CU) per SIMD array.
uint16_t device_id GPU device id; 0 on latency (= CPU)-only nodes.
uint32_t domain PCI domain of the GPU.
uint32_t drm_render_minor DRM render device minor device number.
uint32_t family_id Family code.
HSA_ENGINE_ID fw_version GPU only. Identifier (rev) of the GPU uEngine or Firmware, may be 0.
uint32_t gds_size_in_kb Size of Global Data Store in Kilobytes shared across SIMD Wavefronts.
uint32_t gfx_target_version major_version=((value / 10000) % 100) minor_version=((value / 100) % 100) patch_version=(value % 100)
uint64_t gpu_id GPU only. KFD identifier.
rocprofiler_dim3_t grid_max_dim GPU only. Maximum number of work-items of each dimension of a grid.
uint32_t grid_max_size GPU only. Maximum total number of work-items in a grid.
uint64_t hive_id XGMI Hive the GPU node belongs to in the system. It is an opaque and static number hash created by the PSP.
rocprofiler_agent_id_t id Internal opaque identifier.
const rocprofiler_agent_io_link_t * io_links
uint32_t io_links_count Number of discoverable IO link affinity properties of this node connecting to other nodes.

uint32_t lds_size_in_kb Size of Local Data Store in Kilobytes per SIMD Wavefront.
uint64_t local_mem_size GPU only. Local memory size.
uint32_t location_id GPU BDF (Bus/Device/function number) - identifies the device location in the overall system.
int32_t logical_node_id Logical sequence number. This will always be [0..N) where N is the total number of agents.
int32_t logical_node_type_id Logical sequence number with respect to other agents of the same type. This will always be [0..N) where N is the total number of X agents (where X is a rocprofiler_agent_type_t value). This field is intended to help with the environment variable indexing used to mask GPUs at runtime (i.e. HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES), which starts at zero and only applies to GPUs. For example, the logical_node_type_id value for the first GPU will be 0, the second GPU will have a value of 1, and so on, regardless of how many agents of a different type preceded it (and thus increased the node_id or logical_node_id).

Example: on a system with 2 CPUs and 2 GPUs, where the node ids are 0=CPU, 1=GPU, 2=CPU, 3=GPU, the CPU node_ids 0 and 2 would have logical_node_type_id values of 0 and 1, respectively, and the GPU node_ids 1 and 3 would also have logical_node_type_id values of 0 and 1.

uint32_t max_engine_clk_ccompute maximum engine clocks for CPU, including any boost capabilities
uint32_t max_engine_clk_fcompute GPU only. Maximum engine clocks for GPU, including any boost capabilities.
uint32_t max_slots_scratch_cu Number of temp. memory ("scratch") wave slots available to access, may be 0 if HW has no restrictions.
uint32_t max_waves_per_cu computed
uint32_t max_waves_per_simd This identifies the max. number of launched waves per SIMD. If NumFComputeCores is 0, this value is ignored.
const rocprofiler_agent_mem_bank_t * mem_banks
uint32_t mem_banks_count Number of discoverable memory bank affinity properties on this "H-NUMA" node.

const char * model_name GPU only. Will be something like vega20, mi200, etc.
const char * name Name of the agent. Will be identical to product name for CPU.
uint32_t node_id Node sequence number. This will be equivalent to the HSA-runtime HSA_AMD_AGENT_INFO_DRIVER_NODE_ID property.
uint32_t num_cp_queues number of Compute queues
uint32_t num_gws Number of GWS barriers.
uint32_t num_sdma_engines number of PCIe optimized SDMA engines
uint32_t num_sdma_queues_per_engine number of SDMA queue per one engine
uint32_t num_sdma_xgmi_engines number of XGMI optimized SDMA engines
uint32_t num_shader_banks Number of Shader Banks or Shader Engines, typical values are 1 or 2.
uint32_t num_xcc Number of XCC.
const char * product_name Marketing name.
rocprofiler_agent_runtime_visiblity_t runtime_visibility See rocprofiler_runtime_library_t. This is an estimate about whether this agent will be visible for the runtimes, e.g. if (agent.runtime_visibility & ROCPROFILER_HIP_LIBRARY) != 0 then we believe this agent will be visible to the HIP library. However, this is an estimate and we cannot be certain until the HIP runtime is initialized. This will always be true for CPU agents.
HSA_ENGINE_VERSION sdma_fw_version GPU only.
uint32_t simd_arrays_per_engine Number of SIMD arrays per engine.
uint32_t simd_count Number of HSA throughput (= GPU) FCompute cores ("SIMD") present in a node. This value is 0 if no FCompute cores are present (e.g. pure "CPU node").

uint32_t simd_id_base low value of the logical processor ID of the throughput (= GPU) units available on this node
uint32_t simd_per_cu Number of SIMD representing a Compute Unit (CU)
uint64_t size set to sizeof(rocprofiler_agent_t) by rocprofiler. This can be used for versioning and compatibility handling
rocprofiler_agent_type_t type Enumeration for identifying the agent type (CPU, GPU, etc.)
rocprofiler_uuid_t uuid GPU only. Universally unique identifier.
uint16_t vendor_id GPU vendor id; 0 on latency (= CPU)-only nodes.
const char * vendor_name Vendor of agent (will be AMD)
uint32_t wave_front_size Number of SIMD cores per wavefront executed, typically 64, may be 32 or a different value for some HSA based architectures.
rocprofiler_dim3_t workgroup_max_dim GPU only. Maximum number of work-items of each dimension of a work-group.
uint32_t workgroup_max_size GPU only. Maximum total number of work-items in a work-group.
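
Two of the fields above involve small calculations: gfx_target_version packs three version components into one integer, and logical_node_type_id is a per-type running index. The sketch below illustrates both using the formula and the 2-CPU/2-GPU example from the field descriptions. All demo_ names are hypothetical helpers for illustration and are not part of the SDK; the per-type index is computed here only to show the numbering scheme, since the SDK already provides the value in the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded version; not an SDK type. */
typedef struct
{
    uint32_t major;
    uint32_t minor;
    uint32_t patch;
} demo_gfx_version_t;

/* Apply the documented formula for gfx_target_version. */
demo_gfx_version_t
demo_decode_gfx_target_version(uint32_t value)
{
    demo_gfx_version_t v;
    v.major = (value / 10000) % 100;
    v.minor = (value / 100) % 100;
    v.patch = value % 100;
    return v;
}

/* Hypothetical stand-in for rocprofiler_agent_type_t. */
typedef enum
{
    DEMO_AGENT_TYPE_CPU = 0,
    DEMO_AGENT_TYPE_GPU,
    DEMO_AGENT_TYPE_COUNT
} demo_agent_type_t;

/* Given the agent types listed in node order, write each agent's index among
 * agents of the same type into type_ids (which must hold n entries). */
void
demo_assign_type_ids(const demo_agent_type_t* types, int32_t* type_ids, size_t n)
{
    int32_t next[DEMO_AGENT_TYPE_COUNT] = {0};

    assert(n == 0 || (types != NULL && type_ids != NULL));
    for(size_t i = 0; i < n; ++i)
        type_ids[i] = next[types[i]]++;
}
```

For instance, a gfx_target_version of 90010 decodes to major 9, minor 0, patch 10, and the node order CPU, GPU, CPU, GPU yields per-type indices 0, 0, 1, 1.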

Typedef Documentation

◆ rocprofiler_agent_t

Definition at line 252 of file agent.h.

◆ rocprofiler_query_available_agents_cb_t

typedef rocprofiler_status_t(* rocprofiler_query_available_agents_cb_t) (rocprofiler_agent_version_t version, const void **agents, unsigned long num_agents, void *user_data)

Callback function type for querying the available agents.

If the callback is invoked, returns the rocprofiler_status_t value returned from the callback.

Parameters
[in] version Enum specifying the version of agent info
[in] agents Array of pointers to agents
[in] num_agents Number of agents in array
[in] user_data Data pointer passback
Returns
rocprofiler_status_t
Return values
ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_ABI Size of the agent struct in application is larger than the agent struct for rocprofiler-sdk
ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT Invalid rocprofiler_agent_version_t value

Definition at line 268 of file agent.h.

Enumeration Type Documentation

◆ rocprofiler_agent_version_t

Enumeration ID for version of the rocprofiler_agent_v*_t struct in rocprofiler_i.

Enumerator
ROCPROFILER_AGENT_INFO_VERSION_NONE 
ROCPROFILER_AGENT_INFO_VERSION_0 
ROCPROFILER_AGENT_INFO_VERSION_LAST 

Definition at line 45 of file agent.h.


Function Documentation

◆ rocprofiler_query_available_agents()

rocprofiler_status_t rocprofiler_query_available_agents (rocprofiler_agent_version_t version, rocprofiler_query_available_agents_cb_t callback, unsigned long agent_size, void *user_data)

Receive synchronous callback with an array of available agents at moment of invocation.

Parameters
[in] version Enum value specifying the struct type of the agent info
[in] callback Callback function accepting list of agents
[in] agent_size Should be set to sizeof(rocprofiler_agent_t)
[in] user_data Data pointer provided to callback
Returns
rocprofiler_status_t
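
The query-with-callback pattern described above can be sketched as follows. So that the example compiles without the SDK, every demo_ type and function is a hypothetical stand-in that mirrors the documented signatures. In particular, demo_query_available_agents is a mock whose argument checks are modelled on the documented return values, not the SDK implementation, and the two agents it reports are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Hypothetical stand-ins; real code includes the SDK headers instead ---- */
typedef enum
{
    DEMO_STATUS_SUCCESS = 0,
    DEMO_STATUS_ERROR_INCOMPATIBLE_ABI,
    DEMO_STATUS_ERROR_INVALID_ARGUMENT
} demo_status_t;

typedef enum
{
    DEMO_AGENT_INFO_VERSION_NONE = 0,
    DEMO_AGENT_INFO_VERSION_0    = 1,
    DEMO_AGENT_INFO_VERSION_LAST
} demo_agent_version_t;

typedef enum { DEMO_AGENT_TYPE_CPU, DEMO_AGENT_TYPE_GPU } demo_agent_type_t;

/* Heavily reduced agent struct: only a few of the documented fields. */
typedef struct
{
    uint64_t          size;
    demo_agent_type_t type;
    uint32_t          node_id;
    const char*       name;
} demo_agent_t;

typedef demo_status_t (*demo_query_cb_t)(demo_agent_version_t version,
                                         const void**         agents,
                                         unsigned long        num_agents,
                                         void*                user_data);

/* Mock query: validates the arguments, then invokes the callback synchronously
 * with an array of pointers to two made-up agents. */
demo_status_t
demo_query_available_agents(demo_agent_version_t version,
                            demo_query_cb_t      callback,
                            unsigned long        agent_size,
                            void*                user_data)
{
    static const demo_agent_t cpu = {sizeof(demo_agent_t), DEMO_AGENT_TYPE_CPU, 0, "cpu0"};
    static const demo_agent_t gpu = {sizeof(demo_agent_t), DEMO_AGENT_TYPE_GPU, 1, "gpu0"};
    const void* agents[] = {&cpu, &gpu};

    if(version != DEMO_AGENT_INFO_VERSION_0) return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
    if(agent_size > sizeof(demo_agent_t)) return DEMO_STATUS_ERROR_INCOMPATIBLE_ABI;
    return callback(version, agents, 2, user_data);
}

/* Callback: cast each void pointer to the struct matching the requested
 * version and count the GPU agents through the user_data pointer. */
demo_status_t
demo_count_gpus(demo_agent_version_t version,
                const void**         agents,
                unsigned long        num_agents,
                void*                user_data)
{
    unsigned long* gpu_count = (unsigned long*) user_data;

    if(version != DEMO_AGENT_INFO_VERSION_0) return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
    assert(gpu_count != NULL);
    for(unsigned long i = 0; i < num_agents; ++i)
    {
        const demo_agent_t* agent = (const demo_agent_t*) agents[i];
        if(agent->type == DEMO_AGENT_TYPE_GPU) ++(*gpu_count);
    }
    return DEMO_STATUS_SUCCESS;
}
```

With the real SDK, the corresponding call would pass ROCPROFILER_AGENT_INFO_VERSION_0, a callback of type rocprofiler_query_available_agents_cb_t, sizeof(rocprofiler_agent_t), and the user data pointer, and the callback would cast each entry to const rocprofiler_agent_v0_t*.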