Agent Information
Rocprofiler SDK Developer API 0.4.0
ROCm Profiling API and tools
Types and functions for querying the available agents (CPU, GPU, etc.) and their properties.
Data Structures | |
struct | rocprofiler_agent_cache_t | Cache information for an agent. |
struct | rocprofiler_agent_io_link_t | IO link information for an agent. |
struct | rocprofiler_agent_mem_bank_t | Memory bank information for an agent. |
struct | rocprofiler_agent_v0_t | Stores the properties of an agent (CPU, GPU, etc.) |
Typedefs | |
typedef rocprofiler_agent_v0_t | rocprofiler_agent_t | |
typedef rocprofiler_status_t(* | rocprofiler_query_available_agents_cb_t) (rocprofiler_agent_version_t version, const void **agents, unsigned long num_agents, void *user_data) | Callback function type for querying the available agents. |
Enumerations | |
enum | rocprofiler_agent_version_t { ROCPROFILER_AGENT_INFO_VERSION_NONE = 0, ROCPROFILER_AGENT_INFO_VERSION_0 = 1, ROCPROFILER_AGENT_INFO_VERSION_LAST } | Enumeration ID for version of the rocprofiler_agent_v*_t struct in rocprofiler_i. |
Functions | |
rocprofiler_status_t | rocprofiler_query_available_agents (rocprofiler_agent_version_t version, rocprofiler_query_available_agents_cb_t callback, unsigned long agent_size, void *user_data) | Receive synchronous callback with an array of available agents at moment of invocation. |
Detailed Description
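This group defines the agent property structs (rocprofiler_agent_v0_t and its cache, IO link, and memory bank sub-structs), the rocprofiler_agent_version_t enum that selects the struct version, and rocprofiler_query_available_agents, which reports the available agents through a synchronous callback.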
Data Structure Documentation
◆ rocprofiler_agent_cache_t
struct rocprofiler_agent_cache_t |
◆ rocprofiler_agent_io_link_t
struct rocprofiler_agent_io_link_t |
Data Fields | ||
---|---|---|
HSA_LINKPROPERTY | flags | override flags (may be active for specific platforms) |
uint32_t | max_bandwidth | maximum interface bandwidth in MB/s |
uint32_t | max_latency | maximum transfer latency (rounded to ns) |
uint32_t | min_bandwidth | minimum interface bandwidth in MB/s |
uint32_t | min_latency | minimum transfer latency (rounded to ns) |
uint32_t | node_from | See rocprofiler_agent_id_t. |
uint32_t | node_to | See rocprofiler_agent_id_t. |
uint32_t | recommended_transfer_size | recommended transfer size to reach maximum bandwidth in bytes |
HSA_IOLINKTYPE | type | Discoverable IoLink Properties (optional) |
uint32_t | version_major | Bus interface version (optional) |
uint32_t | version_minor | Bus interface version (optional) |
uint32_t | weight | weight factor (derived from CDIT) |
◆ rocprofiler_agent_mem_bank_t
struct rocprofiler_agent_mem_bank_t |
◆ rocprofiler_agent_v0_t
struct rocprofiler_agent_v0_t |
Stores the properties of an agent (CPU, GPU, etc.)
The node_id member is the KFD topology node id. It should be considered the "universal" indexing number. It is equivalent to the HSA-runtime HSA_AMD_AGENT_INFO_DRIVER_NODE_ID property of an hsa_agent_t. The const char* fields (name, vendor_name, etc.) are guaranteed to be valid pointers to null-terminated strings during tool finalization. Pointers to the agents obtained via rocprofiler_query_available_agents are constant and will not be deallocated until after tool finalization. Making copies of the agent struct is also valid.
- See also
- rocprofiler_query_available_agents
Data Fields | ||
---|---|---|
uint32_t | array_count | Number of SIMD arrays. |
const rocprofiler_agent_cache_t * | caches | |
uint32_t | caches_count | Number of discoverable cache affinity properties on this "H-NUMA" node. |
HSA_CAPABILITY | capability | GPU only. |
uint32_t | cpu_core_id_base | low value of the logical processor ID of the latency (= CPU) cores available on this node |
uint32_t | cpu_cores_count | Number of latency (= CPU) cores present on this HSA node. This value is 0 for an HSA node with no such cores, e.g., a "discrete HSA GPU". |
uint32_t | cu_count | Number of compute units. |
uint32_t | cu_per_engine | computed |
uint32_t | cu_per_simd_array | Number of Compute Units (CU) per SIMD array. |
uint16_t | device_id | GPU device id; 0 on latency (= CPU)-only nodes. |
uint32_t | domain | PCI domain of the GPU. |
uint32_t | drm_render_minor | DRM render device minor device number. |
uint32_t | family_id | Family code. |
HSA_ENGINE_ID | fw_version | GPU only. Identifier (rev) of the GPU uEngine or Firmware, may be 0. |
uint32_t | gds_size_in_kb | Size of Global Data Store in Kilobytes shared across SIMD Wavefronts. |
uint32_t | gfx_target_version | major_version=((value / 10000) % 100) minor_version=((value / 100) % 100) patch_version=(value % 100) |
uint64_t | gpu_id | GPU only. KFD identifier. |
rocprofiler_dim3_t | grid_max_dim | GPU only. Maximum number of work-items of each dimension of a grid. |
uint32_t | grid_max_size | GPU only. Maximum total number of work-items in a grid. |
uint64_t | hive_id | XGMI Hive the GPU node belongs to in the system. It is an opaque and static number hash created by the PSP. |
rocprofiler_agent_id_t | id | Internal opaque identifier. |
const rocprofiler_agent_io_link_t * | io_links | |
uint32_t | io_links_count | Number of discoverable IO link affinity properties of this node connecting to other nodes. |
uint32_t | lds_size_in_kb | Size of Local Data Store in Kilobytes per SIMD Wavefront. |
uint64_t | local_mem_size | GPU only. Local memory size. |
uint32_t | location_id | GPU BDF (Bus/Device/function number) - identifies the device location in the overall system. |
int32_t | logical_node_id | Logical sequence number. This will always be [0..N) where N is the total number of agents. |
int32_t | logical_node_type_id | Logical sequence number with respect to other agents of the same type. This will always be [0..N) where N is the total number of X agents (where X is a rocprofiler_agent_type_t value). This field is intended to help with indexing for the environment variables used to mask GPUs at runtime (i.e. HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES), which start at zero and only apply to GPUs: the logical_node_type_id value for the first GPU will be 0, the second GPU will have a value of 1, etc., regardless of how many agents of a different type preceded it (and thus increased the node_id or logical_node_id). Example: on a system with 2 CPUs and 2 GPUs, where the node ids are 0=CPU, 1=GPU, 2=CPU, 3=GPU, the CPU node_ids 0 and 2 would have logical_node_type_id values of 0 and 1, respectively, and GPU node_ids 1 and 3 would also have logical_node_type_id values of 0 and 1. |
uint32_t | max_engine_clk_ccompute | maximum engine clocks for CPU, including any boost capabilities |
uint32_t | max_engine_clk_fcompute | GPU only. Maximum engine clocks for GPU, including any boost capabilities. |
uint32_t | max_slots_scratch_cu | Number of temp. memory ("scratch") wave slots available to access, may be 0 if HW has no restrictions. |
uint32_t | max_waves_per_cu | computed |
uint32_t | max_waves_per_simd | Maximum number of launched waves per SIMD. If simd_count (NumFComputeCores in the KFD topology) is 0, this value is ignored. |
const rocprofiler_agent_mem_bank_t * | mem_banks | |
uint32_t | mem_banks_count | Number of discoverable memory bank affinity properties on this "H-NUMA" node. |
const char * | model_name | GPU only. Will be something like vega20, mi200, etc. |
const char * | name | Name of the agent. Will be identical to product name for CPU. |
uint32_t | node_id | Node sequence number. This will be equivalent to the HSA-runtime HSA_AMD_AGENT_INFO_DRIVER_NODE_ID property. |
uint32_t | num_cp_queues | number of Compute queues |
uint32_t | num_gws | Number of GWS barriers. |
uint32_t | num_sdma_engines | number of PCIe optimized SDMA engines |
uint32_t | num_sdma_queues_per_engine | number of SDMA queue per one engine |
uint32_t | num_sdma_xgmi_engines | number of XGMI optimized SDMA engines |
uint32_t | num_shader_banks | Number of Shader Banks or Shader Engines, typical values are 1 or 2. |
uint32_t | num_xcc | Number of XCC. |
const char * | product_name | Marketing name. |
int32_t | reserved_padding0 | padding logical_node_id to 64 bytes |
HSA_ENGINE_VERSION | sdma_fw_version | GPU only. |
uint32_t | simd_arrays_per_engine | Number of SIMD arrays per engine. |
uint32_t | simd_count | Number of HSA throughput (= GPU) FCompute cores ("SIMD") present in a node. This value is 0 if no FCompute cores are present (e.g. pure "CPU node"). |
uint32_t | simd_id_base | low value of the logical processor ID of the throughput (= GPU) units available on this node |
uint32_t | simd_per_cu | Number of SIMD representing a Compute Unit (CU) |
uint64_t | size | set to sizeof(rocprofiler_agent_t) by rocprofiler. This can be used for versioning and compatibility handling |
rocprofiler_agent_type_t | type | Enumeration for identifying the agent type (CPU, GPU, etc.) |
uint16_t | vendor_id | GPU vendor id; 0 on latency (= CPU)-only nodes. |
const char * | vendor_name | Vendor of agent (will be AMD) |
uint32_t | wave_front_size | Number of SIMD cores per wavefront executed, typically 64, may be 32 or a different value for some HSA based architectures. |
rocprofiler_dim3_t | workgroup_max_dim | GPU only. Maximum number of work-items of each dimension of a work-group. |
uint32_t | workgroup_max_size | GPU only. Maximum total number of work-items in a work-group. |
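The gfx_target_version field packs three version components into one decimal value. The following is a minimal C sketch of the decoding formula given in that field's description. The gfx_version_t struct and both helper functions are hypothetical names introduced for illustration; they are not part of rocprofiler-sdk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type (not part of rocprofiler-sdk). */
typedef struct
{
    uint32_t major_version;
    uint32_t minor_version;
    uint32_t patch_version;
} gfx_version_t;

/* Split a gfx_target_version value using the formula from the field description. */
static gfx_version_t
decode_gfx_target_version(uint32_t value)
{
    gfx_version_t v;
    v.major_version = (value / 10000) % 100;
    v.minor_version = (value / 100) % 100;
    v.patch_version = value % 100;
    return v;
}

/* Write "major.minor.patch" into buf and return the result of snprintf. */
static int
format_gfx_version(gfx_version_t v, char* buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf,
                    len,
                    "%u.%u.%u",
                    (unsigned) v.major_version,
                    (unsigned) v.minor_version,
                    (unsigned) v.patch_version);
}
```

For example, decode_gfx_target_version(90010) yields major 9, minor 0, and patch 10, which format_gfx_version renders as "9.0.10".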
Typedef Documentation
◆ rocprofiler_agent_t
◆ rocprofiler_query_available_agents_cb_t
typedef rocprofiler_status_t(* rocprofiler_query_available_agents_cb_t) (rocprofiler_agent_version_t version, const void **agents, unsigned long num_agents, void *user_data) |
Callback function type for querying the available agents.
If the callback is invoked, returns the rocprofiler_status_t value returned from the callback.
- Parameters
-
[in] | version | Enum specifying the version of agent info |
[in] | agents | Array of pointers to agents |
[in] | num_agents | Number of agents in array |
[in] | user_data | Data pointer passback |
- Returns
- rocprofiler_status_t
- Return values
-
ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_ABI | size of the agent struct in application is larger than the agent struct for rocprofiler-sdk |
ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT | Invalid rocprofiler_agent_version_t value |
Enumeration Type Documentation
◆ rocprofiler_agent_version_t
Enumeration ID for version of the rocprofiler_agent_v*_t struct in rocprofiler_i.
Enumerator | |
---|---|
ROCPROFILER_AGENT_INFO_VERSION_NONE | |
ROCPROFILER_AGENT_INFO_VERSION_0 | |
ROCPROFILER_AGENT_INFO_VERSION_LAST |
Definition at line 45 of file agent.h.
Function Documentation
◆ rocprofiler_query_available_agents()
rocprofiler_status_t rocprofiler_query_available_agents (rocprofiler_agent_version_t version, rocprofiler_query_available_agents_cb_t callback, unsigned long agent_size, void *user_data)
Receive synchronous callback with an array of available agents at moment of invocation.
- Parameters
-
[in] | version | Enum value specifying the struct type of the agent info |
[in] | callback | Callback function accepting list of agents |
[in] | agent_size | Should be set to sizeof(rocprofiler_agent_t) |
[in] | user_data | Data pointer provided to callback |
- Returns
- rocprofiler_status_t
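The callback pattern described above can be sketched in C as follows. This is an illustration under stated assumptions, not the SDK's implementation: every demo_* type, the numeric status values, and demo_query_available_agents are hypothetical stand-ins so the sketch compiles without the SDK. Real code would include the rocprofiler-sdk headers and use rocprofiler_query_available_agents, rocprofiler_agent_version_t, and rocprofiler_agent_t as documented on this page. The stand-in query function only mimics the documented behavior: it rejects an agent_size larger than its own struct and otherwise returns whatever the callback returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SDK types; names and numeric values are assumptions. */
typedef int demo_status_t;
enum { DEMO_STATUS_SUCCESS = 0, DEMO_STATUS_ERROR_INCOMPATIBLE_ABI = 1, DEMO_STATUS_ERROR_INVALID_ARGUMENT = 2 };
typedef enum { DEMO_AGENT_INFO_VERSION_NONE = 0, DEMO_AGENT_INFO_VERSION_0 = 1 } demo_agent_version_t;
typedef enum { DEMO_AGENT_TYPE_CPU = 0, DEMO_AGENT_TYPE_GPU = 1 } demo_agent_type_t;

/* Reduced stand-in for rocprofiler_agent_v0_t with only the fields used here. */
typedef struct
{
    uint64_t          size;
    demo_agent_type_t type;
    uint32_t          node_id;
    int32_t           logical_node_type_id;
} demo_agent_t;

/* Same parameter list as rocprofiler_query_available_agents_cb_t. */
typedef demo_status_t (*demo_query_cb_t)(demo_agent_version_t version,
                                         const void**         agents,
                                         unsigned long        num_agents,
                                         void*                user_data);

/* State handed to the callback through user_data. */
typedef struct
{
    unsigned long gpu_count;
    uint32_t      first_gpu_node_id;
} gpu_summary_t;

/* Callback: check the version, cast each entry to the matching struct, record GPUs. */
static demo_status_t
count_gpus_cb(demo_agent_version_t version, const void** agents, unsigned long num_agents, void* user_data)
{
    gpu_summary_t* summary = (gpu_summary_t*) user_data;
    assert(summary != NULL);
    if(version != DEMO_AGENT_INFO_VERSION_0) return DEMO_STATUS_ERROR_INVALID_ARGUMENT;

    for(unsigned long i = 0; i < num_agents; ++i)
    {
        const demo_agent_t* agent = (const demo_agent_t*) agents[i];
        if(agent->type != DEMO_AGENT_TYPE_GPU) continue;
        if(summary->gpu_count == 0) summary->first_gpu_node_id = agent->node_id;
        summary->gpu_count += 1;
    }
    return DEMO_STATUS_SUCCESS;
}

/* Stand-in for rocprofiler_query_available_agents over a fixed 2-CPU, 2-GPU table. */
static demo_status_t
demo_query_available_agents(demo_agent_version_t version,
                            demo_query_cb_t      callback,
                            unsigned long        agent_size,
                            void*                user_data)
{
    static const demo_agent_t table[4] = {
        {sizeof(demo_agent_t), DEMO_AGENT_TYPE_CPU, 0, 0},
        {sizeof(demo_agent_t), DEMO_AGENT_TYPE_GPU, 1, 0},
        {sizeof(demo_agent_t), DEMO_AGENT_TYPE_CPU, 2, 1},
        {sizeof(demo_agent_t), DEMO_AGENT_TYPE_GPU, 3, 1},
    };
    const void* agents[4];

    if(version != DEMO_AGENT_INFO_VERSION_0) return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
    if(agent_size > sizeof(demo_agent_t)) return DEMO_STATUS_ERROR_INCOMPATIBLE_ABI;

    for(size_t i = 0; i < 4; ++i)
        agents[i] = &table[i];
    return callback(version, agents, 4, user_data);
}
```

A caller would zero-initialize a gpu_summary_t, pass its address as user_data together with sizeof(demo_agent_t) as agent_size, and read the summary once the call returns, since the callback runs synchronously.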