GPU Monitoring

AMD SMI: GPU Monitoring

Functions

amdsmi_status_t amdsmi_get_temp_metric (amdsmi_processor_handle processor_handle, amdsmi_temperature_type_t sensor_type, amdsmi_temperature_metric_t metric, int64_t *temperature)
 Gets the value of the specified temperature metric from the specified temperature sensor on the specified device. It is not supported on a virtual machine guest.
 
amdsmi_status_t amdsmi_get_gpu_activity (amdsmi_processor_handle processor_handle, amdsmi_engine_usage_t *info)
 Returns the current usage of the GPU engines (GFX, MM and MEM). Each usage is reported as a percentage from 0-100%.
 
amdsmi_status_t amdsmi_get_power_info (amdsmi_processor_handle processor_handle, amdsmi_power_info_t *info)
 Returns the current power and voltage of the GPU.
 
amdsmi_status_t amdsmi_is_gpu_power_management_enabled (amdsmi_processor_handle processor_handle, bool *enabled)
 Reports whether power management is enabled.
 
amdsmi_status_t amdsmi_get_clock_info (amdsmi_processor_handle processor_handle, amdsmi_clk_type_t clk_type, amdsmi_clk_info_t *info)
 Returns the clock measurements for the GFX engine, the multimedia engines, and memory. This call reports averages over 1 s, in MHz. It is not supported on a virtual machine guest.
 
amdsmi_status_t amdsmi_get_gpu_vram_usage (amdsmi_processor_handle processor_handle, amdsmi_vram_usage_t *info)
 Returns the VRAM usage (both total and used memory) in MegaBytes.
 
amdsmi_status_t amdsmi_get_violation_status (amdsmi_processor_handle processor_handle, amdsmi_violation_status_t *info)
 Returns the violations for a processor.
 

Function Documentation

◆ amdsmi_get_temp_metric()

amdsmi_status_t amdsmi_get_temp_metric ( amdsmi_processor_handle  processor_handle,
amdsmi_temperature_type_t  sensor_type,
amdsmi_temperature_metric_t  metric,
int64_t *  temperature 
)

Gets the value of the specified temperature metric from the specified temperature sensor on the specified device. It is not supported on a virtual machine guest.

Platform:

gpu_bm_linux

host

guest_windows

Given a processor handle processor_handle, a sensor type sensor_type, an amdsmi_temperature_metric_t metric, and a pointer to an int64_t temperature, this function writes the value of the metric indicated by metric and sensor_type to the memory location temperature.

Parameters
[in] processor_handle: A processor handle.
[in] sensor_type: Part of the device from which the temperature should be obtained. This should come from the enum amdsmi_temperature_type_t.
[in] metric: Enum indicating which temperature value should be retrieved.
[in,out] temperature: A pointer to an int64_t to which the temperature, in degrees Celsius, will be written. If this parameter is nullptr, this function returns AMDSMI_STATUS_INVAL if the function is supported with the provided arguments, and AMDSMI_STATUS_NOT_SUPPORTED if it is not.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on fail
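
The nullptr behaviour described for the temperature parameter can serve as a support probe before a real read. The following is a minimal sketch of that idea, not the library's own code: so that it compiles without the AMD SMI header, it uses stand-in definitions (demo_status_t, temp_fn_t, plain int for the enum parameters), and the helper names temp_metric_supported, read_temp_if_supported, and fake_get_temp are hypothetical. In real code the function pointer would be replaced by a direct call to amdsmi_get_temp_metric with the library's own types and status codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles on its own. In real code the status
 * codes come from the AMD SMI header; the values here are placeholders. */
typedef enum {
    DEMO_STATUS_SUCCESS = 0,
    DEMO_STATUS_INVAL,
    DEMO_STATUS_NOT_SUPPORTED
} demo_status_t;

/* Same parameter shape as amdsmi_get_temp_metric, with void* and int
 * standing in for the handle and enum types. */
typedef demo_status_t (*temp_fn_t)(void *handle, int sensor_type,
                                   int metric, int64_t *temperature);

/* Probe: a NULL output pointer yields INVAL when the sensor/metric pair
 * is supported and NOT_SUPPORTED when it is not. */
static bool temp_metric_supported(temp_fn_t get_temp, void *handle,
                                  int sensor_type, int metric)
{
    assert(get_temp != NULL);
    return get_temp(handle, sensor_type, metric, NULL) == DEMO_STATUS_INVAL;
}

/* Reads the metric only after the probe reports support. */
static demo_status_t read_temp_if_supported(temp_fn_t get_temp, void *handle,
                                            int sensor_type, int metric,
                                            int64_t *out)
{
    assert(out != NULL);
    if (!temp_metric_supported(get_temp, handle, sensor_type, metric))
        return DEMO_STATUS_NOT_SUPPORTED;
    return get_temp(handle, sensor_type, metric, out);
}

/* Hypothetical fake used only to exercise the sketch: it supports
 * sensor 0 and always reports 42 degrees. */
static demo_status_t fake_get_temp(void *handle, int sensor_type,
                                   int metric, int64_t *temperature)
{
    (void)handle;
    (void)metric;
    if (sensor_type != 0)
        return DEMO_STATUS_NOT_SUPPORTED;
    if (temperature == NULL)
        return DEMO_STATUS_INVAL;
    *temperature = 42;
    return DEMO_STATUS_SUCCESS;
}
```

Probing first lets a caller skip unsupported sensors without treating the resulting status as an error.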

◆ amdsmi_get_gpu_activity()

amdsmi_status_t amdsmi_get_gpu_activity ( amdsmi_processor_handle  processor_handle,
amdsmi_engine_usage_t *  info 
)

Returns the current usage of the GPU engines (GFX, MM and MEM). Each usage is reported as a percentage from 0-100%.

Platform:

gpu_bm_linux

host

guest_windows

Parameters
[in] processor_handle: Device to query.
[out] info: Reference to the GPU engine usage structure. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on fail
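
Because each engine usage is a percentage from 0 to 100, a caller may want to sanity-check the values and pick out the busiest engine. The sketch below illustrates this under stated assumptions: demo_engine_usage_t is a hypothetical stand-in with one field per engine named in the description (GFX, MM, MEM), not the layout of amdsmi_engine_usage_t, and both helper functions are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: one percentage per engine. The real structure
 * is defined by the AMD SMI header and may use different field names. */
typedef struct {
    uint32_t gfx; /* graphics engine usage, percent */
    uint32_t mm;  /* multimedia engine usage, percent */
    uint32_t mem; /* memory engine usage, percent */
} demo_engine_usage_t;

/* True when every reported value lies in the documented 0-100 range. */
static bool usage_in_range(const demo_engine_usage_t *u)
{
    assert(u != NULL);
    return u->gfx <= 100 && u->mm <= 100 && u->mem <= 100;
}

/* Returns the name of the engine with the highest usage.
 * Ties resolve in the order GFX, MM, MEM. */
static const char *busiest_engine(const demo_engine_usage_t *u)
{
    assert(u != NULL);
    const char *name = "GFX";
    uint32_t best = u->gfx;
    if (u->mm > best) {
        name = "MM";
        best = u->mm;
    }
    if (u->mem > best) {
        name = "MEM";
    }
    return name;
}
```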

◆ amdsmi_get_power_info()

amdsmi_status_t amdsmi_get_power_info ( amdsmi_processor_handle  processor_handle,
amdsmi_power_info_t *  info 
)

Returns the current power and voltage of the GPU.

Platform:

gpu_bm_linux

host

guest_windows

guest_1vf

Note
The amdsmi_power_info_t::socket_power metric can, in rare cases, spike above the socket power limit.
Unsupported struct members are set to UINT32_MAX.
Parameters
[in] processor_handle: PF of the processor to query.
[out] info: Reference to the GPU power structure. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on fail
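
Since unsupported members are reported as UINT32_MAX, each field should be checked against that sentinel before it is used. The sketch below shows one way to do so. It assumes the members are 32-bit unsigned values, as the sentinel suggests; demo_power_info_t is a hypothetical stand-in in which only socket_power is named in the note above, and demo_voltage and the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the power structure. */
typedef struct {
    uint32_t socket_power; /* member named in the note above */
    uint32_t demo_voltage; /* illustrative member, not a real field name */
} demo_power_info_t;

/* A member holds real data only if it is not the UINT32_MAX sentinel. */
static bool power_field_supported(uint32_t value)
{
    return value != UINT32_MAX;
}

/* Copies a supported value to *out and returns true; leaves *out
 * untouched and returns false when the member is unsupported. */
static bool read_power_field(uint32_t value, uint32_t *out)
{
    assert(out != NULL);
    if (!power_field_supported(value))
        return false;
    *out = value;
    return true;
}
```

Checking the sentinel at the point of use prevents an unsupported member from being reported as an implausibly large reading.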

◆ amdsmi_is_gpu_power_management_enabled()

amdsmi_status_t amdsmi_is_gpu_power_management_enabled ( amdsmi_processor_handle  processor_handle,
bool *  enabled 
)

Reports whether power management is enabled.

Platform:

gpu_bm_linux

host

Parameters
[in] processor_handle: PF of the processor to query.
[out] enabled: Reference to a bool. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on fail

◆ amdsmi_get_clock_info()

amdsmi_status_t amdsmi_get_clock_info ( amdsmi_processor_handle  processor_handle,
amdsmi_clk_type_t  clk_type,
amdsmi_clk_info_t *  info 
)

Returns the clock measurements for the GFX engine, the multimedia engines, and memory. This call reports averages over 1 s, in MHz. It is not supported on a virtual machine guest.

Platform:

gpu_bm_linux

host

guest_windows

Parameters
[in] processor_handle: Device to query.
[in] clk_type: Enum representing the clock type to query.
[out] info: Reference to the GPU clock structure. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on fail

◆ amdsmi_get_gpu_vram_usage()

amdsmi_status_t amdsmi_get_gpu_vram_usage ( amdsmi_processor_handle  processor_handle,
amdsmi_vram_usage_t *  info 
)

Returns the VRAM usage (both total and used memory) in MegaBytes.

Platform:
gpu_bm_linux
Parameters
[in] processor_handle: Device to query.
[out] info: Reference to the VRAM information. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on fail
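
The total and used figures are usually combined into a utilisation percentage or a free-memory value. The following sketch shows that arithmetic. It assumes both values are in megabytes, as stated above; demo_vram_usage_t and its field names are hypothetical stand-ins rather than the definition of amdsmi_vram_usage_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: total and used VRAM, both in megabytes. */
typedef struct {
    uint32_t total_mb;
    uint32_t used_mb;
} demo_vram_usage_t;

/* Percentage of VRAM in use; returns 0.0 when the total is zero so the
 * caller never divides by zero. */
static double vram_used_percent(const demo_vram_usage_t *u)
{
    assert(u != NULL);
    if (u->total_mb == 0)
        return 0.0;
    return 100.0 * (double)u->used_mb / (double)u->total_mb;
}

/* Free VRAM in megabytes, clamped at zero if used exceeds total. */
static uint32_t vram_free_mb(const demo_vram_usage_t *u)
{
    assert(u != NULL);
    return (u->used_mb > u->total_mb) ? 0u : u->total_mb - u->used_mb;
}
```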

◆ amdsmi_get_violation_status()

amdsmi_status_t amdsmi_get_violation_status ( amdsmi_processor_handle  processor_handle,
amdsmi_violation_status_t *  info 
)

Returns the violations for a processor.

Warning: This API is slow because it polls the driver for two samples and requires a minimum wait of 100 ms between them to calculate the result. To avoid that wait on bare metal (BM), users would need to call amdsmi_get_gpu_metrics_info and perform the calculation themselves; see that API's struct for the calculations.

Platform:
gpu_bm_linux
Parameters
[in] processor_handle: Device to query.
[out] info: Reference to all available violation status details. Must be allocated by the user.
Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success, non-zero on fail