Error Queries
Functions
rsmi_status_t rsmi_dev_ecc_count_get (uint32_t dv_ind, rsmi_gpu_block_t block, rsmi_error_count_t *ec)
    Retrieve the error counts for a GPU block.
rsmi_status_t rsmi_dev_ecc_enabled_get (uint32_t dv_ind, uint64_t *enabled_blocks)
    Retrieve the enabled ECC bit-mask.
rsmi_status_t rsmi_dev_ecc_status_get (uint32_t dv_ind, rsmi_gpu_block_t block, rsmi_ras_err_state_t *state)
    Retrieve the ECC status for a GPU block.
rsmi_status_t rsmi_status_string (rsmi_status_t status, const char **status_string)
    Get a description of a provided RSMI error status.
Detailed Description
These functions provide error information about RSMI calls as well as device errors.
Function Documentation
◆ rsmi_dev_ecc_count_get()
rsmi_status_t rsmi_dev_ecc_count_get (uint32_t dv_ind, rsmi_gpu_block_t block, rsmi_error_count_t *ec)
Retrieve the error counts for a GPU block.
Given a device index dv_ind, an rsmi_gpu_block_t block, and a pointer to an rsmi_error_count_t ec, this function writes the error count values for the GPU block indicated by block to the memory pointed to by ec.
- Parameters
  - [in] dv_ind: a device index
  - [in] block: the block for which error counts should be retrieved
  - [in,out] ec: a pointer to an rsmi_error_count_t to which the error counts should be written. If this parameter is nullptr, this function will return RSMI_STATUS_INVALID_ARGS if the function is supported with the provided arguments, and RSMI_STATUS_NOT_SUPPORTED if it is not supported with the provided arguments.
- Return values
  - RSMI_STATUS_SUCCESS: call was successful
  - RSMI_STATUS_NOT_SUPPORTED: installed software or hardware does not support this function with the given arguments
  - RSMI_STATUS_INVALID_ARGS: the provided arguments are not valid
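The null-pointer behavior described for the output parameter can serve as a support probe before any result is requested. The sketch below only illustrates how such a probe might be interpreted; it does not call the library. Every ex_ name, the stand-in status enum, the getter signature, and the two mock getters are hypothetical and exist only for this example. Real code would use rsmi_status_t and the library function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in status codes for this sketch only (not the library's values). */
typedef enum {
    EX_STATUS_SUCCESS,
    EX_STATUS_INVALID_ARGS,
    EX_STATUS_NOT_SUPPORTED
} ex_status_t;

/* Outcome of a null-pointer probe. */
typedef enum {
    EX_PROBE_SUPPORTED,
    EX_PROBE_UNSUPPORTED,
    EX_PROBE_INCONCLUSIVE
} ex_probe_t;

/* Hypothetical getter shape: device index in, result written through out. */
typedef ex_status_t (*ex_getter_t)(uint32_t dv_ind, void *out);

/* Map the status of a call made with a null output pointer to a verdict. */
static ex_probe_t ex_interpret_probe(ex_status_t status)
{
    switch (status) {
    case EX_STATUS_INVALID_ARGS:  return EX_PROBE_SUPPORTED;
    case EX_STATUS_NOT_SUPPORTED: return EX_PROBE_UNSUPPORTED;
    default:                      return EX_PROBE_INCONCLUSIVE;
    }
}

/* Call the getter with a null output pointer and interpret the result. */
static ex_probe_t ex_probe_support(ex_getter_t getter, uint32_t dv_ind)
{
    assert(getter != NULL);
    return ex_interpret_probe(getter(dv_ind, NULL));
}

/* Mock getter that behaves like a supported query. */
static ex_status_t ex_mock_supported(uint32_t dv_ind, void *out)
{
    (void)dv_ind;
    if (out == NULL) {
        return EX_STATUS_INVALID_ARGS;
    }
    *(uint64_t *)out = 0;   /* pretend result for the sketch */
    return EX_STATUS_SUCCESS;
}

/* Mock getter that behaves like an unsupported query. */
static ex_status_t ex_mock_unsupported(uint32_t dv_ind, void *out)
{
    (void)dv_ind;
    (void)out;
    return EX_STATUS_NOT_SUPPORTED;
}
```

With the mocks above, ex_probe_support(ex_mock_supported, 0) yields EX_PROBE_SUPPORTED and ex_probe_support(ex_mock_unsupported, 0) yields EX_PROBE_UNSUPPORTED. Any other status is treated as inconclusive, since the documentation only defines the two probe outcomes.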
◆ rsmi_dev_ecc_enabled_get()
rsmi_status_t rsmi_dev_ecc_enabled_get (uint32_t dv_ind, uint64_t *enabled_blocks)
Retrieve the enabled ECC bit-mask.
Given a device index dv_ind and a pointer to a uint64_t enabled_blocks, this function writes a bit-mask to the memory pointed to by enabled_blocks. After a successful call, the value stored at enabled_blocks can be AND'd with elements of the rsmi_gpu_block_t enumeration to determine whether the corresponding block has ECC enabled. Note that whether a block has ECC enabled in the device is independent of whether there is kernel support for error counting for that block: a block may have ECC enabled even though there is no kernel support for reading its error counters.
- Parameters
  - [in] dv_ind: a device index
  - [in,out] enabled_blocks: a pointer to a uint64_t to which the enabled-block bits will be written. If this parameter is nullptr, this function will return RSMI_STATUS_INVALID_ARGS if the function is supported with the provided arguments, and RSMI_STATUS_NOT_SUPPORTED if it is not supported with the provided arguments.
- Return values
  - RSMI_STATUS_SUCCESS: call was successful
  - RSMI_STATUS_NOT_SUPPORTED: installed software or hardware does not support this function with the given arguments
  - RSMI_STATUS_INVALID_ARGS: the provided arguments are not valid
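The AND test described for the enabled-blocks mask can be sketched as a small helper. This is a minimal illustration under stated assumptions, not library code: the EX_BLOCK_ constants are hypothetical stand-ins for rsmi_gpu_block_t elements (the real values come from the library header), and ex_block_has_ecc is a name invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical block bits standing in for rsmi_gpu_block_t elements. */
#define EX_BLOCK_A (UINT64_C(1) << 0)
#define EX_BLOCK_B (UINT64_C(1) << 1)
#define EX_BLOCK_C (UINT64_C(1) << 2)

/* Return true if the bit for the given block is set in the mask. */
static bool ex_block_has_ecc(uint64_t enabled_blocks, uint64_t block_bit)
{
    assert(block_bit != 0);   /* a zero bit would never match */
    return (enabled_blocks & block_bit) != 0;
}
```

For a mask of EX_BLOCK_A | EX_BLOCK_C, the helper reports blocks A and C as enabled and block B as not enabled. As the description notes, a set bit says nothing about whether error counters can be read for that block, so a later counter query may still report RSMI_STATUS_NOT_SUPPORTED.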
◆ rsmi_dev_ecc_status_get()
rsmi_status_t rsmi_dev_ecc_status_get (uint32_t dv_ind, rsmi_gpu_block_t block, rsmi_ras_err_state_t *state)
Retrieve the ECC status for a GPU block.
Given a device index dv_ind, an rsmi_gpu_block_t block, and a pointer to an rsmi_ras_err_state_t state, this function writes the current state for the GPU block indicated by block to the memory pointed to by state.
- Parameters
  - [in] dv_ind: a device index
  - [in] block: the block for which the ECC status should be retrieved
  - [in,out] state: a pointer to an rsmi_ras_err_state_t to which the ECC state should be written. If this parameter is nullptr, this function will return RSMI_STATUS_INVALID_ARGS if the function is supported with the provided arguments, and RSMI_STATUS_NOT_SUPPORTED if it is not supported with the provided arguments.
- Return values
  - RSMI_STATUS_SUCCESS: call was successful
  - RSMI_STATUS_NOT_SUPPORTED: installed software or hardware does not support this function with the given arguments
  - RSMI_STATUS_INVALID_ARGS: the provided arguments are not valid
◆ rsmi_status_string()
rsmi_status_t rsmi_status_string (rsmi_status_t status, const char **status_string)
Get a description of a provided RSMI error status.
Sets the const char * pointed to by status_string to a string containing a description of the provided error code status.
- Parameters
  - [in] status: the error status for which a description is desired
  - [in,out] status_string: a pointer to a const char * which will be made to point to a description of the provided error code
- Return values
  - RSMI_STATUS_SUCCESS: returned upon a successful call
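The const char ** output parameter means the caller supplies only a pointer variable and the function redirects it to a description string. The sketch below illustrates that calling pattern with a local stand-in. demo_status_string, demo_describe, the demo_status_t enum, and the message texts are all hypothetical and are not the library's implementation. That the strings are static literals is an assumption of this stand-in only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in status codes for this sketch only. */
typedef enum {
    DEMO_STATUS_SUCCESS,
    DEMO_STATUS_INVALID_ARGS,
    DEMO_STATUS_NOT_SUPPORTED
} demo_status_t;

/* Stand-in with the same parameter shape as the documented function:
 * it makes *status_string point at a description of status. */
static demo_status_t demo_status_string(demo_status_t status,
                                        const char **status_string)
{
    assert(status_string != NULL);   /* precondition of this stand-in */
    switch (status) {
    case DEMO_STATUS_SUCCESS:
        *status_string = "demo: call was successful";
        break;
    case DEMO_STATUS_INVALID_ARGS:
        *status_string = "demo: the provided arguments are not valid";
        break;
    case DEMO_STATUS_NOT_SUPPORTED:
        *status_string = "demo: not supported with the given arguments";
        break;
    default:
        *status_string = "demo: unknown status";
        break;
    }
    return DEMO_STATUS_SUCCESS;
}

/* Caller side: declare a pointer, pass its address, read the result. */
static const char *demo_describe(demo_status_t status)
{
    const char *text = NULL;   /* no buffer is allocated by the caller */
    (void)demo_status_string(status, &text);
    return text;
}
```

A typical use would pass the status returned by a failed query to the description function and print the resulting text next to the device index.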