ECC Information
Functions

| Return type | Function | Description |
|---|---|---|
| amdsmi_status_t | amdsmi_get_gpu_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_error_count_t *ec) | Retrieve the error counts for a GPU block. It is not supported on virtual machine guests. |
| amdsmi_status_t | amdsmi_get_gpu_ecc_enabled(amdsmi_processor_handle processor_handle, uint64_t *enabled_blocks) | Retrieve the enabled ECC bit-mask. It is not supported on virtual machine guests. |
| amdsmi_status_t | amdsmi_get_gpu_total_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_error_count_t *ec) | Return the total number of ECC errors (correctable, uncorrectable and deferred) in the given GPU. It is not supported on virtual machine guests. |
Detailed Description
Function Documentation
◆ amdsmi_get_gpu_ecc_count()
amdsmi_status_t amdsmi_get_gpu_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_gpu_block_t block, amdsmi_error_count_t *ec)
Retrieve the error counts for a GPU block. It is not supported on virtual machine guests.
See RAS Error Count sysfs Interface (AMDGPU RAS Support - Linux Kernel documentation) to learn how these error counts are accessed.
- Platform:
gpu_bm_linux
host
Given a processor handle processor_handle, an amdsmi_gpu_block_t block, and a pointer to an amdsmi_error_count_t ec, this function writes the error counts for the GPU block indicated by block to the memory pointed to by ec.
- Parameters
  - [in] processor_handle: a processor handle
  - [in] block: the block for which error counts should be retrieved
  - [in,out] ec: a pointer to an amdsmi_error_count_t to which the error counts should be written. If this parameter is nullptr, the function returns AMDSMI_STATUS_INVAL if it is supported with the provided arguments, and AMDSMI_STATUS_NOT_SUPPORTED if it is not.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure
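The nullptr rule for ec gives a caller a way to probe whether a block is supported before reading its counts. The following is a minimal, self-contained sketch of that probing logic only. The status codes, the count structure, and stub_get_ecc_count are hypothetical stand-ins for AMDSMI_STATUS_*, amdsmi_error_count_t, and amdsmi_get_gpu_ecc_count (the real call needs the AMD SMI library and a valid processor handle); the block value 0x1 and the sample counts are made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for AMDSMI_STATUS_SUCCESS, AMDSMI_STATUS_INVAL and
 * AMDSMI_STATUS_NOT_SUPPORTED; the real values are defined in amdsmi.h. */
typedef enum {
    SKETCH_STATUS_SUCCESS = 0,
    SKETCH_STATUS_INVAL,
    SKETCH_STATUS_NOT_SUPPORTED
} sketch_status_t;

/* Hypothetical stand-in for amdsmi_error_count_t (two counters only). */
typedef struct {
    uint64_t correctable_count;
    uint64_t uncorrectable_count;
} sketch_error_count_t;

/* Hypothetical stub that follows the documented contract of
 * amdsmi_get_gpu_ecc_count. In this sketch only block 0x1 is supported. */
sketch_status_t stub_get_ecc_count(uint64_t block, sketch_error_count_t *ec)
{
    bool supported = (block == 0x1);
    if (ec == NULL) {
        /* nullptr probe: INVAL means supported, NOT_SUPPORTED means not */
        return supported ? SKETCH_STATUS_INVAL : SKETCH_STATUS_NOT_SUPPORTED;
    }
    if (!supported) {
        return SKETCH_STATUS_NOT_SUPPORTED;
    }
    ec->correctable_count = 3;    /* made-up sample values */
    ec->uncorrectable_count = 0;
    return SKETCH_STATUS_SUCCESS;
}

/* Probe support by passing NULL as the output pointer. */
bool block_counts_supported(uint64_t block)
{
    return stub_get_ecc_count(block, NULL) == SKETCH_STATUS_INVAL;
}
```

With the real library, the same probe would replace the stub with a call to amdsmi_get_gpu_ecc_count(processor_handle, block, NULL) and compare the result against AMDSMI_STATUS_INVAL.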
◆ amdsmi_get_gpu_ecc_enabled()
amdsmi_status_t amdsmi_get_gpu_ecc_enabled(amdsmi_processor_handle processor_handle, uint64_t *enabled_blocks)
Retrieve the enabled ECC bit-mask. It is not supported on virtual machine guests.
See RAS Error Count sysfs Interface (AMDGPU RAS Support - Linux Kernel documentation) to learn how these error counts are accessed.
- Platform:
gpu_bm_linux
host
Given a processor handle processor_handle and a pointer to a uint64_t enabled_blocks, this function writes a bit-mask to the memory pointed to by enabled_blocks. After a successful call, enabled_blocks can be AND'd with elements of the amdsmi_gpu_block_t enumeration to determine whether the corresponding block has ECC enabled. Whether a block has ECC enabled in the device is independent of whether the kernel supports error counting for that block: a block may have ECC enabled even though the kernel provides no way to read its error counters.
- Parameters
  - [in] processor_handle: a processor handle
  - [in,out] enabled_blocks: a pointer to a uint64_t to which the enabled-block bits will be written. If this parameter is nullptr, the function returns AMDSMI_STATUS_INVAL if it is supported with the provided arguments, and AMDSMI_STATUS_NOT_SUPPORTED if it is not.
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure
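The AND test described for enabled_blocks can be sketched as follows. The block flags and their values are hypothetical placeholders for members of amdsmi_gpu_block_t (a real program should use the enumerators from amdsmi.h), and the mask would come from a successful amdsmi_get_gpu_ecc_enabled call. A set bit only says that ECC is enabled for the block; it does not guarantee that the kernel can report that block's counters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical single-bit flags standing in for amdsmi_gpu_block_t members;
 * the real enumerators and their values are defined in amdsmi.h. */
enum {
    SKETCH_BLOCK_UMC  = 1u << 0,
    SKETCH_BLOCK_SDMA = 1u << 1,
    SKETCH_BLOCK_GFX  = 1u << 2
};

/* True if the bit for `block` is set in the mask written to enabled_blocks. */
bool ecc_enabled_for(uint64_t enabled_blocks, uint64_t block)
{
    return (enabled_blocks & block) != 0;
}

/* Print each known block whose ECC bit is set in the mask. */
void print_enabled_blocks(uint64_t enabled_blocks)
{
    static const struct {
        uint64_t flag;
        const char *name;
    } blocks[] = {
        { SKETCH_BLOCK_UMC,  "UMC"  },
        { SKETCH_BLOCK_SDMA, "SDMA" },
        { SKETCH_BLOCK_GFX,  "GFX"  },
    };
    for (size_t i = 0; i < sizeof blocks / sizeof blocks[0]; ++i) {
        if (ecc_enabled_for(enabled_blocks, blocks[i].flag)) {
            printf("ECC enabled: %s\n", blocks[i].name);
        }
    }
}
```

For example, with these placeholder flags a mask of 0x5 has bits 0 and 2 set, so print_enabled_blocks(0x5) reports UMC and GFX.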
◆ amdsmi_get_gpu_total_ecc_count()
amdsmi_status_t amdsmi_get_gpu_total_ecc_count(amdsmi_processor_handle processor_handle, amdsmi_error_count_t *ec)
Returns the total number of ECC errors (correctable, uncorrectable and deferred) in the given GPU. It is not supported on virtual machine guests.
See RAS Error Count sysfs Interface (AMDGPU RAS Support - Linux Kernel documentation) to learn how these error counts are accessed.
- Platform:
gpu_bm_linux
host
guest_windows
- Parameters
  - [in] processor_handle: the device to query
  - [out] ec: a pointer to an ECC error count structure, which must be allocated by the caller
- Returns
  - amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure
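Because ec must be allocated by the caller, a typical pattern is a stack variable whose address is passed in, with the counters read only after a successful return. The sketch below shows that pattern under stated assumptions: total_ecc_t and stub_get_total_ecc_count are hypothetical stand-ins for amdsmi_error_count_t and amdsmi_get_gpu_total_ecc_count, the field names are assumed from the three categories listed in the description (check amdsmi.h for your AMD SMI version), and the sample counts are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for amdsmi_error_count_t with the three categories
 * named in the description: correctable, uncorrectable and deferred. */
typedef struct {
    uint64_t correctable_count;
    uint64_t uncorrectable_count;
    uint64_t deferred_count;
} total_ecc_t;

/* Hypothetical stub for amdsmi_get_gpu_total_ecc_count: fills the
 * caller-allocated structure with made-up values, returns 0 on success
 * and non-zero on failure. */
int stub_get_total_ecc_count(total_ecc_t *ec)
{
    if (ec == NULL) {
        return 1;                 /* non-zero on failure */
    }
    ec->correctable_count = 5;
    ec->uncorrectable_count = 1;
    ec->deferred_count = 2;
    return 0;
}

/* Sum of all three error categories. */
uint64_t ecc_total(const total_ecc_t *ec)
{
    return ec->correctable_count + ec->uncorrectable_count + ec->deferred_count;
}

/* Example caller: the structure lives on the stack and its counters are
 * used only if the call succeeded. */
int report_total(uint64_t *total_out)
{
    total_ecc_t ec;
    memset(&ec, 0, sizeof ec);    /* start from a known state */
    int status = stub_get_total_ecc_count(&ec);
    if (status != 0) {
        return status;            /* propagate the non-zero failure code */
    }
    *total_out = ecc_total(&ec);
    return 0;
}
```

Propagating the status instead of returning a zero total keeps "no errors" distinguishable from "could not read the counters".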