Driver control mechanisms

AMD SMI: Driver control mechanisms

Functions

amdsmi_status_t amdsmi_gpu_driver_reload (void)
 Restart the device driver (kmod module) for all AMD GPUs on the system.
 

Detailed Description

These functions provide control over the driver. Use them with caution, as they may cause the driver to become unstable.

Function Documentation

◆ amdsmi_gpu_driver_reload()

amdsmi_status_t amdsmi_gpu_driver_reload (void)

Restart the device driver (kmod module) for all AMD GPUs on the system.

Platform: gpu_bm_linux, guest_1vf, guest_mvf

This function reloads the AMD GPU driver through modprobe, as described in the Linux kernel documentation (https://docs.kernel.org/admin-guide/sysctl/kernel.html#modprobe). It passes none of the extra module parameters listed in https://docs.kernel.org/gpu/amdgpu/module-parameters.html.

Use this function with caution: it unloads and then reloads the AMD GPU driver, as in modprobe -r amdgpu && modprobe amdgpu.

Any process or workload using the AMD GPU driver is REQUIRED to be stopped before calling this function. Otherwise, the function returns AMDSMI_STATUS_AMDGPU_RESTART_ERR, indicating that the amdgpu driver could not be restarted successfully.

The caller is REQUIRED to have root/admin privileges to call this function. Otherwise, the function returns AMDSMI_STATUS_NO_PERM.
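The two preconditions above (no remaining users of the driver, and root privileges) can be checked before attempting a reload. The following is a minimal sketch and is not part of AMD SMI: every helper name is hypothetical, and it assumes a Linux system where /proc/modules lists each loaded module as "name size refcount ...". A nonzero reference count is only a heuristic that something still holds the amdgpu module; a zero count does not guarantee that the reload will succeed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for the pre-flight check. */
enum preflight_result {
    PREFLIGHT_OK = 0,        /* no blocker detected */
    PREFLIGHT_NOT_ROOT = 1,  /* effective UID is not 0 */
    PREFLIGHT_IN_USE = 2     /* module reference count is above zero */
};

/* Parse one /proc/modules line ("name size refcount deps state address").
 * Returns the reference count if the line describes `name`, otherwise -1. */
static long module_refcount_from_line(const char *line, const char *name)
{
    char mod[64];
    unsigned long size;
    long refcount;

    assert(line != NULL && name != NULL);
    if (sscanf(line, "%63s %lu %ld", mod, &size, &refcount) != 3)
        return -1;
    return strcmp(mod, name) == 0 ? refcount : -1;
}

/* Scan /proc/modules for `name`. Returns its reference count, or -1 if the
 * module is not listed or the file cannot be read. */
static long module_refcount(const char *name)
{
    char line[512];
    long result = -1;
    FILE *fp = fopen("/proc/modules", "r");

    if (fp == NULL)
        return -1;
    while (result < 0 && fgets(line, sizeof line, fp) != NULL)
        result = module_refcount_from_line(line, name);
    fclose(fp);
    return result;
}

/* Pure decision step, kept separate so it can be exercised without root. */
static enum preflight_result classify_preflight(uid_t euid, long refcount)
{
    if (euid != 0)
        return PREFLIGHT_NOT_ROOT;
    if (refcount > 0)
        return PREFLIGHT_IN_USE;
    return PREFLIGHT_OK;
}

/* Check the current process and the amdgpu module. */
static enum preflight_result reload_preflight(void)
{
    return classify_preflight(geteuid(), module_refcount("amdgpu"));
}
```

A caller could invoke reload_preflight() and proceed to amdsmi_gpu_driver_reload() only when it returns PREFLIGHT_OK. The library performs its own checks regardless, so this only gives an earlier and more specific diagnosis.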

This API takes time to complete because it checks the driver's loading status to confirm that the driver reloaded properly. If AMDSMI_STATUS_AMDGPU_RESTART_ERR is returned, the driver did not reload properly and the user should check the dmesg logs.
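As an illustration of acting on the three documented return values, the sketch below maps a status to a next-step hint. It is a hedged example rather than library code: reload_status_hint() and report_reload_status() are hypothetical helpers, the header path amd_smi/amdsmi.h is an assumption about the installation, and the fallback enum exists only so the sketch compiles without the AMD SMI headers (its numeric values are placeholders, not the library's real values).

```c
#include <assert.h>
#include <stdio.h>

/* Use the real AMD SMI header when it is installed (path is an assumption).
 * Otherwise fall back to stand-in declarations with placeholder values. */
#if defined(__has_include)
#  if __has_include(<amd_smi/amdsmi.h>)
#    include <amd_smi/amdsmi.h>
#    define HAVE_AMDSMI 1
#  endif
#endif
#ifndef HAVE_AMDSMI
typedef enum {
    AMDSMI_STATUS_SUCCESS,
    AMDSMI_STATUS_NO_PERM,
    AMDSMI_STATUS_AMDGPU_RESTART_ERR
} amdsmi_status_t;
#endif

/* Hypothetical helper: translate a reload status into a next-step hint. */
static const char *reload_status_hint(amdsmi_status_t status)
{
    switch (status) {
    case AMDSMI_STATUS_SUCCESS:
        return "driver reloaded";
    case AMDSMI_STATUS_NO_PERM:
        return "re-run with root/admin privileges";
    case AMDSMI_STATUS_AMDGPU_RESTART_ERR:
        return "reload failed: stop GPU workloads and check dmesg";
    default:
        return "unexpected status";
    }
}

/* Hypothetical helper: print the hint and return a process exit code
 * (0 on success, 1 on any other status). */
static int report_reload_status(amdsmi_status_t status)
{
    const char *hint = reload_status_hint(status);

    assert(hint != NULL);
    fprintf(status == AMDSMI_STATUS_SUCCESS ? stdout : stderr,
            "amdgpu reload: %s\n", hint);
    return status == AMDSMI_STATUS_SUCCESS ? 0 : 1;
}
```

With the real header available, a caller could pass the result of the call directly, for example report_reload_status(amdsmi_gpu_driver_reload()).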

This function was created to conveniently reload the AMD GPU driver after amdsmi_set_gpu_memory_partition() or amdsmi_set_gpu_memory_partition_mode() has successfully changed the memory partition on bare-metal systems. Users can now control when the reload happens, once all GPU processes/workloads using the AMD GPU driver have been stopped. An AMD GPU driver reload is REQUIRED to complete the change to the new memory partition configuration, and the preceding amdsmi_set_gpu_memory_partition()/amdsmi_set_gpu_memory_partition_mode() operation MUST have been successful. This function WILL AFFECT all GPUs in the hive: they are all reconfigured with the specified memory partition configuration.

Returns
amdsmi_status_t | AMDSMI_STATUS_SUCCESS on success
| AMDSMI_STATUS_NO_PERM if the caller does not have root access
| AMDSMI_STATUS_AMDGPU_RESTART_ERR if the amdgpu driver could not be restarted successfully