AMD SMI C++ library usage and examples

This section gives a brief overview of the AMD SMI C++ library and some basic usage examples. The library is useful for applications that perform performance monitoring, system diagnostics, or resource allocation on AMD hardware.

Note

hipcc and other compilers do not automatically link the libamd_smi dynamic library. To compile code that uses the AMD SMI library API, pass the -lamd_smi flag to the compiler and ensure that libamd_smi.so can be located, for example by setting the LD_LIBRARY_PATH environment variable to the directory containing it (usually /opt/rocm/lib).

See also

Refer to the C++ library API reference.

Device and socket handles

Many functions in the library take a socket handle or device handle. A socket refers to a physical hardware socket, abstracted by the library to represent the hardware more effectively to the user. While there is always one unique GPU per socket, an APU may house both a GPU and a CPU on the same socket. For MI200 GPUs, multiple GCDs may reside within a single socket.

To identify the sockets in a system, use the amdsmi_get_socket_handles() function, which returns a list of socket handles. These handles can then be used with amdsmi_get_processor_handles() to query the devices within each socket. A device handle distinguishes one detected device from another. However, a device handle may change after the application restarts, so it should not be treated as a persistent identifier across processes.

The list of socket handles obtained from amdsmi_get_socket_handles() can also be used to query the CPUs in each socket by calling amdsmi_get_processor_handles_by_type(). This function can then be called again to query the cores within each CPU.
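In the examples later in this section, each enumeration function is called twice: once with a null buffer to obtain the count, then again to fill an allocated buffer. That pattern can be wrapped in a small helper. The following is a minimal sketch, not part of AMD SMI: fake_status_t, fake_handle_t, and fake_get_handles() are hypothetical stand-ins so that the code compiles without the library installed, and enumerate_handles() is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins so the sketch builds without AMD SMI installed.
// In real code these would be amdsmi_status_t, amdsmi_socket_handle, and
// amdsmi_get_socket_handles() from "amd_smi/amdsmi.h".
using fake_status_t = int;    // assumption: 0 means success
using fake_handle_t = void*;

// Mimics the count-then-fill convention: a null buffer only reports the count.
inline fake_status_t fake_get_handles(uint32_t* count, fake_handle_t* handles) {
  static int storage[3] = {10, 20, 30};  // pretend the system has three sockets
  if (count == nullptr) return 1;
  if (handles == nullptr) { *count = 3; return 0; }
  const uint32_t n = (*count < 3) ? *count : 3;
  for (uint32_t i = 0; i < n; ++i) handles[i] = &storage[i];
  *count = n;
  return 0;
}

// Calls `query` once for the count, sizes the vector, then calls it again to fill.
template <typename Handle, typename Query>
fake_status_t enumerate_handles(Query query, std::vector<Handle>* out) {
  assert(out != nullptr);
  uint32_t count = 0;
  fake_status_t status = query(&count, nullptr);
  if (status != 0) return status;
  out->assign(count, Handle{});
  if (count == 0) return 0;              // nothing to fill
  status = query(&count, out->data());
  if (status == 0) out->resize(count);   // keep only the entries that were written
  return status;
}
```

With the real library, query could be a lambda that forwards to amdsmi_get_socket_handles(), or one that captures a socket handle and forwards to amdsmi_get_processor_handles(); the stand-in types would then be replaced by the library's own.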

Hello AMD SMI

An application using AMD SMI must call amdsmi_init() to initialize the AMD SMI library before making any other call. This call initializes the internal data structures required for subsequent AMD SMI operations. In the call, a flag can be passed to indicate if the application is interested in a specific device type.

amdsmi_shut_down() must be the last call. It closes the connection to the driver and ensures that any resources held by AMD SMI are released.
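One way to keep the init and shutdown calls paired is a scope guard that runs the shutdown step in its destructor. The following is a minimal sketch under stated assumptions: ScopedSmi is a hypothetical class, not part of AMD SMI, and it takes the two calls as callables so that the code compiles without the library installed. It assumes a status of 0 means success.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical guard, not part of AMD SMI. It runs `init` on construction and
// `shutdown` on destruction, so shutdown still runs on early return or exception.
class ScopedSmi {
 public:
  ScopedSmi(const std::function<int()>& init, std::function<int()> shutdown)
      : shutdown_(std::move(shutdown)) {
    assert(init && shutdown_ && "both callables are required");
    // Assumption: a status of 0 means success.
    // If init fails, the destructor never runs, so shutdown is not called.
    if (init() != 0) throw std::runtime_error("initialization failed");
  }
  ~ScopedSmi() { if (shutdown_) shutdown_(); }

  // Non-copyable: exactly one shutdown per successful init.
  ScopedSmi(const ScopedSmi&) = delete;
  ScopedSmi& operator=(const ScopedSmi&) = delete;

 private:
  std::function<int()> shutdown_;
};

// With "amd_smi/amdsmi.h" included, usage might look like:
//   ScopedSmi smi([] { return static_cast<int>(amdsmi_init(AMDSMI_INIT_AMD_GPUS)); },
//                 [] { return static_cast<int>(amdsmi_shut_down()); });
```

Declaring the guard before any handle containers means the containers are destroyed first and the shutdown call runs last when the scope exits.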

  1. A simple “Hello World” type program that displays the temperature of detected devices looks like this:

    #include <iostream>
    #include <vector>
    #include "amd_smi/amdsmi.h"
    
    int main() {
      amdsmi_status_t ret;
    
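      // Init amdsmi for sockets and devices. Here we are only interested in AMD_GPUS.
      ret = amdsmi_init(AMDSMI_INIT_AMD_GPUS);
      if (ret != AMDSMI_STATUS_SUCCESS) {
        std::cout << "amdsmi_init failed, status " << ret << std::endl;
        return 1;
      }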
    
      // Get all sockets
      uint32_t socket_count = 0;
    
      // Get the socket count available in the system.
      ret = amdsmi_get_socket_handles(&socket_count, nullptr);
    
      // Allocate the memory for the sockets
      std::vector<amdsmi_socket_handle> sockets(socket_count);
      // Get the socket handles in the system
      ret = amdsmi_get_socket_handles(&socket_count, &sockets[0]);
    
      std::cout << "Total Socket: " << socket_count << std::endl;
    
      // For each socket, get identifier and devices
      for (uint32_t i=0; i < socket_count; i++) {
        // Get Socket info
        char socket_info[128];
        ret = amdsmi_get_socket_info(sockets[i], 128, socket_info);
        std::cout << "Socket " << socket_info<< std::endl;
    
        // Get the device count for the socket.
        uint32_t device_count = 0;
        ret = amdsmi_get_processor_handles(sockets[i], &device_count, nullptr);
    
        // Allocate the memory for the device handlers on the socket
        std::vector<amdsmi_processor_handle> processor_handles(device_count);
        // Get all devices of the socket
        ret = amdsmi_get_processor_handles(sockets[i],
                  &device_count, &processor_handles[0]);
    
        // For each device of the socket, get name and temperature.
        for (uint32_t j=0; j < device_count; j++) {
          // Get device type. Since amdsmi is initialized with
          // AMDSMI_INIT_AMD_GPUS, the processor_type must be AMDSMI_PROCESSOR_TYPE_AMD_GPU.
          processor_type_t processor_type;
          ret = amdsmi_get_processor_type(processor_handles[j], &processor_type);
          if (processor_type != AMDSMI_PROCESSOR_TYPE_AMD_GPU) {
            std::cout << "Expect AMDSMI_PROCESSOR_TYPE_AMD_GPU device type!\n";
            return 1;
          }
    
          // Get device name
          amdsmi_board_info_t board_info;
          ret = amdsmi_get_gpu_board_info(processor_handles[j], &board_info);
          std::cout << "\tdevice "
                      << j <<"\n\t\tName:" << board_info.product_name << std::endl;
    
          // Get temperature
          int64_t val_i64 = 0;
          ret =  amdsmi_get_temp_metric(processor_handles[j], AMDSMI_TEMPERATURE_TYPE_EDGE,
                  AMDSMI_TEMP_CURRENT, &val_i64);
          std::cout << "\t\tTemperature: " << val_i64 << "C" << std::endl;
        }
      }
    
      // Clean up resources allocated at amdsmi_init. This invalidates the
      // socket and device handles obtained above.
      ret = amdsmi_shut_down();
    
      return 0;
    }
    
  2. A sample program that displays the power of detected CPUs looks like this:

    #include <iostream>
    #include <vector>
    #include "amd_smi/amdsmi.h"
    
    int main(int argc, char **argv) {
        amdsmi_status_t ret;
        uint32_t socket_count = 0;
    
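        // Initialize amdsmi for AMD CPUs
        ret = amdsmi_init(AMDSMI_INIT_AMD_CPUS);
        if (ret != AMDSMI_STATUS_SUCCESS) {
            std::cout << "amdsmi_init failed, status " << ret << std::endl;
            return 1;
        }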
    
        ret = amdsmi_get_socket_handles(&socket_count, nullptr);
    
        // Allocate the memory for the sockets
        std::vector<amdsmi_socket_handle> sockets(socket_count);
    
        // Get the sockets of the system
        ret = amdsmi_get_socket_handles(&socket_count, &sockets[0]);
    
        std::cout << "Total Socket: " << socket_count << std::endl;
    
        // For each socket, get cpus
        for (uint32_t i = 0; i < socket_count; i++) {
            uint32_t cpu_count = 0;
    
            // Set processor type as AMDSMI_PROCESSOR_TYPE_AMD_CPU
            processor_type_t processor_type = AMDSMI_PROCESSOR_TYPE_AMD_CPU;
            ret = amdsmi_get_processor_handles_by_type(sockets[i], processor_type, nullptr, &cpu_count);
    
            // Allocate the memory for the cpus
            std::vector<amdsmi_processor_handle> plist(cpu_count);
    
            // Get the cpus for each socket
            ret = amdsmi_get_processor_handles_by_type(sockets[i], processor_type, &plist[0], &cpu_count);
    
            for (uint32_t index = 0; index < plist.size(); index++) {
                uint32_t socket_power;
                std::cout<<"CPU "<<index<<"\t"<< std::endl;
                std::cout<<"Power (Watts): ";
    
                ret = amdsmi_get_cpu_socket_power(plist[index], &socket_power);
                if (ret != AMDSMI_STATUS_SUCCESS) {
                    std::cout<<"Failed to get cpu socket power"<<"["<<index<<"] , Err["<<ret<<"] "<< std::endl;
                } else {
                    // Divide by 1000 to print the value in watts
                    std::cout<<static_cast<double>(socket_power)/1000<<std::endl;
                }
                std::cout<<std::endl;
            }
        }
    
        // Clean up resources allocated at amdsmi_init
        ret = amdsmi_shut_down();
    
        return 0;
    }
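Most calls in the examples above store the status in ret without inspecting it. A small helper can turn a failing status into an exception, so that a failure is not silently ignored. The following is a sketch under stated assumptions: check_status() and SMI_CHECK are hypothetical names, not part of AMD SMI, and the helper takes the status as a plain int so that the code compiles without the library installed. It assumes a status of 0 means success.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of AMD SMI.
// Throws std::runtime_error when `status` is non-zero; `what` names the call.
inline void check_status(int status, const char* what) {
  assert(what != nullptr);
  if (status == 0) return;  // assumption: 0 means success
  throw std::runtime_error(std::string(what) + " failed with status " +
                           std::to_string(status));
}

// Hypothetical convenience macro: passes the call's source text as the name.
#define SMI_CHECK(call) check_status(static_cast<int>(call), #call)

// With "amd_smi/amdsmi.h" included, usage might look like:
//   SMI_CHECK(amdsmi_init(AMDSMI_INIT_AMD_GPUS));
//   SMI_CHECK(amdsmi_get_socket_handles(&socket_count, nullptr));
```

Throwing keeps the main flow free of repeated status checks; a program that must continue after a failed query, as the CPU power example does, can instead test the status inline.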