AMD SMI CLI Tool#

This tool acts as a command line interface for manipulating and monitoring the amdgpu kernel, and is intended to replace and deprecate the existing rocm_smi CLI tool & gpuv-smi tool. It uses Ctypes to call the amd_smi_lib API. Recommended: At least one AMD GPU with AMD driver installed

Install CLI Tool and Python Library#

Requirements#

  • python 3.6.8+ 64-bit

  • amdgpu driver must be loaded for amdsmi_init() to pass

Installation#

  • Install amdgpu driver

  • Install amd-smi-lib package through package manager

  • amd-smi –help

Install Example for Ubuntu 22.04#

apt install amd-smi-lib
amd-smi --help

Optional autocompletion#

amd-smi cli application supports autocompletion. The package should attempt to install it, if argcomplete is not installed you can enable it by using the following commands:

python3 -m pip install argcomplete
activate-global-python-argcomplete --user
# restart shell to enable

Manual/Multiple Rocm Instance Python Library Install#

In the event there are multiple rocm installations and pyenv is not being used, to use the correct amdsmi version you must uninstall previous versions of amd-smi and install the version you want directly from your rocm instance.

Python Library Install Example for Ubuntu 22.04#

Remove previous amdsmi installation:

python3 -m pip list | grep amd
python3 -m pip uninstall amdsmi

Then install Python library from your target rocm instance:

apt install amd-smi-lib
amd-smi --help
cd /opt/rocm/share/amd_smi
python3 -m pip install --upgrade pip
python3 -m pip install --user .

Now you have the amdsmi python library in your python path:

~$ python3
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import amdsmi
>>>

Usage#

amd-smi will report the version and current platform detected when running the command without arguments:

~$ amd-smi
usage: amd-smi [-h]  ...

AMD System Management Interface | Version: 24.2.0.0 | ROCm version: 6.1.0 | Platform: Linux
Baremetal

options:
  -h, --help          show this help message and exit

AMD-SMI Commands:
                      Descriptions:
    version           Display version information
    list              List GPU information
    static            Gets static information about the specified GPU
    firmware (ucode)  Gets firmware information about the specified GPU
    bad-pages         Gets bad page information about the specified GPU
    metric            Gets metric/performance information about the specified GPU
    process           Lists general process information running on the specified GPU
    event             Displays event information for the given GPU
    topology          Displays topology information of the devices
    set               Set options for devices
    reset             Reset options for devices
    monitor           Monitor metrics for target devices

More detailed verison information is available from amd-smi version

Each command will have detailed information via amd-smi [command] --help

Commands#

For convenience, here is the help output for each command

~$ amd-smi list --help
usage: amd-smi list [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                    [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]]

Lists all the devices on the system and the links between devices.
Lists all the sockets and for each socket, GPUs and/or CPUs associated to
that socket alongside some basic information for each device.
In virtualization environments, it can also list VFs associated to each
GPU with some basic information for each VF.

options:
  -h, --help                  show this help message and exit
  -g, --gpu GPU [GPU ...]     Select a GPU ID, BDF, or UUID from the possible choices:
                              ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                              ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                              ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                              ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                all | Selects all devices
  -U, --cpu CPU [CPU ...]     Select a CPU ID from the possible choices:
                              ID: 0
                              ID: 1
                              ID: 2
                              ID: 3
                                all | Selects all devices
  -O, --core CORE [CORE ...]  Select a Core ID from the possible choices:
                              ID: 0 - 95
                                all  | Selects all devices

Command Modifiers:
  --json                      Displays output in JSON format (human readable by default).
  --csv                       Displays output in CSV format (human readable by default).
  --file FILE                 Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL            Set the logging level from the possible choices:
                                DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi static --help
usage: amd-smi static [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                      [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] [-a] [-b]
                      [-V] [-d] [-v] [-c] [-B] [-s] [-i] [-r] [-p] [-l] [-u]

If no GPU is specified, returns static information for all GPUs on the system.
If no static argument is provided, all static information will be displayed.

Static Arguments:
  -h, --help                  show this help message and exit
  -g, --gpu GPU [GPU ...]     Select a GPU ID, BDF, or UUID from the possible choices:
                              ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                              ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                              ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                              ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                all | Selects all devices
  -U, --cpu CPU [CPU ...]     Select a CPU ID from the possible choices:
                              ID: 0
                              ID: 1
                              ID: 2
                              ID: 3
                                all | Selects all devices
  -O, --core CORE [CORE ...]  Select a Core ID from the possible choices:
                              ID: 0 - 95
                                all  | Selects all devices
  -a, --asic                  All asic information
  -b, --bus                   All bus information
  -V, --vbios                 All video bios information (if available)
  -d, --driver                Displays driver version
  -v, --vram                  All vram information
  -c, --cache                 All cache information
  -B, --board                 All board information
  -r, --ras                   Displays RAS features information
  -p, --partition             Partition information
  -l, --limit                 All limit metric values (i.e. power and thermal limits)
  -u, --numa                  All numa node information

CPU Option<s>:
  -s, --smu                   All SMU FW information
  -i, --interface_ver         Displays hsmp interface version

Command Modifiers:
  --json                      Displays output in JSON format (human readable by default).
  --csv                       Displays output in CSV format (human readable by default).
  --file FILE                 Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL            Set the logging level from the possible choices:
                                DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi firmware --help
usage: amd-smi firmware [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                        [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] [-f]

If no GPU is specified, return firmware information for all GPUs on the system.

Firmware Arguments:
  -h, --help                   show this help message and exit
  -g, --gpu GPU [GPU ...]      Select a GPU ID, BDF, or UUID from the possible choices:
                               ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                               ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                               ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                               ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                 all | Selects all devices
  -U, --cpu CPU [CPU ...]      Select a CPU ID from the possible choices:
                               ID: 0
                               ID: 1
                               ID: 2
                               ID: 3
                                 all | Selects all devices
  -O, --core CORE [CORE ...]   Select a Core ID from the possible choices:
                               ID: 0 - 95
                                 all  | Selects all devices
  -f, --ucode-list, --fw-list  All FW list information

Command Modifiers:
  --json                       Displays output in JSON format (human readable by default).
  --csv                        Displays output in CSV format (human readable by default).
  --file FILE                  Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL             Set the logging level from the possible choices:
                                DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi bad-pages --help
usage: amd-smi bad-pages [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                         [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] [-p]
                         [-r] [-u]

If no GPU is specified, return bad page information for all GPUs on the system.

Bad Pages Arguments:
  -h, --help                  show this help message and exit
  -g, --gpu GPU [GPU ...]     Select a GPU ID, BDF, or UUID from the possible choices:
                              ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                              ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                              ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                              ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                all | Selects all devices
  -U, --cpu CPU [CPU ...]     Select a CPU ID from the possible choices:
                              ID: 0
                              ID: 1
                              ID: 2
                              ID: 3
                                all | Selects all devices
  -O, --core CORE [CORE ...]  Select a Core ID from the possible choices:
                              ID: 0 - 95
                                all  | Selects all devices
  -p, --pending               Displays all pending retired pages
  -r, --retired               Displays retired pages
  -u, --un-res                Displays unreservable pages

Command Modifiers:
  --json                      Displays output in JSON format (human readable by default).
  --csv                       Displays output in CSV format (human readable by default).
  --file FILE                 Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL            Set the logging level from the possible choices:
                                DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi metric --help
usage: amd-smi metric [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                      [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]]
                      [-w INTERVAL] [-W TIME] [-i ITERATIONS] [-m] [-u] [-p] [-c] [-t]
                      [-e] [-P] [-k] [-f] [-C] [-o] [-l] [-x] [-E] [--cpu_power_metrics]
                      [--cpu_prochot] [--cpu_freq_metrics] [--cpu_c0_res]
                      [--cpu_lclk_dpm_level NBIOID] [--cpu_pwr_svi_telemtry_rails]
                      [--cpu_io_bandwidth IO_BW LINKID_NAME]
                      [--cpu_xgmi_bandwidth XGMI_BW LINKID_NAME] [--cpu_enable_apb]
                      [--cpu_disable_apb DF_PSTATE] [--set_cpu_pow_limit POW_LIMIT]
                      [--set_cpu_xgmi_link_width MIN_WIDTH MAX_WIDTH]
                      [--set_cpu_lclk_dpm_level NBIOID MIN_DPM MAX_DPM]
                      [--core_boost_limit] [--core_curr_active_freq_core_limit]
                      [--set_soc_boost_limit BOOST_LIMIT]
                      [--set_core_boost_limit BOOST_LIMIT] [--cpu_metrics_ver]
                      [--cpu_metrics_table] [--core_energy] [--socket_energy]
                      [--set_cpu_pwr_eff_mode MODE] [--cpu_ddr_bandwidth] [--cpu_temp]
                      [--cpu_dimm_temp_range_rate DIMM_ADDR]
                      [--cpu_dimm_pow_conumption DIMM_ADDR]
                      [--cpu_dimm_thermal_sensor DIMM_ADDR]
                      [--set_cpu_gmi3_link_width MIN_LW MAX_LW]
                      [--set_cpu_pcie_lnk_rate LINK_RATE]
                      [--set_cpu_df_pstate_range MAX_PSTATE MIN_PSTATE]

If no GPU is specified, returns metric information for all GPUs on the system.
If no metric argument is provided all metric information will be displayed.

Metric arguments:
  -h, --help                                       show this help message and exit
  -g, --gpu GPU [GPU ...]                          Select a GPU ID, BDF, or UUID from the possible choices:
                                                   ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                                                   ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                                                   ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                                                   ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                                     all | Selects all devices
  -U, --cpu CPU [CPU ...]                          Select a CPU ID from the possible choices:
                                                   ID: 0
                                                   ID: 1
                                                   ID: 2
                                                   ID: 3
                                                     all | Selects all devices
  -O, --core CORE [CORE ...]                       Select a Core ID from the possible choices:
                                                   ID: 0 - 95
                                                     all  | Selects all devices
  -w, --watch INTERVAL                             Reprint the command in a loop of INTERVAL seconds
  -W, --watch_time TIME                            The total TIME to watch the given command
  -i, --iterations ITERATIONS                      Total number of ITERATIONS to loop on the given command
  -m, --mem-usage                                  Memory usage per block
  -u, --usage                                      Displays engine usage information
  -p, --power                                      Current power usage
  -c, --clock                                      Average, max, and current clock frequencies
  -t, --temperature                                Current temperatures
  -e, --ecc                                        Total number of ECC errors
  -P, --pcie                                       Current PCIe speed, width, and replay count
  -k, --ecc-block                                  Number of ECC errors per block
  -f, --fan                                        Current fan speed
  -C, --voltage-curve                              Display voltage curve
  -o, --overdrive                                  Current GPU clock overdrive level
  -l, --perf-level                                 Current DPM performance level
  -x, --xgmi-err                                   XGMI error information since last read
  -E, --energy                                     Amount of energy consumed

CPU Option<s>:
  --cpu_power_metrics                              Cpu power metrics
  --cpu_prochot                                    Displays prochot status
  --cpu_freq_metrics                               Displays currentFclkMemclk frequencies and cclk frequency limit
  --cpu_c0_res                                     Displays C0 residency
  --cpu_lclk_dpm_level NBIOID                      Displays lclk dpm level range. Requires socket ID and nbio id as inputs
  --cpu_pwr_svi_telemtry_rails                     Displays svi based telemetry for all rails
  --cpu_io_bandwidth IO_BW LINKID_NAME             Displays current IO bandwidth for the selected CPU.
                                                    input parameters are bandwidth type(1) and link ID encodings
                                                    i.e. P2, P3, G0 - G7
  --cpu_xgmi_bandwidth XGMI_BW LINKID_NAME         Displays current XGMI bandwidth for the selected CPU
                                                    input parameters are bandwidth type(1,2,4) and link ID encodings
                                                    i.e. P2, P3, G0 - G7
  --cpu_enable_apb                                 Enables the DF p-state performance boost algorithm
  --cpu_disable_apb DF_PSTATE                      Disables the DF p-state performance boost algorithm.
  --core_boost_limit                               Get booslimit for the selected cores
  --core_curr_active_freq_core_limit               Get Current CCLK limit set per Core
  --cpu_metrics_ver                                Displays metrics table version
  --cpu_metrics_table                              Displays metric table
  --core_energy                                    Displays core energy for the selected core
  --socket_energy                                  Displays socket energy for the selected socket
  --cpu_ddr_bandwidth                              Displays per socket max ddr bw, current utilized bw and current utilized ddr bw in percentage
  --cpu_temp                                       Displays cpu socket temperature
  --cpu_dimm_temp_range_rate DIMM_ADDR             Displays dimm temperature range and refresh rate
  --cpu_dimm_pow_conumption DIMM_ADDR              Displays dimm power consumption
  --cpu_dimm_thermal_sensor DIMM_ADDR              Displays dimm thermal sensor

Set Options<s>:
  --set_cpu_pow_limit POW_LIMIT                    Set power limit for the given socket. Input parameter is power limit value.
  --set_cpu_xgmi_link_width MIN_WIDTH MAX_WIDTH    Set max and Min linkwidth. Input parameters are min and max link width values
  --set_cpu_lclk_dpm_level NBIOID MIN_DPM MAX_DPM  Sets the max and min dpm level on a given NBIO. Inpur parameters are die_index, min dpm, max dpm.
  --set_soc_boost_limit BOOST_LIMIT                Sets the boost limit for the given socket. Input parameter is socket limit value
  --set_core_boost_limit BOOST_LIMIT               Sets the boost limit for the given core. Input parameter is core limit value
  --set_cpu_pwr_eff_mode MODE                      Sets the power efficency mode policy. Input parameter is mode.
  --set_cpu_gmi3_link_width MIN_LW MAX_LW          Sets max and min gmi3 link width range
  --set_cpu_pcie_lnk_rate LINK_RATE                Sets pcie link rate
  --set_cpu_df_pstate_range MAX_PSTATE MIN_PSTATE  Sets max and min df-pstates

Command Modifiers:
  --json                                           Displays output in JSON format (human readable by default).
  --csv                                            Displays output in CSV format (human readable by default).
  --file FILE                                      Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL                                 Set the logging level from the possible choices:
                                                        DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi process --help
usage: amd-smi process [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                       [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]]
                       [-w INTERVAL] [-W TIME] [-i ITERATIONS] [-G] [-e] [-p PID]
                       [-n NAME]

If no GPU is specified, returns information for all GPUs on the system.
If no process argument is provided all process information will be displayed.

Process arguments:
  -h, --help                   show this help message and exit
  -g, --gpu GPU [GPU ...]      Select a GPU ID, BDF, or UUID from the possible choices:
                               ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                               ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                               ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                               ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                 all | Selects all devices
  -U, --cpu CPU [CPU ...]      Select a CPU ID from the possible choices:
                               ID: 0
                               ID: 1
                               ID: 2
                               ID: 3
                                 all | Selects all devices
  -O, --core CORE [CORE ...]   Select a Core ID from the possible choices:
                               ID: 0 - 95
                                 all  | Selects all devices
  -w, --watch INTERVAL         Reprint the command in a loop of INTERVAL seconds
  -W, --watch_time TIME        The total TIME to watch the given command
  -i, --iterations ITERATIONS  Total number of ITERATIONS to loop on the given command
  -G, --general                pid, process name, memory usage
  -e, --engine                 All engine usages
  -p, --pid PID                Gets all process information about the specified process based on Process ID
  -n, --name NAME              Gets all process information about the specified process based on Process Name.
                               If multiple processes have the same name information is returned for all of them.

Command Modifiers:
  --json                       Displays output in JSON format (human readable by default).
  --csv                        Displays output in CSV format (human readable by default).
  --file FILE                  Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL             Set the logging level from the possible choices:
                                DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi event --help
usage: amd-smi event [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                     [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]]

If no GPU is specified, returns event information for all GPUs on the system.

Event Arguments:
  -h, --help                  show this help message and exit
  -g, --gpu GPU [GPU ...]     Select a GPU ID, BDF, or UUID from the possible choices:
                              ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                              ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                              ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                              ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                all | Selects all devices
  -U, --cpu CPU [CPU ...]     Select a CPU ID from the possible choices:
                              ID: 0
                              ID: 1
                              ID: 2
                              ID: 3
                                all | Selects all devices
  -O, --core CORE [CORE ...]  Select a Core ID from the possible choices:
                              ID: 0 - 95
                                all  | Selects all devices

Command Modifiers:
  --json                      Displays output in JSON format (human readable by default).
  --csv                       Displays output in CSV format (human readable by default).
  --file FILE                 Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL            Set the logging level from the possible choices:
                                DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi topology --help
usage: amd-smi topology [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                        [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] [-a]
                        [-w] [-o] [-t] [-b]

If no GPU is specified, returns information for all GPUs on the system.
If no topology argument is provided all topology information will be displayed.

Topology arguments:
  -h, --help                  show this help message and exit
  -g, --gpu GPU [GPU ...]     Select a GPU ID, BDF, or UUID from the possible choices:
                              ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                              ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                              ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                              ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                all | Selects all devices
  -U, --cpu CPU [CPU ...]     Select a CPU ID from the possible choices:
                              ID: 0
                              ID: 1
                              ID: 2
                              ID: 3
                                all | Selects all devices
  -O, --core CORE [CORE ...]  Select a Core ID from the possible choices:
                              ID: 0 - 95
                                all  | Selects all devices
  -a, --access                Displays link accessibility between GPUs
  -w, --weight                Displays relative weight between GPUs
  -o, --hops                  Displays the number of hops between GPUs
  -t, --link-type             Displays the link type between GPUs
  -b, --numa-bw               Display max and min bandwidth between nodes

Command Modifiers:
  --json                      Displays output in JSON format (human readable by default).
  --csv                       Displays output in CSV format (human readable by default).
  --file FILE                 Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL            Set the logging level from the possible choices:
                                DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi set --help
usage: amd-smi set [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                   (-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]) [-f %]
                   [-l LEVEL] [-P SETPROFILE] [-d SCLKMAX] [-C PARTITION] [-M PARTITION]
                   [-o WATTS]

A GPU must be specified to set a configuration.
A set argument must be provided; Multiple set arguments are accepted

Set Arguments:
  -h, --help                         show this help message and exit
  -g, --gpu GPU [GPU ...]            Select a GPU ID, BDF, or UUID from the possible choices:
                                     ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                                     ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                                     ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                                     ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                       all | Selects all devices
  -U, --cpu CPU [CPU ...]            Select a CPU ID from the possible choices:
                                     ID: 0
                                     ID: 1
                                     ID: 2
                                     ID: 3
                                       all | Selects all devices
  -O, --core CORE [CORE ...]         Select a Core ID from the possible choices:
                                     ID: 0 - 95
                                       all  | Selects all devices
  -f, --fan %                        Set GPU fan speed (0-255 or 0-100%)
  -l, --perf-level LEVEL             Set performance level
  -P, --profile SETPROFILE           Set power profile level (#) or a quoted string of custom profile attributes
  -d, --perf-determinism SCLKMAX     Set GPU clock frequency limit and performance level to determinism to get minimal performance variation
  -C, --compute-partition PARTITION  Set one of the following the compute partition modes:
                                        CPX, SPX, DPX, TPX, QPX
  -M, --memory-partition PARTITION   Set one of the following the memory partition modes:
                                        NPS1, NPS2, NPS4, NPS8
  -o, --power-cap WATTS              Set power capacity limit

Command Modifiers:
  --json                             Displays output in JSON format (human readable by default).
  --csv                              Displays output in CSV format (human readable by default).
  --file FILE                        Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL                   Set the logging level from the possible choices:
                                        DEBUG, INFO, WARNING, ERROR, CRITICAL
~$ amd-smi reset --help
usage: amd-smi reset [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
                     (-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]) [-G] [-c]
                     [-f] [-p] [-x] [-d] [-C] [-M] [-o]

A GPU must be specified to reset a configuration.
A reset argument must be provided; Multiple reset arguments are accepted

Reset Arguments:
  -h, --help                  show this help message and exit
  -g, --gpu GPU [GPU ...]     Select a GPU ID, BDF, or UUID from the possible choices:
                              ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817
                              ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a
                              ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6
                              ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499
                                all | Selects all devices
  -U, --cpu CPU [CPU ...]     Select a CPU ID from the possible choices:
                              ID: 0
                              ID: 1
                              ID: 2
                              ID: 3
                                all | Selects all devices
  -O, --core CORE [CORE ...]  Select a Core ID from the possible choices:
                              ID: 0 - 95
                                all  | Selects all devices
  -G, --gpureset              Reset the specified GPU
  -c, --clocks                Reset clocks and overdrive to default
  -f, --fans                  Reset fans to automatic (driver) control
  -p, --profile               Reset power profile back to default
  -x, --xgmierr               Reset XGMI error counts
  -d, --perf-determinism      Disable performance determinism
  -C, --compute-partition     Reset compute partitions on the specified GPU
  -M, --memory-partition      Reset memory partitions on the specified GPU
  -o, --power-cap             Reset power capacity limit to max capable

Command Modifiers:
  --json                      Displays output in JSON format (human readable by default).
  --csv                       Displays output in CSV format (human readable by default).
  --file FILE                 Saves output into a file on the provided path (stdout by default).
  --loglevel LEVEL            Set the logging level from the possible choices:
                                DEBUG, INFO, WARNING, ERROR, CRITICAL

Example output from amd-smi static#

Here is some example output from the tool:

~$ amd-smi static
CPU: 0
SMU:
    FW_VERSION: 85:81:0
INTERFACE_VERSION:
    PROTO VERSION: 6

CPU: 1
SMU:
    FW_VERSION: 85:81:0
INTERFACE_VERSION:
    PROTO VERSION: 6

CPU: 2
SMU:
    FW_VERSION: 85:81:0
INTERFACE_VERSION:
    PROTO VERSION: 6

CPU: 3
SMU:
    FW_VERSION: 85:81:0
INTERFACE_VERSION:
    PROTO VERSION: 6


GPU: 0
ASIC:
    MARKET_NAME: MI300A
    VENDOR_ID: 0x1002
    VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
    SUBVENDOR_ID: 0
    DEVICE_ID: 0x74a0
    REV_ID: 0x0
    ASIC_SERIAL: 0x71660a3c71d5f817
    OAM_ID: 0
BUS:
    BDF: 0000:01:00.0
    MAX_PCIE_SPEED: 32 GT/s
    MAX_PCIE_LANES: 16
    PCIE_INTERFACE_VERSION: Gen 5
    SLOT_TYPE: PCIE
VBIOS:
    NAME: N/A
    BUILD_DATE: N/A
    PART_NUMBER: N/A
    VERSION: N/A
BOARD:
    MODEL_NUMBER: N/A
    PRODUCT_SERIAL: N/A
    FRU_ID: N/A
    MANUFACTURER_NAME: N/A
    PRODUCT_NAME: N/A
LIMIT:
    MAX_POWER: 550 W
    CURRENT_POWER: 0 W
    SLOWDOWN_EDGE_TEMPERATURE: N/A
    SLOWDOWN_HOTSPOT_TEMPERATURE: 100 °C
    SLOWDOWN_VRAM_TEMPERATURE: 95 °C
    SHUTDOWN_EDGE_TEMPERATURE: N/A
    SHUTDOWN_HOTSPOT_TEMPERATURE: 110 °C
    SHUTDOWN_VRAM_TEMPERATURE: 105 °C
DRIVER:
    DRIVER_NAME: amdgpu
    DRIVER_VERSION: 6.5.2
VRAM:
    VRAM_TYPE: HBM
    VRAM_VENDOR: HYNIX
    VRAM_SIZE_MB: 96432 MB
CACHE:
    CACHE 0:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 32 KB
        CACHE_LEVEL: 1
        MAX_NUM_CU_SHARED: 2
        NUM_CACHE_INSTANCE: 464
    CACHE 1:
        CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE
        CACHE_SIZE: 64 KB
        CACHE_LEVEL: 1
        MAX_NUM_CU_SHARED: 2
        NUM_CACHE_INSTANCE: 160
    CACHE 2:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 32768 KB
        CACHE_LEVEL: 2
        MAX_NUM_CU_SHARED: 304
        NUM_CACHE_INSTANCE: 1
    CACHE 3:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 262144 KB
        CACHE_LEVEL: 3
        MAX_NUM_CU_SHARED: 304
        NUM_CACHE_INSTANCE: 1

RAS:
    EEPROM_VERSION: 0x0
    PARITY_SCHEMA: DISABLED
    SINGLE_BIT_SCHEMA: DISABLED
    DOUBLE_BIT_SCHEMA: DISABLED
    POISON_SCHEMA: ENABLED
    ECC_BLOCK_STATE:
        BLOCK: UMC
        STATUS: DISABLED
        BLOCK: SDMA
        STATUS: ENABLED
        BLOCK: GFX
        STATUS: ENABLED
        BLOCK: MMHUB
        STATUS: ENABLED
        BLOCK: ATHUB
        STATUS: DISABLED
        BLOCK: PCIE_BIF
        STATUS: DISABLED
        BLOCK: HDP
        STATUS: DISABLED
        BLOCK: XGMI_WAFL
        STATUS: DISABLED
        BLOCK: DF
        STATUS: DISABLED
        BLOCK: SMN
        STATUS: DISABLED
        BLOCK: SEM
        STATUS: DISABLED
        BLOCK: MP0
        STATUS: DISABLED
        BLOCK: MP1
        STATUS: DISABLED
        BLOCK: FUSE
        STATUS: DISABLED
PARTITION:
    COMPUTE_PARTITION: SPX
    MEMORY_PARTITION: NPS1
NUMA:
    NODE: 0
    AFFINITY: 0

GPU: 1
ASIC:
    MARKET_NAME: MI300A
    VENDOR_ID: 0x1002
    VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
    SUBVENDOR_ID: 0
    DEVICE_ID: 0x74a0
    REV_ID: 0x0
    ASIC_SERIAL: 0xb4b2fa0be8628b1a
    OAM_ID: 1
BUS:
    BDF: 0001:01:00.0
    MAX_PCIE_SPEED: 32 GT/s
    MAX_PCIE_LANES: 16
    PCIE_INTERFACE_VERSION: Gen 5
    SLOT_TYPE: PCIE
VBIOS:
    NAME: N/A
    BUILD_DATE: N/A
    PART_NUMBER: N/A
    VERSION: N/A
BOARD:
    MODEL_NUMBER: N/A
    PRODUCT_SERIAL: N/A
    FRU_ID: N/A
    MANUFACTURER_NAME: N/A
    PRODUCT_NAME: N/A
LIMIT:
    MAX_POWER: 550 W
    CURRENT_POWER: 0 W
    SLOWDOWN_EDGE_TEMPERATURE: N/A
    SLOWDOWN_HOTSPOT_TEMPERATURE: 100 °C
    SLOWDOWN_VRAM_TEMPERATURE: 95 °C
    SHUTDOWN_EDGE_TEMPERATURE: N/A
    SHUTDOWN_HOTSPOT_TEMPERATURE: 110 °C
    SHUTDOWN_VRAM_TEMPERATURE: 105 °C
DRIVER:
    DRIVER_NAME: amdgpu
    DRIVER_VERSION: 6.5.2
VRAM:
    VRAM_TYPE: HBM
    VRAM_VENDOR: HYNIX
    VRAM_SIZE_MB: 96432 MB
CACHE:
 CACHE:
    CACHE 0:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 32 KB
        CACHE_LEVEL: 1
        MAX_NUM_CU_SHARED: 2
        NUM_CACHE_INSTANCE: 464
    CACHE 1:
        CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE
        CACHE_SIZE: 64 KB
        CACHE_LEVEL: 1
        MAX_NUM_CU_SHARED: 2
        NUM_CACHE_INSTANCE: 160
    CACHE 2:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 32768 KB
        CACHE_LEVEL: 2
        MAX_NUM_CU_SHARED: 304
        NUM_CACHE_INSTANCE: 1
    CACHE 3:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 262144 KB
        CACHE_LEVEL: 3
        MAX_NUM_CU_SHARED: 304
        NUM_CACHE_INSTANCE: 1
RAS:
    EEPROM_VERSION: 0x0
    PARITY_SCHEMA: DISABLED
    SINGLE_BIT_SCHEMA: DISABLED
    DOUBLE_BIT_SCHEMA: DISABLED
    POISON_SCHEMA: ENABLED
    ECC_BLOCK_STATE:
        BLOCK: UMC
        STATUS: DISABLED
        BLOCK: SDMA
        STATUS: ENABLED
        BLOCK: GFX
        STATUS: ENABLED
        BLOCK: MMHUB
        STATUS: ENABLED
        BLOCK: ATHUB
        STATUS: DISABLED
        BLOCK: PCIE_BIF
        STATUS: DISABLED
        BLOCK: HDP
        STATUS: DISABLED
        BLOCK: XGMI_WAFL
        STATUS: DISABLED
        BLOCK: DF
        STATUS: DISABLED
        BLOCK: SMN
        STATUS: DISABLED
        BLOCK: SEM
        STATUS: DISABLED
        BLOCK: MP0
        STATUS: DISABLED
        BLOCK: MP1
        STATUS: DISABLED
        BLOCK: FUSE
        STATUS: DISABLED
PARTITION:
    COMPUTE_PARTITION: SPX
    MEMORY_PARTITION: NPS1
NUMA:
    NODE: 1
    AFFINITY: 1

GPU: 2
ASIC:
    MARKET_NAME: MI300A
    VENDOR_ID: 0x1002
    VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
    SUBVENDOR_ID: 0
    DEVICE_ID: 0x74a0
    REV_ID: 0x0
    ASIC_SERIAL: 0xa9073066a98ba4a6
    OAM_ID: 2
BUS:
    BDF: 0002:01:00.0
    MAX_PCIE_SPEED: 32 GT/s
    MAX_PCIE_LANES: 16
    PCIE_INTERFACE_VERSION: Gen 5
    SLOT_TYPE: PCIE
VBIOS:
    NAME: N/A
    BUILD_DATE: N/A
    PART_NUMBER: N/A
    VERSION: N/A
BOARD:
    MODEL_NUMBER: N/A
    PRODUCT_SERIAL: N/A
    FRU_ID: N/A
    MANUFACTURER_NAME: N/A
    PRODUCT_NAME: N/A
LIMIT:
    MAX_POWER: 550 W
    CURRENT_POWER: 0 W
    SLOWDOWN_EDGE_TEMPERATURE: N/A
    SLOWDOWN_HOTSPOT_TEMPERATURE: 100 °C
    SLOWDOWN_VRAM_TEMPERATURE: 95 °C
    SHUTDOWN_EDGE_TEMPERATURE: N/A
    SHUTDOWN_HOTSPOT_TEMPERATURE: 110 °C
    SHUTDOWN_VRAM_TEMPERATURE: 105 °C
DRIVER:
    DRIVER_NAME: amdgpu
    DRIVER_VERSION: 6.5.2
VRAM:
    VRAM_TYPE: HBM
    VRAM_VENDOR: HYNIX
    VRAM_SIZE_MB: 96432 MB
CACHE:
 CACHE:
    CACHE 0:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 32 KB
        CACHE_LEVEL: 1
        MAX_NUM_CU_SHARED: 2
        NUM_CACHE_INSTANCE: 464
    CACHE 1:
        CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE
        CACHE_SIZE: 64 KB
        CACHE_LEVEL: 1
        MAX_NUM_CU_SHARED: 2
        NUM_CACHE_INSTANCE: 160
    CACHE 2:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 32768 KB
        CACHE_LEVEL: 2
        MAX_NUM_CU_SHARED: 304
        NUM_CACHE_INSTANCE: 1
    CACHE 3:
        CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE
        CACHE_SIZE: 262144 KB
        CACHE_LEVEL: 3
        MAX_NUM_CU_SHARED: 304
        NUM_CACHE_INSTANCE: 1
RAS:
    EEPROM_VERSION: 0x0
    PARITY_SCHEMA: DISABLED
    SINGLE_BIT_SCHEMA: DISABLED
    DOUBLE_BIT_SCHEMA: DISABLED
    POISON_SCHEMA: ENABLED
    ECC_BLOCK_STATE:
        BLOCK: UMC
        STATUS: DISABLED
        BLOCK: SDMA
        STATUS: ENABLED
        BLOCK: GFX
        STATUS: ENABLED
        BLOCK: MMHUB
        STATUS: ENABLED
        BLOCK: ATHUB
        STATUS: DISABLED
        BLOCK: PCIE_BIF
        STATUS: DISABLED
        BLOCK: HDP
        STATUS: DISABLED
        BLOCK: XGMI_WAFL
        STATUS: DISABLED
        BLOCK: DF
        STATUS: DISABLED
        BLOCK: SMN
        STATUS: DISABLED
        BLOCK: SEM
        STATUS: DISABLED
        BLOCK: MP0
        STATUS: DISABLED
        BLOCK: MP1
        STATUS: DISABLED
        BLOCK: FUSE
        STATUS: DISABLED
PARTITION:
    COMPUTE_PARTITION: SPX
    MEMORY_PARTITION: NPS1
NUMA:
    NODE: 2
    AFFINITY: 2

GPU: 3
ASIC:
    MARKET_NAME: MI300A
    VENDOR_ID: 0x1002
    VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
    SUBVENDOR_ID: 0
    DEVICE_ID: 0x74a0
    REV_ID: 0x0
    ASIC_SERIAL: 0x53a0a1ff3830f499
    OAM_ID: 3
BUS:
    BDF: 0003:01:00.0
    MAX_PCIE_SPEED: 32 GT/s
    MAX_PCIE_LANES: 16
    PCIE_INTERFACE_VERSION: Gen 5
    SLOT_TYPE: PCIE
VBIOS:
    NAME: N/A
    BUILD_DATE: N/A
    PART_NUMBER: N/A
    VERSION: N/A
BOARD:
    MODEL_NUMBER: N/A
    PRODUCT_SERIAL: N/A
    FRU_ID: N/A
    MANUFACTURER_NAME: N/A
    PRODUCT_NAME: N/A
LIMIT:
    MAX_POWER: 550 W
    CURRENT_POWER: 0 W
    SLOWDOWN_EDGE_TEMPERATURE: N/A
    SLOWDOWN_HOTSPOT_TEMPERATURE: 100 °C
    SLOWDOWN_VRAM_TEMPERATURE: 95 °C
    SHUTDOWN_EDGE_TEMPERATURE: N/A
    SHUTDOWN_HOTSPOT_TEMPERATURE: 110 °C
    SHUTDOWN_VRAM_TEMPERATURE: 105 °C
DRIVER:
    DRIVER_NAME: amdgpu
    DRIVER_VERSION: 6.5.2
VRAM:
    VRAM_TYPE: HBM
    VRAM_VENDOR: HYNIX
    VRAM_SIZE_MB: 96432 MB
CACHE:
    CACHE 0:
CACHE:
    CACHE 0:
        CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE
        CACHE_SIZE: 32 KB
        CACHE_LEVEL: 1
        MAX_NUM_CU_SHARED: 2
        NUM_CACHE_INSTANCE: 464
    CACHE 1:
        CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE
        CACHE_SIZE: 64 KB
        CACHE_LEVEL: 1
        MAX_NUM_CU_SHARED: 2
        NUM_CACHE_INSTANCE: 160
    CACHE 2:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 32768 KB
        CACHE_LEVEL: 2
        MAX_NUM_CU_SHARED: 304
        NUM_CACHE_INSTANCE: 1
    CACHE 3:
        CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE
        CACHE_SIZE: 262144 KB
        CACHE_LEVEL: 3
        MAX_NUM_CU_SHARED: 304
        NUM_CACHE_INSTANCE: 1
RAS:
    EEPROM_VERSION: 0x0
    PARITY_SCHEMA: DISABLED
    SINGLE_BIT_SCHEMA: DISABLED
    DOUBLE_BIT_SCHEMA: DISABLED
    POISON_SCHEMA: ENABLED
    ECC_BLOCK_STATE:
        BLOCK: UMC
        STATUS: DISABLED
        BLOCK: SDMA
        STATUS: ENABLED
        BLOCK: GFX
        STATUS: ENABLED
        BLOCK: MMHUB
        STATUS: ENABLED
        BLOCK: ATHUB
        STATUS: DISABLED
        BLOCK: PCIE_BIF
        STATUS: DISABLED
        BLOCK: HDP
        STATUS: DISABLED
        BLOCK: XGMI_WAFL
        STATUS: DISABLED
        BLOCK: DF
        STATUS: DISABLED
        BLOCK: SMN
        STATUS: DISABLED
        BLOCK: SEM
        STATUS: DISABLED
        BLOCK: MP0
        STATUS: DISABLED
        BLOCK: MP1
        STATUS: DISABLED
        BLOCK: FUSE
        STATUS: DISABLED
PARTITION:
    COMPUTE_PARTITION: SPX
    MEMORY_PARTITION: NPS1
NUMA:
    NODE: 3
    AFFINITY: 3

Disclaimer#

The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein.

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Copyright © 2014-2023 Advanced Micro Devices, Inc. All rights reserved.