rocSHMEM environment variables

This section describes the important environment variables used to control the behavior of rocSHMEM.

Environment variable

Default value

Value

ROCSHMEM_DEBUG_LEVEL
Debug output level

WARN

Levels (from least to most verbose):
NONE: Suppress all output.
ERROR: Print error messages only.
WARN: Print warnings and errors (default).
ENV: Print modified environment variables at startup.
VERSION: Print build/version information at startup.
INFO: Print informational messages and above.
API: Print API call tracing (requires BUILD_DEBUG_TRACE_HOST/BUILD_DEBUG_TRACE_DEVICE).
TRACE: Print all messages including internal traces (requires BUILD_DEBUG_TRACE_HOST/BUILD_DEBUG_TRACE_DEVICE).

Modifiers can be appended with : to suppress specific categories:
:noerror, :nowarn, :noenv, :noversion, :noinfo, :noapi, :notrace
Appending :full or :all after the env level or the :env modifier controls how much detail is printed for environment variables.
:color (default) or :nocolor enables/disables ANSI color output.

Examples: trace:noversion, env:full, api:noenv, trace:nocolor
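
The level-plus-modifier syntax can be illustrated with a short Python sketch that splits a value into its base level and its modifiers. This is not rocSHMEM's own parser: the function name `split_debug_level` is hypothetical, and the case-insensitive matching is an assumption (the levels are listed in uppercase while the examples use lowercase).

```python
# Level and modifier names as listed for ROCSHMEM_DEBUG_LEVEL.
LEVELS = ["none", "error", "warn", "env", "version", "info", "api", "trace"]
MODIFIERS = {
    "noerror", "nowarn", "noenv", "noversion", "noinfo", "noapi", "notrace",
    "env", "full", "all", "color", "nocolor",
}


def split_debug_level(value):
    """Split a ROCSHMEM_DEBUG_LEVEL string into (level, [modifiers]).

    Hypothetical helper for illustration; not part of rocSHMEM.
    """
    # Assumption: matching is case-insensitive; the source does not say.
    parts = value.strip().lower().split(":")
    level, modifiers = parts[0], parts[1:]
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    return level, modifiers


# Decompose the documented example values.
for example in ["trace:noversion", "env:full", "api:noenv", "trace:nocolor"]:
    print(example, "->", split_debug_level(example))
```

Reading a value this way makes it clear that the first token selects the verbosity level and every later token only adjusts what that level prints.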
ROCSHMEM_HEAP_SIZE
Defines the size of the rocSHMEM symmetric heap in bytes (per PE).

1073741824 (1 GiB)

Size in bytes (per PE).
Note: the heap is on GPU memory.
ROCSHMEM_MAX_NUM_HOST_CONTEXTS
Maximum number of host-side communication contexts

1

Maximum number of host-side contexts.

ROCSHMEM_MAX_NUM_CONTEXTS
Defines the number of contexts an application can use.

32

Maximum number of contexts.

ROCSHMEM_MAX_NUM_TEAMS
Defines the number of teams an application can use.

40

Maximum number of teams.

ROCSHMEM_BACKEND
When rocSHMEM is compiled for all backends, this environment variable
selects which backend to use. The default value is an empty string, in which case rocSHMEM auto-selects the most appropriate backend.

`` ``

ipc: IPC Backend
ro: Reverse Offload Backend
gda: GPU Direct Async Backend
ROCSHMEM_UNIQUEID_WITH_MPI
Defines whether rocSHMEM is expected to use MPI internally when using the uniqueId based initialization.

0

0: Do not use MPI.
1: Use MPI.
ROCSHMEM_DISABLE_MIXED_IPC
Defines whether to force using the network conduit even when IPC is available.

0

0: Use IPC when available.
1: Force network conduit.
ROCSHMEM_USE_IB_HCA
Forces the NIC that this PE uses. When this variable is set, NIC auto-detection and mapping are disabled and the NIC specified in the variable
is selected. The default value is an empty string, in which case rocSHMEM auto-detects the most appropriate NIC.

`` ``

Example value: bnxt_re0
ROCSHMEM_HCA_LIST
Comma-separated list of NIC names that can be used by rocSHMEM. Unlike ROCSHMEM_USE_IB_HCA, when this variable is set,
NIC auto-detection and mapping still run, but NICs that are not in the list are discarded before auto-detection runs.
Prefixing the list with ^ turns it into an exclude list: NICs that are in the list are discarded before auto-detection runs.
The default value is an empty string, in which case rocSHMEM auto-detects the most appropriate NIC.

`` ``

Example values: bnxt_re1,bnxt_re11 (include list) or ^mlx5_0,mlx5_3 (exclude list)
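
The include and exclude behavior described for ROCSHMEM_HCA_LIST can be sketched in Python as a filter applied to the detected NIC names before auto-detection picks one. The function `filter_nics` is hypothetical and only models the documented semantics. Exact name matching is an assumption, since the source does not say whether prefixes or port suffixes are accepted.

```python
def filter_nics(detected, hca_list):
    """Return the NICs that remain candidates for auto-detection.

    Hypothetical helper modeling the documented ROCSHMEM_HCA_LIST semantics.
    Assumption: names are compared for exact equality.
    """
    value = hca_list.strip()
    if not value:
        # Empty default: every detected NIC stays a candidate.
        return list(detected)
    exclude = value.startswith("^")
    if exclude:
        value = value[1:]
    names = {name.strip() for name in value.split(",") if name.strip()}
    if exclude:
        return [nic for nic in detected if nic not in names]
    return [nic for nic in detected if nic in names]


# Include list: only the named NICs are kept.
print(filter_nics(["bnxt_re0", "bnxt_re1", "bnxt_re11"], "bnxt_re1,bnxt_re11"))
# Exclude list: the named NICs are dropped.
print(filter_nics(["mlx5_0", "mlx5_1", "mlx5_3"], "^mlx5_0,mlx5_3"))
```

In both cases auto-detection still chooses among the remaining names, which is the difference from ROCSHMEM_USE_IB_HCA.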
ROCSHMEM_BOOTSTRAP_SOCKET_IFNAME
Chooses the interface to bootstrap rocSHMEM with.
Only valid when not using MPI.
The default value is an empty string and rocSHMEM auto-detects the most appropriate interface.

`` ``

Example value: eno8303
ROCSHMEM_GDA_PROVIDER
When rocSHMEM is compiled with support for multiple NIC vendors,
this environment variable selects the desired provider.
The default value is an empty string, in which case rocSHMEM auto-detects the most appropriate NIC.

`` ``

bnxt: Broadcom Thor 2
pensando: AMD Pensando Pollara
ionic: AMD Pensando Pollara (alias)
mlx5: Mellanox ConnectX-7
ROCSHMEM_GDA_ALTERNATE_QP_PORTS
Enables or disables alternating QP mappings across rocSHMEM contexts.

1

0: Disabled.
1: Enabled. This helps saturate bandwidth on multiport bonded interfaces.
ROCSHMEM_GDA_TRAFFIC_CLASS
When using a NIC with an Ethernet link layer, this sets the traffic class for the QPs.

0

The traffic class number.

ROCSHMEM_GDA_PCIE_RELAXED_ORDERING
Enables PCIe Relaxed Ordering when registering the symmetric heap with the RDMA NICs.

0

0: Disabled.
1: Enabled.
ROCSHMEM_GDA_ENABLE_DMABUF
Enables dmabuf support for memory registration.

0

0: Disabled.
1: Enabled.
ROCSHMEM_GDA_ALLTOALLV_WG_ALGO
Selects between two algorithms for GDA-based alltoallv.
The GET algorithm uses an initial round of alltoallv
communication to distribute displacements, then a second round to
get the transfer data. This algorithm has higher latency but
performs better for large messages.
The COPY algorithm performs the alltoallv communication
pattern into a staging buffer, then copies the data into the destination
buffers. This reduces latency but requires more memory, and the
algorithm only works for small messages.

GET

GET: GET-based alltoallv algorithm
COPY: Copy alltoallv algorithm
ROCSHMEM_GDA_OVERRIDE_NIC_FIRMWARE_CHECK
Use this environment variable with caution.
It overrides the NIC firmware check so that
an unsupported NIC firmware can be used.
If the firmware check is disabled, rocSHMEM is not guaranteed to work.

0

0: Disabled.
1: Enabled.
ROCSHMEM_GDA_SQ_SIZE
This environment variable sets the length of the SQ for GDA.

1024

Maximum number of Work Queue Entries (WQEs) posted on the Send Queue (SQ).
ROCSHMEM_GDA_NUM_QPS_PER_PE_DEFAULT_CTX
Sets the number of Queue Pairs (QPs) to create per PE for the default context.

1

Number of QPs per PE for the default context.

ROCSHMEM_GDA_NUM_QPS_PER_PE_USR_CTX
Sets the number of Queue Pairs (QPs) to create per PE for each user context.

1

Number of QPs per PE for each user context.

ROCSHMEM_MAX_WF_BUFFERS
Maximum number of wavefront buffer arrays in the default context (determines the size of the status, return, and atomic return buffers).

1024

ROCSHMEM_BOOTSTRAP_TIMEOUT
Bootstrap initialization timeout in seconds

5

ROCSHMEM_BOOTSTRAP_HOSTID
Override host identifier for bootstrap. Empty string uses hostname.

`` ``

ROCSHMEM_BOOTSTRAP_SOCKET_FAMILY
Socket family for bootstrap (AF_UNSPEC, AF_INET, AF_INET6)

AF_UNSPEC (types::socket_family::UNSPEC)
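
As a usage illustration for the variables above, the following Python sketch assembles an environment with a few of them set before an application is launched. The helper `rocshmem_env` is hypothetical and the chosen values are only examples. Whether the environment reaches every PE depends on the launcher in use, which is outside what this page describes.

```python
import os


def rocshmem_env(base=None, **settings):
    """Return a copy of `base` (default: os.environ) with ROCSHMEM_* values set.

    Hypothetical helper: keyword `heap_size` becomes ROCSHMEM_HEAP_SIZE, etc.
    """
    env = dict(os.environ if base is None else base)
    for name, value in settings.items():
        # Environment variable values are always strings.
        env["ROCSHMEM_" + name.upper()] = str(value)
    return env


env = rocshmem_env(
    heap_size=2 * 1024**3,       # 2 GiB symmetric heap per PE, in bytes
    debug_level="info:nocolor",  # INFO level without ANSI color output
    backend="gda",               # select the GPU Direct Async backend
)

# Show only the rocSHMEM-related entries of the resulting environment.
print({k: v for k, v in env.items() if k.startswith("ROCSHMEM_")})

# To launch, pass `env` to subprocess.run(<launcher and application>, env=env).
```

Computing the heap size in code avoids mistakes when typing large byte counts by hand, since ROCSHMEM_HEAP_SIZE is specified in bytes.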