rocSHMEM environment variables
This section describes the important environment variables used to control the behavior of rocSHMEM.
| Environment variable | Default value | Value |
|---|---|---|
| `ROCSHMEM_DEBUG_LEVEL`: Debug output level. | | Levels (from least to most verbose): `NONE`: Suppress all output. `ERROR`: Print error messages only. `WARN`: Print warnings and errors (default). `ENV`: Print modified environment variables at startup. `VERSION`: Print build/version information at startup. `INFO`: Print informational messages and above. `API`: Print API call tracing (requires `BUILD_DEBUG_TRACE_HOST`/`BUILD_DEBUG_TRACE_DEVICE`). `TRACE`: Print all messages, including internal traces (requires `BUILD_DEBUG_TRACE_HOST`/`BUILD_DEBUG_TRACE_DEVICE`). Modifiers can be appended with `:` to suppress specific categories: `:noerror`, `:nowarn`, `:noenv`, `:noversion`, `:noinfo`, `:noapi`, `:notrace`. Appending `:full` or `:all` after `env` or the `:env` modifier controls how much detail the environment printout includes. `:color` (default) or `:nocolor` enables or disables ANSI color output. Examples: `trace:noversion`, `env:full`, `api:noenv`, `trace:nocolor`. |
| `ROCSHMEM_HEAP_SIZE`: Defines the size of the rocSHMEM symmetric heap in bytes (per PE). | | Size in bytes (per PE). Note: the heap resides in GPU memory. |
| `ROCSHMEM_MAX_NUM_HOST_CONTEXTS`: Maximum number of host-side communication contexts. | | Maximum number of host-side contexts. |
| `ROCSHMEM_MAX_NUM_CONTEXTS`: Defines the maximum number of contexts an application can use. | | Maximum number of contexts. |
| `ROCSHMEM_MAX_NUM_TEAMS`: Defines the maximum number of teams an application can use. | | Maximum number of teams. |
| `ROCSHMEM_BACKEND`: When rocSHMEM is compiled for all backends, this environment variable selects which backend to use. The default value is an empty string, in which case rocSHMEM auto-selects the most appropriate backend. | `` `` | `ipc`: IPC Backend. `ro`: Reverse Offload Backend. `gda`: GPU Direct Async Backend. |
| `ROCSHMEM_UNIQUEID_WITH_MPI`: Defines whether rocSHMEM is expected to use MPI internally when using the uniqueId-based initialization. | | `0`: Do not use MPI. `1`: Use MPI. |
| `ROCSHMEM_DISABLE_MIXED_IPC`: Defines whether to force use of the network conduit even when IPC is available. | | `0`: Use IPC when available. `1`: Force network conduit. |
| `ROCSHMEM_USE_IB_HCA`: Forces the NIC that this PE uses. When this value is set, NIC auto-detection and mapping is disabled and the NIC specified in the variable is selected. The default value is an empty string, in which case rocSHMEM auto-detects the most appropriate NIC. | `` `` | Example value: `bnxt_re0` |
| `ROCSHMEM_HCA_LIST`: Comma-separated list of NIC names that rocSHMEM can use. Unlike `ROCSHMEM_USE_IB_HCA`, when this variable is set, NIC auto-detection and mapping still runs, but NICs that are not in the list are discarded before auto-detection runs. Prefixing the list with `^` turns it into an exclude list: NICs that are in the list are discarded before auto-detection runs. The default value is an empty string, in which case rocSHMEM auto-detects the most appropriate NIC. | `` `` | Example values: `bnxt_re1,bnxt_re11`, `^mlx5_0,mlx5_3` |
| `ROCSHMEM_BOOTSTRAP_SOCKET_IFNAME`: Chooses the network interface used to bootstrap rocSHMEM. Only valid when not using MPI. The default value is an empty string, in which case rocSHMEM auto-detects the most appropriate interface. | `` `` | Example value: `eno8303` |
| `ROCSHMEM_GDA_PROVIDER`: When rocSHMEM is compiled with support for multiple NIC vendors, this environment variable selects the desired provider. The default value is an empty string, in which case rocSHMEM auto-detects the most appropriate NIC. | `` `` | `bnxt`: Broadcom Thor 2. `pensando`: AMD Pensando Pollara. `ionic`: AMD Pensando Pollara (alias). `mlx5`: Mellanox ConnectX-7. |
| `ROCSHMEM_GDA_ALTERNATE_QP_PORTS`: Enables or disables alternating QP mappings across rocSHMEM contexts. | | `0`: Disabled. `1`: Enabled. This helps saturate bandwidth on multiport bonded interfaces. |
| `ROCSHMEM_GDA_TRAFFIC_CLASS`: When using a NIC with an Ethernet link layer, this sets the traffic class for the QPs. | | The traffic class number. |
| `ROCSHMEM_GDA_PCIE_RELAXED_ORDERING`: Enables PCIe Relaxed Ordering when registering the symmetric heap with the RDMA NICs. | | `0`: Disabled. `1`: Enabled. |
| `ROCSHMEM_GDA_ENABLE_DMABUF`: Enables dmabuf support for memory registration. | | `0`: Disabled. `1`: Enabled. |
| `ROCSHMEM_GDA_ALLTOALLV_WG_ALGO`: Selects between two algorithms for GDA-based alltoallv. The GET algorithm uses an initial round of alltoallv communication to distribute displacements, then a second round to get the transfer data. This algorithm has higher latency but performs better for large messages. The COPY algorithm performs an alltoallv communication pattern into a staging buffer, then copies the data into the destination buffers. This reduces latency but requires more memory, and the algorithm only works for small messages. | | `GET`: GET-based alltoallv algorithm. `COPY`: Copy alltoallv algorithm. |
| `ROCSHMEM_GDA_OVERRIDE_NIC_FIRMWARE_CHECK`: Use this environment variable with caution. It overrides the NIC firmware check for users who want to run with an unsupported NIC firmware. If the firmware check is disabled, rocSHMEM is not guaranteed to work. | | `0`: Disabled. `1`: Enabled. |
| `ROCSHMEM_GDA_SQ_SIZE`: Sets the length of the Send Queue (SQ) for GDA. | | Maximum number of Work Queue Entries (WQEs) posted on the Send Queue (SQ). |
| `ROCSHMEM_GDA_NUM_QPS_PER_PE_DEFAULT_CTX`: Sets the number of Queue Pairs (QPs) to create per PE for the default context. | | Number of QPs per PE for the default context. |
| `ROCSHMEM_GDA_NUM_QPS_PER_PE_USR_CTX`: Sets the number of Queue Pairs (QPs) to create per PE for each user context. | | Number of QPs per PE for each user context. |
| `ROCSHMEM_MAX_WF_BUFFERS`: Maximum number of wavefront buffer arrays in the default context (determines the size of the status, return, and atomic return buffers). | | |
| `ROCSHMEM_BOOTSTRAP_TIMEOUT`: Bootstrap initialization timeout in seconds. | | |
| `ROCSHMEM_BOOTSTRAP_HOSTID`: Overrides the host identifier used for bootstrap. An empty string uses the hostname. | `` `` | |
| `ROCSHMEM_BOOTSTRAP_SOCKET_FAMILY`: Socket family for bootstrap (`AF_UNSPEC`, `AF_INET`, `AF_INET6`). | | |
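
The include and exclude behavior described for `ROCSHMEM_HCA_LIST` can be easier to follow with a small model. The Python sketch below is illustrative only and is not rocSHMEM code: the function name `filter_nics` and the sample NIC names are hypothetical, and it assumes that NIC names are compared by exact match, which the table does not state. It shows only which NICs would remain as candidates before auto-detection runs.

```python
def filter_nics(available, hca_list):
    """Model the documented ROCSHMEM_HCA_LIST filtering (illustrative only).

    available: NIC names detected on the node (hypothetical input).
    hca_list:  value of ROCSHMEM_HCA_LIST, for example "a,b" or "^a,b".
    Assumption: names are compared by exact match.
    """
    spec = hca_list.strip()
    if not spec:
        # Empty string (the default): every detected NIC stays a candidate.
        return list(available)

    # A leading '^' turns the whole list into an exclude list.
    exclude = spec.startswith("^")
    if exclude:
        spec = spec[1:]

    names = {name.strip() for name in spec.split(",") if name.strip()}
    if exclude:
        return [nic for nic in available if nic not in names]
    return [nic for nic in available if nic in names]


if __name__ == "__main__":
    detected = ["bnxt_re0", "bnxt_re1", "bnxt_re11", "mlx5_0", "mlx5_3"]
    print(filter_nics(detected, "bnxt_re1,bnxt_re11"))  # include list
    print(filter_nics(detected, "^mlx5_0,mlx5_3"))      # exclude list
    print(filter_nics(detected, ""))                    # default: no filtering
```

With the sample input, the include list keeps only `bnxt_re1` and `bnxt_re11`, while the exclude list drops `mlx5_0` and `mlx5_3` and keeps the rest. In contrast, `ROCSHMEM_USE_IB_HCA` names a single NIC and disables auto-detection entirely.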