RCCL environment variables
This section describes the most important RCCL environment variables, which are grouped by functionality.
Configuration and setup
The configuration and setup environment variables for RCCL are collected in the following table.
| Environment variable | Value |
|---|---|
| `NCCL_CONF_FILE`: Specifies the path to the RCCL configuration file. | String path to the configuration file. Default: `~/.rccl.conf` or `/etc/rccl.conf` |
| `NCCL_HOSTID`: Sets the host identifier for multi-node communication. | String value for host identification. Used for host hash generation. |
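The following minimal Python sketch writes a configuration file and returns an environment in which `NCCL_CONF_FILE` points at it. It assumes RCCL reads the file as one `NAME=value` pair per line, the format NCCL documents for its own `nccl.conf`. The helper name `write_rccl_conf` is hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def write_rccl_conf(settings: Dict[str, str], path: Path) -> Dict[str, str]:
    """Write settings as NAME=value lines (assumed file format) and return a
    copy of the current environment with NCCL_CONF_FILE pointing at the file."""
    lines = [f"{name}={value}" for name, value in settings.items()]
    path.write_text("\n".join(lines) + "\n")
    env = os.environ.copy()
    env["NCCL_CONF_FILE"] = str(path)
    return env


if __name__ == "__main__":
    # Hypothetical location; any path readable by the job works.
    conf = Path(tempfile.gettempdir()) / "example_rccl.conf"
    env = write_rccl_conf({"NCCL_HOSTID": "node-a"}, conf)
    print(env["NCCL_CONF_FILE"])
```

The returned dictionary can be passed as the `env` argument of a process launcher so the setting applies only to that job rather than to the whole shell.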
Logging and debugging
The logging and debugging environment variables for RCCL are collected in the following table.
| Environment variable | Value |
|---|---|
| `RCCL_LOG_LEVEL`: Controls RCCL logging verbosity. | Integer value (default: `1`). Higher values increase logging detail. |
| `NCCL_DEBUG_SUBSYS`: Controls which subsystems generate debug output. | Comma-separated list of subsystems (for example, `INIT,COLL`). Prefix with `^` to invert the selection. |
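To illustrate the `^` inversion rule for `NCCL_DEBUG_SUBSYS`, the Python sketch below computes which subsystems a given value selects. It models only the documented behavior and is not RCCL's parser. The function name is hypothetical, and the subsystem list passed in is an illustrative subset, not the complete set RCCL recognizes.

```python
from typing import List, Set


def enabled_subsystems(spec: str, known: List[str]) -> Set[str]:
    """Return the subsystems selected by an NCCL_DEBUG_SUBSYS-style string.

    A leading '^' inverts the whole list: every known subsystem except the
    listed ones is selected.
    """
    invert = spec.startswith("^")
    if invert:
        spec = spec[1:]
    listed = {name.strip() for name in spec.split(",") if name.strip()}
    if invert:
        return {s for s in known if s not in listed}
    return {s for s in known if s in listed}


# Illustrative subset of subsystem names.
known = ["INIT", "COLL", "P2P", "NET"]
print(sorted(enabled_subsystems("^INIT,COLL", known)))  # ['NET', 'P2P']
```

Note that `^` applies to the entire comma-separated list, not just to the first entry.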
Algorithm and protocol control
The algorithm and protocol control environment variables for RCCL are collected in the following table.
| Environment variable | Value |
|---|---|
| `NCCL_ALGO`: Forces a specific algorithm for collective operations. | Algorithm name string. Overrides automatic algorithm selection. |
| `NCCL_PROTO`: Forces a specific communication protocol. | Protocol name string. Overrides automatic protocol selection. |
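Because both variables override automatic selection, leaving them unset preserves RCCL's own choice. The Python sketch below builds only the overrides a caller asks for. The accepted name sets are an assumption: `Ring`/`Tree` and `LL`/`LL128`/`Simple` are names NCCL uses, but the exact list your RCCL version accepts may differ, and the case-sensitive check here is a simplification. The function name is hypothetical.

```python
from typing import Dict, Optional

# Assumed name sets (NCCL naming); verify against your RCCL version.
ASSUMED_ALGOS = {"Ring", "Tree"}
ASSUMED_PROTOS = {"LL", "LL128", "Simple"}


def forced_selection_env(algo: Optional[str] = None,
                         proto: Optional[str] = None) -> Dict[str, str]:
    """Return NCCL_ALGO / NCCL_PROTO overrides.

    Arguments left as None produce no entry, so automatic selection
    stays in effect for them.
    """
    env: Dict[str, str] = {}
    if algo is not None:
        if algo not in ASSUMED_ALGOS:
            raise ValueError(f"unexpected algorithm name: {algo!r}")
        env["NCCL_ALGO"] = algo
    if proto is not None:
        if proto not in ASSUMED_PROTOS:
            raise ValueError(f"unexpected protocol name: {proto!r}")
        env["NCCL_PROTO"] = proto
    return env


print(forced_selection_env(algo="Ring"))  # {'NCCL_ALGO': 'Ring'}
```

Forcing only one of the two, as in the usage line, is a common way to isolate whether a performance or correctness issue follows the algorithm or the protocol.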
Network and topology
The network and topology environment variables for RCCL are collected in the following table.
| Environment variable | Value |
|---|---|
| `NCCL_IB_HCA`: Specifies the InfiniBand `device:port` to use. | Device specification string. Prefix with `^` for exclusion or `=` for exact match. |
| `NCCL_IB_GID_INDEX`: Defines the Global ID (GID) index used in RoCE mode. | Integer value (default: `-1`). See the InfiniBand `show_gids` command for valid values. |
| `NCCL_SOCKET_IFNAME`: Specifies which IP interfaces to use for communication. | Interface prefix string or list. Separate multiple prefixes with `,`. Prefix with `^` for exclusion or `=` for exact match. Example: `eth` (all `eth` interfaces), `=eth0` (exact match) |
| `NCCL_SOCKET_FAMILY`: Forces IPv4 or IPv6 interface selection. | `AF_INET`: force IPv4; `AF_INET6`: force IPv6; unset: use the first available |
| `NCCL_NET_MERGE_LEVEL`: Controls network device merging behavior. | Integer value specifying the merge level. Default: `PATH_PORT` |
| `NCCL_NET_FORCE_MERGE`: Forces merging of network devices. | String specifying the forced merge configuration. |
| `NCCL_RINGS`: Defines a custom ring topology. | Ring topology specification string. Overrides automatic topology detection. |
| `RCCL_TREES`: Defines a custom tree topology. | Tree topology specification string. Alternative to a ring topology. |
| `NCCL_RINGS_REMAP`: Controls ring remapping for specific topologies. | Remapping specification string. Used with the Rome 4P2H topology. |
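The prefix rules for `NCCL_SOCKET_IFNAME` can be modeled with a short Python sketch that filters a list of interface names. It is an illustration of the documented semantics, not RCCL's implementation. It assumes that when both markers are used, `^` comes before `=` (for example, `^=docker0`), as in NCCL. The function name and interface names are hypothetical.

```python
from typing import List


def match_interfaces(spec: str, available: List[str]) -> List[str]:
    """Filter interface names by NCCL_SOCKET_IFNAME-style rules.

    - entries are comma-separated prefixes
    - a leading '^' excludes matching interfaces instead of selecting them
    - a leading '=' (after any '^') requires exact names, not prefixes
    """
    exclude = spec.startswith("^")
    if exclude:
        spec = spec[1:]
    exact = spec.startswith("=")
    if exact:
        spec = spec[1:]
    patterns = [p for p in spec.split(",") if p]

    def matches(name: str) -> bool:
        if exact:
            return name in patterns
        return any(name.startswith(p) for p in patterns)

    # Keep matches when selecting; keep non-matches when excluding.
    return [name for name in available if matches(name) != exclude]


ifaces = ["eth0", "eth1", "ib0", "docker0", "lo"]
print(match_interfaces("eth", ifaces))         # ['eth0', 'eth1']
print(match_interfaces("^docker,lo", ifaces))  # ['eth0', 'eth1', 'ib0']
```

Exact matching matters when prefixes overlap: `eth1` also matches names such as `eth10`, while `=eth1` does not.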
Development and testing (advanced)
The development and testing environment variables for RCCL are collected in the following table. These variables are primarily intended for debugging and development purposes.
| Environment variable | Value |
|---|---|
| `CUDA_LAUNCH_BLOCKING`: Controls CUDA kernel launch blocking behavior. | `0`: non-blocking launches; `1` or any non-zero value: blocking launches |
| `NCCL_COMM_ID`: Enables multi-process mode in test applications. | Any non-empty value enables multi-process mode. Used with test executables for distributed testing. |
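These variables are read from the process environment, so they are usually set when a test executable is launched. The minimal Python sketch below starts a child process with extra variables added to the current environment. The helper `run_with_env` is hypothetical, and the probe command is a stand-in that only echoes the variable. Replace it with the actual RCCL test binary you want to run.

```python
import os
import subprocess
import sys
from typing import Dict, List


def run_with_env(cmd: List[str], overrides: Dict[str, str]) -> subprocess.CompletedProcess:
    """Run cmd with the current environment plus overrides.

    The child sees the variables from startup, before any library
    it loads has a chance to read them.
    """
    env = os.environ.copy()
    env.update(overrides)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child process that prints the variable it received.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    result = run_with_env(probe, {"CUDA_LAUNCH_BLOCKING": "1"})
    print(result.stdout.strip())  # prints 1
```

Copying `os.environ` instead of passing only the overrides keeps variables such as `PATH` available to the child process.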