Communicator Initialization/Destruction

Rocprofiler SDK Developer API 0.5.0
ROCm Profiling API and tools

Functions

ncclResult_t ncclGetUniqueId (ncclUniqueId *uniqueId)
 Generates an ID for ncclCommInitRank.
 
ncclResult_t ncclCommInitRankConfig (ncclComm_t *comm, int nranks, ncclUniqueId commId, int rank, ncclConfig_t *config)
 Create a new communicator with config.
 
ncclResult_t ncclCommInitRank (ncclComm_t *comm, int nranks, ncclUniqueId commId, int rank)
 Creates a new communicator (multi thread/process version).
 
ncclResult_t ncclCommInitAll (ncclComm_t *comm, int ndev, const int *devlist)
 Creates a clique of communicators (single process version).
 
ncclResult_t ncclCommFinalize (ncclComm_t comm)
 Finalize a communicator.
 
ncclResult_t ncclCommDestroy (ncclComm_t comm)
 Frees local resources associated with communicator object.
 
ncclResult_t ncclCommAbort (ncclComm_t comm)
 Abort any in-progress calls and destroy the communicator object.
 
ncclResult_t ncclCommSplit (ncclComm_t comm, int color, int key, ncclComm_t *newcomm, ncclConfig_t *config)
 Create one or more communicators from an existing one.
 

Detailed Description

API calls that operate on communicators. Communicator objects are used to launch collective communication operations. Each HIP device participating in the same communicator must be assigned a unique rank between 0 and N-1, where N is the number of ranks. Using the same HIP device for multiple ranks of the same communicator is not supported at this time.
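The rank and device constraints above can be checked before any communicator is created. The following is a minimal sketch in plain C, not part of RCCL: `assignment_is_valid` and its parameters are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not an RCCL call). Returns true when each of the n
 * participants has a unique rank in [0, n-1] and a HIP device index that
 * no other participant uses. */
bool assignment_is_valid(const int *ranks, const int *devices, int n)
{
    assert(ranks != NULL && devices != NULL && n > 0);

    /* seen[r] records whether rank r has already been claimed. */
    bool *seen = calloc((size_t)n, sizeof *seen);
    if (seen == NULL) {
        return false;              /* treat allocation failure as "not validated" */
    }

    bool ok = true;
    for (int i = 0; i < n && ok; ++i) {
        int r = ranks[i];
        if (r < 0 || r >= n || seen[r]) {
            ok = false;            /* rank out of range or used twice */
        } else {
            seen[r] = true;
        }
        /* A device may back only one rank of the same communicator. */
        for (int j = 0; j < i && ok; ++j) {
            if (devices[j] == devices[i]) {
                ok = false;
            }
        }
    }
    free(seen);
    return ok;
}
```

A caller would run this check on the planned layout, for example ranks {0, 1, 2} on devices {0, 1, 2}, before invoking any of the initialization functions documented below.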

Function Documentation

◆ ncclCommAbort()

ncclResult_t ncclCommAbort ( ncclComm_t  comm)

Abort any in-progress calls and destroy the communicator object.

Frees resources associated with communicator object and aborts any operations that might still be running on the device.

Returns
Result code. See Result Codes for more details.
Parameters
[in] comm: Communicator to abort and destroy

◆ ncclCommDestroy()

ncclResult_t ncclCommDestroy ( ncclComm_t  comm)

Frees local resources associated with communicator object.

Destroys all local resources associated with the passed-in communicator object.

Returns
Result code. See Result Codes for more details.
Parameters
[in] comm: Communicator to destroy

◆ ncclCommFinalize()

ncclResult_t ncclCommFinalize ( ncclComm_t  comm)

Finalize a communicator.

ncclCommFinalize flushes all issued communications and marks communicator state as ncclInProgress. The state will change to ncclSuccess when the communicator is globally quiescent and related resources are freed; then, calling ncclCommDestroy can locally free the rest of the resources (e.g. communicator itself) without blocking.

Returns
Result code. See Result Codes for more details.
Parameters
[in] comm: Communicator to finalize

◆ ncclCommInitAll()

ncclResult_t ncclCommInitAll ( ncclComm_t *  comm,
int  ndev,
const int *  devlist 
)

Creates a clique of communicators (single process version).

This is a convenience function to create a single-process communicator clique. Returns an array of ndev newly initialized communicators in comm. comm should be pre-allocated with size at least ndev*sizeof(ncclComm_t). If devlist is NULL, the first ndev HIP devices are used. Order of devlist defines user-order of processors within the communicator.

Returns
Result code. See Result Codes for more details.
Parameters
[out] comm: Pointer to array of created communicators
[in] ndev: Total number of ranks participating in this communicator
[in] devlist: Array of GPU device indices to create communicators for
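The devlist rule for ncclCommInitAll (a NULL list selects the first ndev HIP devices, otherwise the list order defines the rank order) can be sketched as follows. This is plain C, not an RCCL function: `resolve_devlist` and `device_of_rank` are hypothetical names, and the mapping of rank i to `devlist[i]` is this sketch's reading of "order of devlist defines user-order".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an RCCL call). Fills device_of_rank[i] with the
 * HIP device index that rank i would use under the documented devlist rule.
 * device_of_rank must have room for ndev entries. */
void resolve_devlist(const int *devlist, int ndev, int *device_of_rank)
{
    assert(ndev > 0 && device_of_rank != NULL);

    for (int i = 0; i < ndev; ++i) {
        /* NULL devlist: use devices 0 .. ndev-1 in order.
         * Otherwise: rank i takes the i-th entry of devlist. */
        device_of_rank[i] = (devlist != NULL) ? devlist[i] : i;
    }
}
```

For example, passing a devlist of {2, 0, 1} with ndev = 3 places rank 0 on device 2, rank 1 on device 0, and rank 2 on device 1 in this model.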

◆ ncclCommInitRank()

ncclResult_t ncclCommInitRank ( ncclComm_t *  comm,
int  nranks,
ncclUniqueId  commId,
int  rank 
)

Creates a new communicator (multi thread/process version).

Rank must be between 0 and nranks-1 and unique within a communicator clique. Each rank is associated with a HIP device, which must be set before calling ncclCommInitRank. ncclCommInitRank implicitly synchronizes with other ranks, so it must either be called from different threads/processes or be wrapped in ncclGroupStart/ncclGroupEnd.

Returns
Result code. See Result Codes for more details.
Parameters
[out] comm: Pointer to created communicator
[in] nranks: Total number of ranks participating in this communicator
[in] commId: UniqueId required for initialization
[in] rank: Current rank to create communicator for

◆ ncclCommInitRankConfig()

ncclResult_t ncclCommInitRankConfig ( ncclComm_t *  comm,
int  nranks,
ncclUniqueId  commId,
int  rank,
ncclConfig_t *  config 
)

Create a new communicator with config.

Creates a new communicator (multi thread/process version) with a user-supplied configuration. See Communicator Configuration for more details. Each rank is associated with a HIP device, which must be set before calling ncclCommInitRankConfig.

Returns
Result code. See Result Codes for more details.
Parameters
[out] comm: Pointer to created communicator
[in] nranks: Total number of ranks participating in this communicator
[in] commId: UniqueId required for initialization
[in] rank: Current rank to create communicator for, in the range [0, nranks-1]
[in] config: Pointer to communicator configuration

◆ ncclCommSplit()

ncclResult_t ncclCommSplit ( ncclComm_t  comm,
int  color,
int  key,
ncclComm_t *  newcomm,
ncclConfig_t *  config 
)

Create one or more communicators from an existing one.

Creates one or more communicators from an existing one. Ranks with the same color end up in the same communicator. Within the new communicator, key is used to order ranks. Passing NCCL_SPLIT_NOCOLOR as color indicates that the rank will not be part of any group; that rank receives a NULL communicator. If config is NULL, the new communicator inherits the original communicator's configuration.

Returns
Result code. See Result Codes for more details.
Parameters
[in] comm: Original communicator object for this rank
[in] color: Color to assign this rank
[in] key: Key used to order ranks within the same new communicator
[out] newcomm: Pointer to new communicator
[in] config: Pointer to configuration for the new communicator. May be NULL to inherit from comm
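The color and key rules of ncclCommSplit can be illustrated with a small model that computes, for each original rank, its rank in the new communicator. This is a plain-C sketch and not RCCL's implementation. `model_split` and `MODEL_NOCOLOR` are hypothetical names, the value -1 is an assumption of the sketch, and breaking key ties by original rank is also an assumption, since the text above does not specify tie handling.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for NCCL_SPLIT_NOCOLOR; the value is an assumption of this sketch. */
#define MODEL_NOCOLOR (-1)

/* Hypothetical model (not an RCCL call). For each original rank i, writes
 * into new_rank[i] its position within the group of ranks sharing color[i],
 * ordered by ascending key. Equal keys are ordered by original rank (an
 * assumption). Ranks with MODEL_NOCOLOR get -1, meaning no new communicator. */
void model_split(const int *color, const int *key, int n, int *new_rank)
{
    assert(color != NULL && key != NULL && new_rank != NULL && n > 0);

    for (int i = 0; i < n; ++i) {
        if (color[i] == MODEL_NOCOLOR) {
            new_rank[i] = -1;          /* rank opts out of every group */
            continue;
        }
        /* Count same-color ranks that sort ahead of rank i. */
        int before = 0;
        for (int j = 0; j < n; ++j) {
            if (j == i || color[j] != color[i]) {
                continue;
            }
            if (key[j] < key[i] || (key[j] == key[i] && j < i)) {
                ++before;
            }
        }
        new_rank[i] = before;
    }
}
```

With colors {0, 1, 0, 1, MODEL_NOCOLOR} and keys {5, 0, 1, 0, 0}, original ranks 0 and 2 form one group and ranks 1 and 3 form another; rank 2 becomes rank 0 of its group because its key is smaller than that of rank 0.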

◆ ncclGetUniqueId()

ncclResult_t ncclGetUniqueId ( ncclUniqueId *  uniqueId)

Generates an ID for ncclCommInitRank.

Generates an ID to be used in ncclCommInitRank. ncclGetUniqueId should be called once by a single rank and the ID should be distributed to all ranks in the communicator before using it as a parameter for ncclCommInitRank.

Returns
Result code. See Result Codes for more details.
Parameters
[out] uniqueId: Pointer to where uniqueId will be stored