API#
This section describes the library API.
Communicator Functions#
-
ncclResult_t ncclGetUniqueId(ncclUniqueId *uniqueId)
-
ncclResult_t ncclCommInitRank(ncclComm_t *comm, int nranks, ncclUniqueId commId, int rank)
-
ncclResult_t ncclCommInitAll(ncclComm_t *comm, int ndev, const int *devlist)
-
ncclResult_t ncclCommDestroy(ncclComm_t comm)
-
ncclResult_t ncclCommAbort(ncclComm_t comm)
-
ncclResult_t ncclCommCount(const ncclComm_t comm, int *count)
-
ncclResult_t ncclCommCuDevice(const ncclComm_t comm, int *device)
-
ncclResult_t ncclCommUserRank(const ncclComm_t comm, int *rank)
Collective Communication Operations#
Collective communication operations must be called separately for each communicator in a communicator clique.
They return once the operation has been enqueued on the HIP stream.
Because they may perform inter-CPU synchronization, each call must be made from a different thread or process, or the calls must use Group Semantics (see below).
-
ncclResult_t ncclReduce(const void *sendbuff, void *recvbuff, size_t count, ncclDataType_t datatype, ncclRedOp_t op, int root, ncclComm_t comm, hipStream_t stream)
-
ncclResult_t ncclBcast(void *buff, size_t count, ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream)
-
ncclResult_t ncclBroadcast(const void *sendbuff, void *recvbuff, size_t count, ncclDataType_t datatype, int root, ncclComm_t comm, hipStream_t stream)
-
ncclResult_t ncclAllReduce(const void *sendbuff, void *recvbuff, size_t count, ncclDataType_t datatype, ncclRedOp_t op, ncclComm_t comm, hipStream_t stream)
-
ncclResult_t ncclReduceScatter(const void *sendbuff, void *recvbuff, size_t recvcount, ncclDataType_t datatype, ncclRedOp_t op, ncclComm_t comm, hipStream_t stream)
-
ncclResult_t ncclAllGather(const void *sendbuff, void *recvbuff, size_t sendcount, ncclDataType_t datatype, ncclComm_t comm, hipStream_t stream)
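The `count`, `recvcount` and `sendcount` arguments above are element counts, not byte counts. The host-only sketch below shows how buffer sizes could be derived from them. It assumes the upstream NCCL convention that the ncclReduceScatter send buffer and the ncclAllGather receive buffer each hold `nranks` times the per-rank count. The helper names and the explicit `elem_size` parameter are hypothetical and are not part of RCCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for `count` elements of `elem_size` bytes. */
size_t buffer_bytes(size_t count, size_t elem_size) {
    assert(elem_size > 0);
    return count * elem_size;
}

/* Assumed convention: the ncclReduceScatter send buffer holds
 * nranks * recvcount elements (each rank receives recvcount of them). */
size_t reduce_scatter_send_bytes(size_t recvcount, int nranks, size_t elem_size) {
    assert(nranks > 0);
    return buffer_bytes((size_t)nranks * recvcount, elem_size);
}

/* Assumed convention: the ncclAllGather receive buffer holds
 * nranks * sendcount elements (each rank contributes sendcount of them). */
size_t all_gather_recv_bytes(size_t sendcount, int nranks, size_t elem_size) {
    assert(nranks > 0);
    return buffer_bytes((size_t)nranks * sendcount, elem_size);
}
```

For example, with 8 ranks and a per-rank count of 256 four-byte elements, the ncclReduceScatter send buffer would need 8192 bytes under this convention.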
Group Semantics#
When a single thread manages multiple GPUs, the calls for the different ranks/devices must be "grouped" into a single call, because NCCL collective calls may perform inter-CPU synchronization.
NCCL calls are grouped into the same collective operation with ncclGroupStart and ncclGroupEnd. Collective calls made after ncclGroupStart are queued until the ncclGroupEnd call, which waits for all of them to complete. For collective communication, this means only that ncclGroupEnd guarantees the operations have been enqueued on the streams, not that they have finished executing.
Both collective communication and ncclCommInitRank can be used in conjunction with ncclGroupStart/ncclGroupEnd.
-
ncclResult_t ncclGroupStart()
-
ncclResult_t ncclGroupEnd()
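The grouping pattern can be sketched as a loop over devices between ncclGroupStart and ncclGroupEnd. So that the sketch builds without RCCL or a GPU, the first part of the block defines stand-ins that only record the call order. They are not the real RCCL declarations, and a real program would delete that part and include the RCCL and HIP headers instead. The function `all_reduce_on_all_devices` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins, NOT the real RCCL declarations -------------------------
 * They only log which call was made. In a real program, delete this part
 * and include the RCCL and HIP headers instead. */
typedef enum { ncclSuccess } ncclResult_t;
typedef enum { ncclFloat } ncclDataType_t;
typedef enum { ncclSum } ncclRedOp_t;
typedef struct ncclComm { int rank; } *ncclComm_t;
typedef void *hipStream_t;

static char call_log[64];  /* 'S' = group start, 'A' = all-reduce, 'E' = group end */
static size_t call_len = 0;

static void record(char c) {
    if (call_len + 1 < sizeof call_log) {
        call_log[call_len++] = c;
        call_log[call_len] = '\0';
    }
}

ncclResult_t ncclGroupStart(void) { record('S'); return ncclSuccess; }
ncclResult_t ncclGroupEnd(void)   { record('E'); return ncclSuccess; }

ncclResult_t ncclAllReduce(const void *sendbuff, void *recvbuff, size_t count,
                           ncclDataType_t datatype, ncclRedOp_t op,
                           ncclComm_t comm, hipStream_t stream) {
    (void)sendbuff; (void)recvbuff; (void)count; (void)datatype;
    (void)op; (void)comm; (void)stream;
    record('A');
    return ncclSuccess;
}
/* ---- End of stand-ins -------------------------------------------------- */

/* One thread issues the same all-reduce for every device it manages.
 * The per-device calls sit inside one group so they are treated as a
 * single collective operation. */
ncclResult_t all_reduce_on_all_devices(int ndev, float *sendbuff[], float *recvbuff[],
                                       size_t count, ncclComm_t comms[],
                                       hipStream_t streams[]) {
    assert(ndev > 0);
    ncclResult_t rc = ncclGroupStart();
    if (rc != ncclSuccess) return rc;

    for (int i = 0; i < ndev; ++i) {
        rc = ncclAllReduce(sendbuff[i], recvbuff[i], count,
                           ncclFloat, ncclSum, comms[i], streams[i]);
        if (rc != ncclSuccess) break;
    }

    /* Always close the group; report the first error seen. */
    ncclResult_t end_rc = ncclGroupEnd();
    return rc != ncclSuccess ? rc : end_rc;
}
```

With two devices the stand-ins record the order start, all-reduce, all-reduce, end. Because ncclGroupEnd only guarantees that the work is enqueued, a real program would still synchronize each stream (for example with hipStreamSynchronize) before reading the receive buffers.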
Library Functions#
-
ncclResult_t ncclGetVersion(int *version)
-
const char *ncclGetErrorString(ncclResult_t result)
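ncclGetVersion writes a single integer, so callers usually need to split it into its parts. The sketch below assumes the encoding used by the upstream `NCCL_VERSION` macro: `major*1000 + minor*100 + patch` for releases up to 2.8.x and `major*10000 + minor*100 + patch` afterwards. The struct and function names are hypothetical, and the encoding should be checked against the header shipped with your RCCL build.

```c
#include <assert.h>

/* Hypothetical holder for a decoded version; not an RCCL type. */
typedef struct {
    int major_v;
    int minor_v;
    int patch_v;
} demo_version_t;

/* Split an integer version code into major/minor/patch.
 * Assumption: codes >= 10000 use major*10000 + minor*100 + patch,
 * smaller codes use the older major*1000 + minor*100 + patch scheme. */
demo_version_t decode_version(int code) {
    assert(code >= 0);
    demo_version_t v;
    if (code >= 10000) {
        v.major_v = code / 10000;
        v.minor_v = (code % 10000) / 100;
    } else {
        v.major_v = code / 1000;
        v.minor_v = (code % 1000) / 100;
    }
    v.patch_v = code % 100;
    return v;
}
```

Under this assumption, the code 21003 decodes to 2.10.3 and the code 2804 decodes to 2.8.4.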
Types#
A few data structures are internal to the library. The pointer types to these structures are given below. Use these types to create handles and pass them between library functions.
-
typedef struct ncclComm *ncclComm_t
-
struct ncclUniqueId
Enumerations#
This section lists all the enumerations used by the API.
-
enum ncclResult_t
Values:
-
enumerator ncclSuccess
-
enumerator ncclUnhandledCudaError
-
enumerator ncclSystemError
-
enumerator ncclInternalError
-
enumerator ncclInvalidArgument
-
enumerator ncclInvalidUsage
-
enumerator ncclNumResults
-
enum ncclRedOp_t
Values:
-
enumerator ncclSum
-
enumerator ncclProd
-
enumerator ncclMax
-
enumerator ncclMin
-
enumerator ncclNumOps
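To make the meaning of the four reduction operations concrete, here is a host-only reference model of an element-wise reduction across ranks. It does not call RCCL. The `demo_red_op_t` enum is a local mirror of the operations listed above, and the function names and the flat `in` layout (rank `r` owns elements `r*count` to `r*count + count - 1`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the four operations, for illustration only. */
typedef enum { demoSum, demoProd, demoMax, demoMin } demo_red_op_t;

/* Combine two values according to the chosen operation. */
static float combine(demo_red_op_t op, float a, float b) {
    switch (op) {
    case demoSum:  return a + b;
    case demoProd: return a * b;
    case demoMax:  return a > b ? a : b;
    case demoMin:  return a < b ? a : b;
    }
    return a;  /* not reached for the four operations above */
}

/* Host-only reference: out[i] is `op` applied over every rank's i-th element.
 * `in` holds nranks contiguous blocks of `count` elements, one per rank. */
void reference_reduce(demo_red_op_t op, const float *in, int nranks,
                      size_t count, float *out) {
    assert(nranks > 0);
    for (size_t i = 0; i < count; ++i) {
        float acc = in[i];  /* rank 0's contribution */
        for (int r = 1; r < nranks; ++r) {
            acc = combine(op, acc, in[(size_t)r * count + i]);
        }
        out[i] = acc;
    }
}
```

With three ranks contributing {1, 2}, {3, 4} and {5, 0}, the model gives {9, 6} for the sum, {15, 0} for the product, {5, 4} for the maximum and {1, 0} for the minimum.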
-
enum ncclDataType_t
Values:
-
enumerator ncclInt8
-
enumerator ncclChar
-
enumerator ncclUint8
-
enumerator ncclInt32
-
enumerator ncclInt
-
enumerator ncclUint32
-
enumerator ncclInt64
-
enumerator ncclUint64
-
enumerator ncclFloat16
-
enumerator ncclHalf
-
enumerator ncclFloat32
-
enumerator ncclFloat
-
enumerator ncclFloat64
-
enumerator ncclDouble
-
enumerator ncclBfloat16
-
enumerator ncclNumTypes
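Several enumerators above are aliases for the same type: ncclChar for ncclInt8, ncclInt for ncclInt32, ncclHalf for ncclFloat16, ncclFloat for ncclFloat32, and ncclDouble for ncclFloat64. ncclNumTypes is a count rather than a type. The sketch below maps each distinct type to its element size in bytes, as implied by the type names. It uses a local mirror enum (`demo_data_type_t`) so that it builds without RCCL; the enum and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the distinct data types (aliases omitted), illustration only. */
typedef enum {
    demoInt8, demoUint8, demoInt32, demoUint32, demoInt64, demoUint64,
    demoFloat16, demoFloat32, demoFloat64, demoBfloat16,
    demoNumTypes  /* sentinel: number of types, not a type itself */
} demo_data_type_t;

/* Size in bytes of one element of the given type; 0 for unknown values. */
size_t demo_type_size(demo_data_type_t t) {
    assert(t != demoNumTypes);  /* the sentinel has no element size */
    switch (t) {
    case demoInt8:
    case demoUint8:
        return 1;
    case demoFloat16:
    case demoBfloat16:
        return 2;
    case demoInt32:
    case demoUint32:
    case demoFloat32:
        return 4;
    case demoInt64:
    case demoUint64:
    case demoFloat64:
        return 8;
    default:
        return 0;
    }
}
```

A helper like this is useful when allocating device buffers, since the collective calls take element counts while allocation functions take byte counts.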