Group semantics

Rocprofiler SDK Developer API: Group semantics
Rocprofiler SDK Developer API 0.5.0
ROCm Profiling API and tools

Functions

ncclResult_t ncclGroupStart()
 Group Start.

ncclResult_t ncclGroupEnd()
 Group End.

Detailed Description

When a single thread manages multiple GPUs, RCCL collective calls may perform inter-CPU synchronization, so the calls for the different ranks/devices must be "grouped" into a single call.

RCCL calls are grouped into a single collective operation with ncclGroupStart and ncclGroupEnd. After ncclGroupStart, all collective calls are enqueued until ncclGroupEnd is called; ncclGroupEnd then waits for all of those calls to complete. For collective communication, completion here means only that the operations have been enqueued on their streams, not that they have finished executing.

Both collective communication and ncclCommInitRank can be used in conjunction with ncclGroupStart/ncclGroupEnd, but the two cannot be combined within the same group.

Group semantics also make it possible to fuse multiple operations on the same device to improve performance (aggregated collective calls), or to let multiple send/receive operations progress concurrently.
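
The single-thread, multi-device case described above can be sketched as follows. This is a minimal illustration, not code taken from the RCCL sources: the helper name grouped_all_reduce and its parameters are hypothetical, and the communicators, streams, and device buffers are assumed to have been created elsewhere. When the RCCL and HIP headers are not installed, the sketch falls back to stand-in declarations (the stub_* counters and the stub function bodies are assumptions made only so the control flow can be compiled and traced without ROCm).

```c
#include <assert.h>
#include <stddef.h>

/* Use the real headers when they are installed; otherwise fall back to
 * stand-in declarations. The stand-ins are NOT the RCCL implementation. */
#if defined(__has_include)
#  if __has_include(<rccl/rccl.h>) && __has_include(<hip/hip_runtime.h>)
#    include <hip/hip_runtime.h>
#    include <rccl/rccl.h>
#    define SKETCH_HAVE_RCCL 1
#  endif
#endif

#ifndef SKETCH_HAVE_RCCL
typedef enum { ncclSuccess = 0 } ncclResult_t;
typedef enum { ncclFloat } ncclDataType_t;
typedef enum { ncclSum } ncclRedOp_t;
typedef void *ncclComm_t;
typedef void *hipStream_t;
static int stub_group_open = 0;    /* 1 between GroupStart and GroupEnd */
static int stub_grouped_calls = 0; /* collectives issued inside a group */
static ncclResult_t ncclGroupStart(void) { stub_group_open = 1; return ncclSuccess; }
static ncclResult_t ncclGroupEnd(void)   { stub_group_open = 0; return ncclSuccess; }
static ncclResult_t ncclAllReduce(const void *sendbuff, void *recvbuff, size_t count,
                                  ncclDataType_t datatype, ncclRedOp_t op,
                                  ncclComm_t comm, hipStream_t stream)
{
    (void)sendbuff; (void)recvbuff; (void)count; (void)datatype;
    (void)op; (void)comm; (void)stream;
    if (stub_group_open) ++stub_grouped_calls;
    return ncclSuccess;
}
#endif

/* Hypothetical helper: issue one all-reduce per device from a single
 * thread, wrapped in one group so the per-device calls form one operation. */
ncclResult_t grouped_all_reduce(int n_dev, ncclComm_t *comms, hipStream_t *streams,
                                float **sendbufs, float **recvbufs, size_t count)
{
    assert(n_dev > 0 && comms && streams && sendbufs && recvbufs);

    ncclResult_t rc = ncclGroupStart();
    if (rc != ncclSuccess) return rc;

    for (int i = 0; i < n_dev; ++i) {
        rc = ncclAllReduce(sendbufs[i], recvbufs[i], count,
                           ncclFloat, ncclSum, comms[i], streams[i]);
        if (rc != ncclSuccess) break;
    }

    /* Close the group even if a call failed, then report the first error. */
    ncclResult_t end_rc = ncclGroupEnd();
    return (rc != ncclSuccess) ? rc : end_rc;
}
```

When this helper returns ncclSuccess, the all-reduce operations have only been enqueued on their streams. Work that depends on the results, such as a stream synchronization or a kernel reading a receive buffer, would be issued after the helper returns.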

Function Documentation

ncclGroupEnd()

ncclResult_t ncclGroupEnd ( )

Group End.

End a group call. Start a fused RCCL operation consisting of all calls made since ncclGroupStart. Any operation on the HIP stream that depends on these RCCL operations must be issued after ncclGroupEnd.

Returns
Result code. See Result Codes for more details.

ncclGroupStart()

ncclResult_t ncclGroupStart ( )

Group Start.

Start a group call. All calls to RCCL until ncclGroupEnd will be fused into a single RCCL operation. Nothing will be started on the HIP stream until ncclGroupEnd.

Returns
Result code. See Result Codes for more details.
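
The deferred-launch behaviour described for ncclGroupStart and ncclGroupEnd can be pictured with a small toy model. This is only an illustration of the documented semantics under simplifying assumptions (one stream, no nesting, no error handling); every toy_* name is hypothetical and none of this reflects how RCCL is implemented internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every toy_* name is hypothetical, not part of RCCL. */
typedef struct {
    bool   in_group;  /* true between toy_group_start and toy_group_end */
    size_t pending;   /* calls recorded inside the open group */
    size_t enqueued;  /* calls placed on the (simulated) stream */
    size_t finished;  /* calls whose (simulated) device work has completed */
} toy_state;

/* Open a group: later calls are recorded instead of being enqueued. */
void toy_group_start(toy_state *s)
{
    assert(!s->in_group);
    s->in_group = true;
}

/* A collective call: recorded if a group is open, otherwise enqueued at once. */
void toy_collective(toy_state *s)
{
    if (s->in_group)
        s->pending++;
    else
        s->enqueued++;
}

/* Close the group: every recorded call is moved onto the stream.
 * This does not wait for the device to execute the calls. */
void toy_group_end(toy_state *s)
{
    assert(s->in_group);
    s->enqueued += s->pending;
    s->pending = 0;
    s->in_group = false;
}

/* Stand-in for a stream synchronization: all enqueued work completes. */
void toy_stream_sync(toy_state *s)
{
    s->finished = s->enqueued;
}
```

In this model, two toy_collective calls made between toy_group_start and toy_group_end leave the stream untouched until toy_group_end runs, and they only count as finished after toy_stream_sync. That mirrors the distinction drawn above between an operation being enqueued and being done.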