hip.rccl#
(No short description)
- Attributes:
  - NCCL_MAJOR (int): Macro constant.
  - NCCL_MINOR (int): Macro constant.
  - NCCL_PATCH (int): Macro constant.
  - NCCL_SUFFIX (bytes): Macro constant.
  - NCCL_VERSION_CODE (int): Macro constant.
  - RCCL_BFLOAT16 (int): Macro constant.
  - RCCL_GATHER_SCATTER (int): Macro constant.
  - RCCL_ALLTOALLV (int): Macro constant.
  - NCCL_UNIQUE_ID_BYTES (int): Macro constant.
  - ncclComm_t: alias of ncclComm
  - ncclConfig_t: alias of ncclConfig_v21700
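NCCL_VERSION_CODE packs the major, minor, and patch numbers into one integer. A minimal sketch of decoding such a code, assuming the usual NCCL 2.9+ encoding (MAJOR*10000 + MINOR*100 + PATCH); verify against NCCL_MAJOR/NCCL_MINOR/NCCL_PATCH in your environment:

```python
def decode_version_code(code: int) -> tuple:
    """Split a packed version code into (major, minor, patch).

    Assumes the NCCL >= 2.9 encoding: MAJOR*10000 + MINOR*100 + PATCH.
    """
    major, rest = divmod(code, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch

# Example: version 2.18.3 is encoded as 21803 under this scheme.
print(decode_version_code(21803))  # -> (2, 18, 3)
```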
- class hip.rccl.ncclComm#
  Bases: Pointer
  Python wrapper for cdef class crccl.ncclComm.
  If this type is initialized via its __init__ method, it allocates a member of the underlying C type and destroys it again if the wrapper type is deallocated.
  This type also serves as adapter when appearing as argument type in a function signature. In this case, the type can further be initialized from a number of Python objects:
  - None: This will set the self._ptr attribute to NULL.
  - int: Interprets the integer value as pointer address and writes it to self._ptr. No ownership is transferred.
  - ctypes.c_void_p: Takes the pointer address pyobj.value and writes it to self._ptr. No ownership is transferred.
  - object that implements the CUDA Array Interface protocol: Takes the integer-valued pointer address, i.e. the first entry of the data tuple from pyobj's member __cuda_array_interface__, and writes it to self._ptr.
  - object that implements the Python buffer protocol: If the object represents a simple contiguous array, writes the Py_buffer associated with pyobj to self._py_buffer, sets the self._py_buffer_acquired flag to True, and writes self._py_buffer.buf to the data pointer self._ptr.
  - Pointer: Takes the pointer address pyobj._ptr and writes it to self._ptr. No ownership is transferred.
Type checks are performed in the above order.
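The accepted input kinds above can all be produced with the standard library. A sketch of such objects (no ncclComm is constructed here; the names are only stand-ins for what the adapter would receive):

```python
import ctypes

# An int interpreted as a raw pointer address (no ownership transferred).
buf = (ctypes.c_byte * 16)()
address = ctypes.addressof(buf)

# A ctypes.c_void_p; the adapter would read its .value attribute.
void_p = ctypes.c_void_p(address)
assert void_p.value == address

# An object implementing the Python buffer protocol; the adapter would
# acquire a Py_buffer from it if it represents a simple contiguous array.
view = memoryview(bytearray(16))
assert view.contiguous
```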
- C Attributes:
  - _ptr (C type void *, protected): Stores a pointer to the data of the original Python object.
  - _is_ptr_owner (C type bint, protected): If this wrapper is the owner of the underlying data.
  - _py_buffer (C type Py_buffer, protected): Stores the Py_buffer acquired from the original Python object, if any.
  - _py_buffer_acquired (C type bint, protected): If a Py_buffer has been acquired from the original Python object.
- static PROPERTIES()#
- __getitem__(key, /)#
Return self[key].
- __init__()#
Constructor.
- as_c_void_p(self)#
  Returns the data's address as ctypes.c_void_p.
  Note: Implemented as function to not collide with autogenerated property names.
- createRef(self) -> Pointer#
  Creates a reference to this pointer.
  Returns a Pointer that stores the address of this Pointer's data pointer.
  - Note: No ownership information is transferred.
- static fromObj(pyobj)#
  Creates a ncclComm from a Python object.
  Derives a ncclComm from the given Python object pyobj. In case pyobj is itself an ncclComm reference, this method returns it directly. No new ncclComm is created in this case.
- is_ptr_null#
If data pointer is NULL.
- class hip.rccl.ncclUniqueId(*args, **kwargs)#
  Bases: Pointer
  Python wrapper for cdef class crccl.ncclUniqueId.
  If this type is initialized via its __init__ method, it allocates a member of the underlying C type and destroys it again if the wrapper type is deallocated.
  This type also serves as adapter when appearing as argument type in a function signature. In this case, the type can further be initialized from a number of Python objects:
  - None: This will set the self._ptr attribute to NULL.
  - int: Interprets the integer value as pointer address and writes it to self._ptr. No ownership is transferred.
  - ctypes.c_void_p: Takes the pointer address pyobj.value and writes it to self._ptr. No ownership is transferred.
  - object that implements the CUDA Array Interface protocol: Takes the integer-valued pointer address, i.e. the first entry of the data tuple from pyobj's member __cuda_array_interface__, and writes it to self._ptr.
  - object that implements the Python buffer protocol: If the object represents a simple contiguous array, writes the Py_buffer associated with pyobj to self._py_buffer, sets the self._py_buffer_acquired flag to True, and writes self._py_buffer.buf to the data pointer self._ptr.
  - Pointer: Takes the pointer address pyobj._ptr and writes it to self._ptr. No ownership is transferred.
Type checks are performed in the above order.
- C Attributes:
  - _ptr (C type void *, protected): Stores a pointer to the data of the original Python object.
  - _is_ptr_owner (C type bint, protected): If this wrapper is the owner of the underlying data.
  - _py_buffer (C type Py_buffer, protected): Stores the Py_buffer acquired from the original Python object, if any.
  - _py_buffer_acquired (C type bint, protected): If a Py_buffer has been acquired from the original Python object.
- static PROPERTIES()#
- __getitem__(key, /)#
Return self[key].
- __init__()#
Constructor for type ncclUniqueId.
- as_c_void_p(self)#
  Returns the data's address as ctypes.c_void_p.
  Note: Implemented as function to not collide with autogenerated property names.
- c_sizeof(self)#
  Returns the size of the underlying C type in bytes.
  Note: Implemented as function to not collide with autogenerated property names.
- createRef(self) -> Pointer#
  Creates a reference to this pointer.
  Returns a Pointer that stores the address of this Pointer's data pointer.
  - Note: No ownership information is transferred.
- static fromObj(pyobj)#
  Creates a ncclUniqueId from a Python object.
  Derives a ncclUniqueId from the given Python object pyobj. In case pyobj is itself an ncclUniqueId reference, this method returns it directly. No new ncclUniqueId is created in this case.
- get_internal(self, i)#
  Get value of internal of (<crccl.ncclUniqueId*>self._ptr)[i].
- internal#
  Opaque array
- is_ptr_null#
If data pointer is NULL.
- class hip.rccl.ncclResult_t(value)#
Bases:
_ncclResult_t__Base
Result type
- Attributes:
- ncclSuccess:
No error
- ncclUnhandledCudaError:
Unhandled HIP error
- ncclSystemError:
Unhandled system error
- ncclInternalError:
Internal Error - Please report to RCCL developers
- ncclInvalidArgument:
Invalid argument
- ncclInvalidUsage:
Invalid usage
- ncclRemoteError:
Remote process exited or there was a network error
- ncclInProgress:
RCCL operation in progress
- ncclNumResults:
Number of result types
- ncclSuccess = 0#
- ncclUnhandledCudaError = 1#
- ncclSystemError = 2#
- ncclInternalError = 3#
- ncclInvalidArgument = 4#
- ncclInvalidUsage = 5#
- ncclRemoteError = 6#
- ncclInProgress = 7#
- ncclNumResults = 8#
- static ctypes_type()#
The type of the enum constants as ctypes type.
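Since the Python wrappers return result codes as the first tuple entry, the numeric values above make a small error-checking helper natural. A hedged sketch written against plain integers so it runs without a GPU; with hip-python installed you would compare against hip.rccl.ncclResult_t.ncclSuccess instead:

```python
NCCL_SUCCESS = 0      # ncclSuccess
NCCL_IN_PROGRESS = 7  # ncclInProgress (expected for non-blocking communicators)

def nccl_check(result, *rest):
    """Unwrap the (result, value, ...) tuples returned by the Python wrappers.

    Raises on any result code other than success/in-progress; returns the
    remaining tuple entries (a single value if there is exactly one).
    """
    if int(result) not in (NCCL_SUCCESS, NCCL_IN_PROGRESS):
        raise RuntimeError(f"RCCL call failed with result code {int(result)}")
    return rest[0] if len(rest) == 1 else rest

# e.g. version = nccl_check(*rccl.ncclGetVersion()) with hip-python installed.
print(nccl_check(0, 21803))  # -> 21803
```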
- class hip.rccl.ncclConfig_v21700(*args, **kwargs)#
  Bases: Pointer
  Python wrapper for cdef class crccl.ncclConfig_v21700.
  If this type is initialized via its __init__ method, it allocates a member of the underlying C type and destroys it again if the wrapper type is deallocated.
  This type also serves as adapter when appearing as argument type in a function signature. In this case, the type can further be initialized from a number of Python objects:
  - None: This will set the self._ptr attribute to NULL.
  - int: Interprets the integer value as pointer address and writes it to self._ptr. No ownership is transferred.
  - ctypes.c_void_p: Takes the pointer address pyobj.value and writes it to self._ptr. No ownership is transferred.
  - object that implements the CUDA Array Interface protocol: Takes the integer-valued pointer address, i.e. the first entry of the data tuple from pyobj's member __cuda_array_interface__, and writes it to self._ptr.
  - object that implements the Python buffer protocol: If the object represents a simple contiguous array, writes the Py_buffer associated with pyobj to self._py_buffer, sets the self._py_buffer_acquired flag to True, and writes self._py_buffer.buf to the data pointer self._ptr.
  - Pointer: Takes the pointer address pyobj._ptr and writes it to self._ptr. No ownership is transferred.
Type checks are performed in the above order.
- C Attributes:
  - _ptr (C type void *, protected): Stores a pointer to the data of the original Python object.
  - _is_ptr_owner (C type bint, protected): If this wrapper is the owner of the underlying data.
  - _py_buffer (C type Py_buffer, protected): Stores the Py_buffer acquired from the original Python object, if any.
  - _py_buffer_acquired (C type bint, protected): If a Py_buffer has been acquired from the original Python object.
- static PROPERTIES()#
- __getitem__(key, /)#
Return self[key].
- __init__()#
  Constructor for type ncclConfig_v21700.
- as_c_void_p(self)#
  Returns the data's address as ctypes.c_void_p.
  Note: Implemented as function to not collide with autogenerated property names.
- blocking#
  Whether calls should block or not
- c_sizeof(self)#
  Returns the size of the underlying C type in bytes.
  Note: Implemented as function to not collide with autogenerated property names.
- cgaClusterSize#
Cooperative group array cluster size
- createRef(self) -> Pointer#
  Creates a reference to this pointer.
  Returns a Pointer that stores the address of this Pointer's data pointer.
  - Note: No ownership information is transferred.
- static fromObj(pyobj)#
  Creates a ncclConfig_v21700 from a Python object.
  Derives a ncclConfig_v21700 from the given Python object pyobj. In case pyobj is itself an ncclConfig_v21700 reference, this method returns it directly. No new ncclConfig_v21700 is created in this case.
- get_blocking(self, i)#
  Get value blocking of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- get_cgaClusterSize(self, i)#
  Get value cgaClusterSize of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- get_magic(self, i)#
  Get value magic of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- get_maxCTAs(self, i)#
  Get value maxCTAs of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- get_minCTAs(self, i)#
  Get value minCTAs of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- get_netName(self, i)#
  Get value netName of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- get_size(self, i)#
  Get value size of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- get_splitShare(self, i)#
  Get value splitShare of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- get_version(self, i)#
  Get value version of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- is_ptr_null#
If data pointer is NULL.
- magic#
Should not be touched
- maxCTAs#
Maximum number of cooperative thread arrays (blocks)
- minCTAs#
Minimum number of cooperative thread arrays (blocks)
- netName#
  Force NCCL to use a specific network
- set_blocking(self, i, int value)#
  Set value blocking of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- set_cgaClusterSize(self, i, int value)#
  Set value cgaClusterSize of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- set_magic(self, i, unsigned int value)#
  Set value magic of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- set_maxCTAs(self, i, int value)#
  Set value maxCTAs of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- set_minCTAs(self, i, int value)#
  Set value minCTAs of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- set_netName(self, i, const char *value)#
  Set value netName of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- set_size(self, i, unsigned long value)#
  Set value size of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- set_splitShare(self, i, int value)#
  Set value splitShare of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- set_version(self, i, unsigned int value)#
  Set value version of (<crccl.ncclConfig_v21700*>self._ptr)[i].
- size#
  Should not be touched
- splitShare#
  Allow communicators to share resources
- version#
  Should not be touched
- hip.rccl.ncclConfig_t#
  alias of ncclConfig_v21700
- hip.rccl.ncclGetVersion()#
Return the RCCL_VERSION_CODE of RCCL in the supplied integer.
This integer is coded with the MAJOR, MINOR and PATCH level of RCCL.
- Returns:
  A tuple of size 2 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
  - int: Pointer to where version will be stored
- hip.rccl.pncclGetVersion()#
(No short description, might be part of a group.)
- hip.rccl.ncclGetUniqueId(uniqueId)#
Generates an ID for ncclCommInitRank.
Generates an ID to be used in ncclCommInitRank. ncclGetUniqueId should be called once by a single rank and the ID should be distributed to all ranks in the communicator before using it as a parameter for ncclCommInitRank.
- Args:
  - uniqueId (ncclUniqueId/object) – OUT: Pointer to where uniqueId will be stored
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclGetUniqueId(uniqueId)#
(No short description, might be part of a group.)
- Args:
  - uniqueId (ncclUniqueId/object): (undocumented)
- hip.rccl.ncclCommInitRankConfig(int nranks, commId, int rank, config)#
Create a new communicator with config.
Create a new communicator (multi thread/process version) with a configuration set by users. See rccl_config_type for more details. Each rank is associated to a CUDA device, which has to be set before calling ncclCommInitRank.
- Args:
  - nranks (int) – IN: Total number of ranks participating in this communicator
  - commId (ncclUniqueId) – IN: UniqueId required for initialization
  - rank (int) – IN: Current rank to create communicator for. [0 to nranks-1]
  - config (ncclConfig_v21700/object) – IN: Pointer to communicator configuration
- Returns:
  A tuple of size 2 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
  - ncclComm: Pointer to created communicator
- hip.rccl.pncclCommInitRankConfig(int nranks, commId, int rank, config)#
(No short description, might be part of a group.)
- Args:
  - nranks (int): (undocumented)
  - commId (ncclUniqueId): (undocumented)
  - rank (int): (undocumented)
  - config (ncclConfig_v21700/object): (undocumented)
- Returns:
  A tuple of size 1 that contains (in that order):
  - comm (ncclComm): (undocumented)
- hip.rccl.ncclCommInitRank(int nranks, commId, int rank)#
Creates a new communicator (multi thread/process version).
Rank must be between 0 and nranks-1 and unique within a communicator clique. Each rank is associated to a CUDA device, which has to be set before calling ncclCommInitRank. ncclCommInitRank implicitly synchronizes with other ranks, so it must be called by different threads/processes or use ncclGroupStart/ncclGroupEnd.
- Args:
  - nranks (int) – IN: Total number of ranks participating in this communicator
  - commId (ncclUniqueId) – IN: UniqueId required for initialization
  - rank (int) – IN: Current rank to create communicator for
- Returns:
  A tuple of size 2 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
  - ncclComm: Pointer to created communicator
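ncclGetUniqueId and ncclCommInitRank combine into the usual one-device-per-process setup. A sketch only, assuming the serialized unique ID is distributed out of band (e.g. via MPI or a file); the GPU-touching part is confined to the function body and is not executed here:

```python
def init_rccl(rank: int, nranks: int, get_unique_id_bytes):
    """Sketch of per-process communicator setup (requires ROCm + hip-python to run).

    get_unique_id_bytes: hypothetical callable returning the serialized
    ncclUniqueId that rank 0 generated and distributed to all ranks.
    """
    if not 0 <= rank < nranks:
        raise ValueError("rank must be in [0, nranks-1]")
    from hip import hip as hip_rt, rccl   # deferred import: needs a ROCm stack
    hip_rt.hipSetDevice(rank)             # bind this process to one device
    uid = rccl.ncclUniqueId()
    # ... fill uid's opaque bytes from get_unique_id_bytes() ...
    result, comm = rccl.ncclCommInitRank(nranks, uid, rank)
    return result, comm
```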
- hip.rccl.pncclCommInitRank(int nranks, commId, int rank)#
(No short description, might be part of a group.)
- Args:
  - nranks (int): (undocumented)
  - commId (ncclUniqueId): (undocumented)
  - rank (int): (undocumented)
- Returns:
  A tuple of size 1 that contains (in that order):
  - comm (ncclComm): (undocumented)
- hip.rccl.ncclCommInitAll(comm, int ndev, devlist)#
Creates a clique of communicators (single process version).
This is a convenience function to create a single-process communicator clique. Returns an array of ndev newly initialized communicators in comm. comm should be pre-allocated with size at least ndev*sizeof(ncclComm_t). If devlist is NULL, the first ndev HIP devices are used. Order of devlist defines user-order of processors within the communicator.
- Args:
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclCommInitAll(comm, int ndev, devlist)#
(No short description, might be part of a group.)
- hip.rccl.ncclCommFinalize(comm)#
Finalize a communicator.
ncclCommFinalize flushes all issued communications and marks communicator state as ncclInProgress. The state will change to ncclSuccess when the communicator is globally quiescent and related resources are freed; then, calling ncclCommDestroy can locally free the rest of the resources (e.g. communicator itself) without blocking.
- Args:
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclCommFinalize(comm)#
(No short description, might be part of a group.)
- hip.rccl.ncclCommDestroy(comm)#
Frees local resources associated with communicator object.
Destroy all local resources associated with the passed in communicator object
- Args:
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclCommDestroy(comm)#
(No short description, might be part of a group.)
- hip.rccl.ncclCommAbort(comm)#
Abort any in-progress calls and destroy the communicator object.
Frees resources associated with communicator object and aborts any operations that might still be running on the device.
- Args:
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclCommAbort(comm)#
(No short description, might be part of a group.)
- hip.rccl.ncclCommSplit(comm, int color, int key, config)#
Create one or more communicators from an existing one.
Creates one or more communicators from an existing one. Ranks with the same color will end up in the same communicator. Within the new communicator, key will be used to order ranks. NCCL_SPLIT_NOCOLOR as color will indicate the rank will not be part of any group and will therefore return a NULL communicator. If config is NULL, the new communicator will inherit the original communicator's configuration.
- Args:
- Returns:
  A tuple of size 2 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
  - ncclComm: Pointer to new communicator
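The color/key mechanics can be made concrete without a GPU: ranks with equal color land in the same sub-communicator, and key orders ranks within it. A small host-side emulation of that grouping rule (helper name and sentinel value are illustrative, not part of the API):

```python
from collections import defaultdict

NCCL_SPLIT_NOCOLOR = -1  # assumed sentinel: such ranks get no communicator

def emulate_split(colors, keys):
    """Return {color: [ranks ordered by (key, rank)]} as ncclCommSplit would group them."""
    groups = defaultdict(list)
    for rank, (color, key) in enumerate(zip(colors, keys)):
        if color != NCCL_SPLIT_NOCOLOR:
            groups[color].append((key, rank))
    return {c: [r for _, r in sorted(members)] for c, members in groups.items()}

# Split 4 ranks into even/odd groups, keeping rank order inside each group.
print(emulate_split([0, 1, 0, 1], [0, 0, 1, 1]))  # -> {0: [0, 2], 1: [1, 3]}
```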
- hip.rccl.pncclCommSplit(comm, int color, int key, config)#
(No short description, might be part of a group.)
- hip.rccl.ncclGetErrorString(result)#
Returns a string for each result code.
Returns a human-readable string describing the given result code.
- Args:
  - result (ncclResult_t) – IN: Result code to get description for
- Returns:
  A tuple of size 1 that contains (in that order):
  - bytes: String containing description of result code.
- hip.rccl.pncclGetErrorString(result)#
(No short description, might be part of a group.)
- Args:
  - result (ncclResult_t): (undocumented)
- hip.rccl.ncclGetLastError(comm)#
Returns message on the last result that occurred.
Returns a human-readable message of the last error that occurred.
- hip.rccl.pncclGetLastError(comm)#
(No short description, might be part of a group.)
- hip.rccl.ncclCommGetAsyncError(comm, asyncError)#
Checks whether the comm has encountered any asynchronous errors
Query whether the provided communicator has encountered any asynchronous errors
- hip.rccl.pncclCommGetAsyncError(comm, asyncError)#
(No short description, might be part of a group.)
- hip.rccl.ncclCommCount(comm)#
Gets the number of ranks in the communicator clique.
Returns the number of ranks in the communicator clique (as set during initialization)
- Args:
- Returns:
  A tuple of size 2 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
  - int: Pointer to where number of ranks will be stored
- hip.rccl.pncclCommCount(comm)#
(No short description, might be part of a group.)
- hip.rccl.ncclCommCuDevice(comm)#
Get the ROCm device index associated with a communicator
Returns the ROCm device number associated with the provided communicator.
- Args:
- Returns:
  A tuple of size 2 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
  - int: Pointer to where the associated ROCm device index will be stored
- hip.rccl.pncclCommCuDevice(comm)#
(No short description, might be part of a group.)
- hip.rccl.ncclCommUserRank(comm)#
Get the rank associated with a communicator
Returns the user-ordered “rank” associated with the provided communicator.
- Args:
- Returns:
  A tuple of size 2 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
  - int: Pointer to where the associated rank will be stored
- hip.rccl.pncclCommUserRank(comm)#
(No short description, might be part of a group.)
- class hip.rccl.ncclRedOp_dummy_t(value)#
Bases:
_ncclRedOp_dummy_t__Base
Dummy reduction enumeration
- Attributes:
- ncclNumOps_dummy:
(undocumented)
- ncclNumOps_dummy = 5#
- static ctypes_type()#
The type of the enum constants as ctypes type.
- class hip.rccl.ncclRedOp_t(value)#
Bases:
_ncclRedOp_t__Base
Reduction operation selector
- Attributes:
- ncclSum:
Sum
- ncclProd:
Product
- ncclMax:
Max
- ncclMin:
Min
- ncclAvg:
Average
- ncclNumOps:
Number of built-in reduction ops
- ncclMaxRedOp:
Largest value for ncclRedOp_t
- ncclSum = 0#
- ncclProd = 1#
- ncclMax = 2#
- ncclMin = 3#
- ncclAvg = 4#
- ncclNumOps = 5#
- ncclMaxRedOp = 2147483647#
- static ctypes_type()#
The type of the enum constants as ctypes type.
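The arithmetic meaning of each selector can be checked on the host. A small emulation over per-rank Python lists, purely to illustrate what each operator computes elementwise (this is not the library's implementation):

```python
import math

def emulate_allreduce(per_rank, op):
    """Elementwise reduction across ranks; per_rank is a list of equal-length lists."""
    n = len(per_rank)
    cols = list(zip(*per_rank))  # gather element i from every rank
    if op == "sum":   # ncclSum
        return [sum(c) for c in cols]
    if op == "prod":  # ncclProd
        return [math.prod(c) for c in cols]
    if op == "max":   # ncclMax
        return [max(c) for c in cols]
    if op == "min":   # ncclMin
        return [min(c) for c in cols]
    if op == "avg":   # ncclAvg: sum divided by number of ranks
        return [sum(c) / n for c in cols]
    raise ValueError(op)

ranks = [[1, 2], [3, 4], [5, 6]]
print(emulate_allreduce(ranks, "sum"))  # -> [9, 12]
print(emulate_allreduce(ranks, "avg"))  # -> [3.0, 4.0]
```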
- class hip.rccl.ncclDataType_t(value)#
Bases:
_ncclDataType_t__Base
Data types
- Attributes:
- ncclInt8:
(undocumented)
- ncclChar:
(undocumented)
- ncclUint8:
(undocumented)
- ncclInt32:
(undocumented)
- ncclInt:
(undocumented)
- ncclUint32:
(undocumented)
- ncclInt64:
(undocumented)
- ncclUint64:
(undocumented)
- ncclFloat16:
(undocumented)
- ncclHalf:
(undocumented)
- ncclFloat32:
(undocumented)
- ncclFloat:
(undocumented)
- ncclFloat64:
(undocumented)
- ncclDouble:
(undocumented)
- ncclBfloat16:
(undocumented)
- ncclNumTypes:
(undocumented)
- ncclInt8 = 0#
- ncclChar = 0#
- ncclUint8 = 1#
- ncclInt32 = 2#
- ncclInt = 2#
- ncclUint32 = 3#
- ncclInt64 = 4#
- ncclUint64 = 5#
- ncclFloat16 = 6#
- ncclHalf = 6#
- ncclFloat32 = 7#
- ncclFloat = 7#
- ncclFloat64 = 8#
- ncclDouble = 8#
- ncclBfloat16 = 9#
- ncclNumTypes = 10#
- static ctypes_type()#
The type of the enum constants as ctypes type.
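Since the count arguments of the collectives below are in elements, the element width of each ncclDataType_t matters for sizing buffers. A sketch mapping the enum names above to their standard sizes in bytes (aliases such as ncclChar/ncclInt8 share an entry):

```python
# Element sizes in bytes for the ncclDataType_t values listed above.
NCCL_DTYPE_SIZE = {
    "ncclInt8": 1, "ncclChar": 1, "ncclUint8": 1,
    "ncclInt32": 4, "ncclInt": 4, "ncclUint32": 4,
    "ncclInt64": 8, "ncclUint64": 8,
    "ncclFloat16": 2, "ncclHalf": 2,
    "ncclFloat32": 4, "ncclFloat": 4,
    "ncclFloat64": 8, "ncclDouble": 8,
    "ncclBfloat16": 2,
}

def buffer_bytes(count, dtype_name):
    """Bytes needed for `count` elements of the given datatype."""
    return count * NCCL_DTYPE_SIZE[dtype_name]

print(buffer_bytes(1024, "ncclFloat32"))  # -> 4096
```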
- class hip.rccl.ncclScalarResidence_t(value)#
Bases:
_ncclScalarResidence_t__Base
Location and dereferencing logic for scalar arguments.
- Attributes:
- ncclScalarDevice:
Scalar is in device-visible memory
- ncclScalarHostImmediate:
Scalar is in host-visible memory
- ncclScalarDevice = 0#
- ncclScalarHostImmediate = 1#
- static ctypes_type()#
The type of the enum constants as ctypes type.
- hip.rccl.ncclRedOpCreatePreMulSum(op, scalar, datatype, residence, comm)#
Create a custom pre-multiplier reduction operator
Creates a new reduction operator which pre-multiplies input values by a given scalar locally before reducing them with peer values via summation. For use only with collectives launched against comm and datatype. The residence argument indicates how/when the memory pointed to by scalar will be dereferenced. Upon return, the newly created operator's handle is stored in op.
- Args:
  - op (Pointer/object) – OUT: Pointer to where newly created custom reduction operator is to be stored
  - scalar (Pointer/object) – IN: Pointer to scalar value.
  - datatype (ncclDataType_t) – IN: Scalar value datatype
  - residence (ncclScalarResidence_t) – IN: Memory type of the scalar value
  - comm (ncclComm/object) – IN: Communicator to associate with this custom reduction operator
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclRedOpCreatePreMulSum(op, scalar, datatype, residence, comm)#
(No short description, might be part of a group.)
- Args:
  - op (Pointer/object): (undocumented)
  - scalar (Pointer/object): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - residence (ncclScalarResidence_t): (undocumented)
  - comm (ncclComm/object): (undocumented)
- hip.rccl.ncclRedOpDestroy(op, comm)#
Destroy custom reduction operator
Destroys the reduction operator op. The operator must have been created by ncclRedOpCreatePreMulSum with the matching communicator comm. An operator may be destroyed as soon as the last RCCL function which is given that operator returns.
- Args:
  - op (ncclRedOp_t) – IN: Custom reduction operator to be destroyed
  - comm (ncclComm/object) – IN: Communicator associated with this reduction operator
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclRedOpDestroy(op, comm)#
(No short description, might be part of a group.)
- Args:
  - op (ncclRedOp_t): (undocumented)
  - comm (ncclComm/object): (undocumented)
- hip.rccl.ncclReduce(sendbuff, recvbuff, unsigned long count, datatype, op, int root, comm, stream)#
Reduce
Reduces data arrays of length count in sendbuff into recvbuff using op operation.
recvbuff may be NULL on all calls except for root device. root is the rank (not the HIP device) where data will reside after the operation is complete.
In-place operation will happen if sendbuff == recvbuff.
- Args:
  - sendbuff (Pointer/object) – IN: Local device data buffer to be reduced
  - recvbuff (Pointer/object) – OUT: Data buffer where result is stored (only for root rank). May be null for other ranks.
  - count (int) – IN: Number of elements in every send buffer
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - op (ncclRedOp_t) – IN: Reduction operator type
  - root (int) – IN: Rank where result data array will be stored
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclReduce(sendbuff, recvbuff, unsigned long count, datatype, op, int root, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - sendbuff (Pointer/object): (undocumented)
  - recvbuff (Pointer/object): (undocumented)
  - count (int): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - op (ncclRedOp_t): (undocumented)
  - root (int): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclBcast(buff, unsigned long count, datatype, int root, comm, stream)#
(Deprecated) Broadcast (in-place)
Copies count values from root to all other devices. root is the rank (not the CUDA device) where data resides before the operation is started. This operation is implicitly in-place.
- Args:
  - buff (Pointer/object) – IN,OUT: Input array on root to be copied to other ranks. Output array for all ranks.
  - count (int) – IN: Number of elements in data buffer
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - root (int) – IN: Rank owning buffer to be copied to others
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclBcast(buff, unsigned long count, datatype, int root, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - buff (Pointer/object): (undocumented)
  - count (int): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - root (int): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclBroadcast(sendbuff, recvbuff, unsigned long count, datatype, int root, comm, stream)#
Broadcast
Copies count values from sendbuff on root to recvbuff on all devices.
root is the rank (not the HIP device) where data resides before the operation is started. sendbuff may be NULL on ranks other than root.
In-place operation will happen if sendbuff == recvbuff.
- Args:
  - sendbuff (Pointer/object) – IN: Data array to copy (if root). May be NULL for other ranks
  - recvbuff (Pointer/object) – OUT: Data array to store received array
  - count (int) – IN: Number of elements in data buffer
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - root (int) – IN: Rank of broadcast root
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclBroadcast(sendbuff, recvbuff, unsigned long count, datatype, int root, comm, stream)#
(No short description, might be part of a group.)
- hip.rccl.ncclAllReduce(sendbuff, recvbuff, unsigned long count, datatype, op, comm, stream)#
All-Reduce
Reduces data arrays of length count in sendbuff using op operation, and leaves identical copies of result on each recvbuff. In-place operation will happen if sendbuff == recvbuff.
- Args:
  - sendbuff (Pointer/object) – IN: Input data array to reduce
  - recvbuff (Pointer/object) – OUT: Data array to store reduced result array
  - count (int) – IN: Number of elements in data buffer
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - op (ncclRedOp_t) – IN: Reduction operator
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclAllReduce(sendbuff, recvbuff, unsigned long count, datatype, op, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - sendbuff (Pointer/object): (undocumented)
  - recvbuff (Pointer/object): (undocumented)
  - count (int): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - op (ncclRedOp_t): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclReduceScatter(sendbuff, recvbuff, unsigned long recvcount, datatype, op, comm, stream)#
Reduce-Scatter
Reduces data in sendbuff using op operation and leaves reduced result scattered over the devices so that recvbuff on rank i will contain the i-th block of the result. Assumes sendcount is equal to nranks*recvcount, which means that sendbuff should have a size of at least nranks*recvcount elements. In-place operations will happen if recvbuff == sendbuff + rank * recvcount.
- Args:
  - sendbuff (Pointer/object) – IN: Input data array to reduce
  - recvbuff (Pointer/object) – OUT: Data array to store reduced result subarray
  - recvcount (int) – IN: Number of elements each rank receives
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - op (ncclRedOp_t) – IN: Reduction operator
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order):
  - ncclResult_t: Result code. See rccl_result_code for more details.
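The block layout described above (rank i receives the i-th block of the reduced result, and sendbuff holds nranks*recvcount elements) can be emulated on the host to make the index arithmetic concrete; a sketch with ncclSum, not the library's algorithm:

```python
def emulate_reduce_scatter(per_rank_send, recvcount):
    """Each rank contributes nranks*recvcount elements; rank i gets block i of the sum."""
    nranks = len(per_rank_send)
    assert all(len(s) == nranks * recvcount for s in per_rank_send)
    reduced = [sum(vals) for vals in zip(*per_rank_send)]  # elementwise ncclSum
    # Rank i's recvbuff holds elements [i*recvcount, (i+1)*recvcount).
    return [reduced[i * recvcount:(i + 1) * recvcount] for i in range(nranks)]

# Two ranks, recvcount=2: each sends 4 elements, receives its 2-element block.
out = emulate_reduce_scatter([[1, 2, 3, 4], [10, 20, 30, 40]], recvcount=2)
print(out)  # -> [[11, 22], [33, 44]]
```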
- hip.rccl.pncclReduceScatter(sendbuff, recvbuff, unsigned long recvcount, datatype, op, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - sendbuff (Pointer/object): (undocumented)
  - recvbuff (Pointer/object): (undocumented)
  - recvcount (int): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - op (ncclRedOp_t): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclAllGather(sendbuff, recvbuff, unsigned long sendcount, datatype, comm, stream)#
All-Gather
Each device gathers sendcount values from other GPUs into recvbuff, receiving data from rank i at offset i*sendcount. Assumes recvcount is equal to nranks*sendcount, which means that recvbuff should have a size of at least nranks*sendcount elements. In-place operations will happen if sendbuff == recvbuff + rank * sendcount.
- Args:
  - sendbuff (Pointer/object) – IN: Input data array to send
  - recvbuff (Pointer/object) – OUT: Data array to store the gathered result
  - sendcount (int) – IN: Number of elements each rank sends
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
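The offset rule above (rank i's data lands at offset `i*sendcount` in every recvbuff) can be sketched in plain Python; the `all_gather` helper below is a hypothetical host-side model, not part of `hip.rccl`:

```python
# Hypothetical model of ncclAllGather semantics: each rank's sendcount elements
# are placed into every rank's recvbuff at offset rank*sendcount, so every
# recvbuff needs at least nranks*sendcount elements.
def all_gather(sendbuffs, sendcount):
    nranks = len(sendbuffs)
    gathered = []
    for i in range(nranks):
        gathered.extend(sendbuffs[i][:sendcount])  # rank i's data at offset i*sendcount
    return [list(gathered) for _ in range(nranks)]  # every rank gets the full result

out = all_gather([[10], [20], [30]], sendcount=1)
```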
- hip.rccl.pncclAllGather(sendbuff, recvbuff, unsigned long sendcount, datatype, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - sendbuff (Pointer/object): (undocumented)
  - recvbuff (Pointer/object): (undocumented)
  - sendcount (int): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclSend(sendbuff, unsigned long count, datatype, int peer, comm, stream)#
Send
Send data from sendbuff to rank peer. Rank peer needs to call ncclRecv with the same datatype and the same count as this rank. This operation is blocking for the GPU. If multiple ncclSend and ncclRecv operations need to progress concurrently to complete, they must be fused within a ncclGroupStart / ncclGroupEnd section.
- Args:
  - sendbuff (Pointer/object) – IN: Data array to send
  - count (int) – IN: Number of elements to send
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - peer (int) – IN: Peer rank to send to
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclSend(sendbuff, unsigned long count, datatype, int peer, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - sendbuff (Pointer/object): (undocumented)
  - count (int): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - peer (int): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclRecv(recvbuff, unsigned long count, datatype, int peer, comm, stream)#
Receive
Receive data from rank peer into recvbuff. Rank peer needs to call ncclSend with the same datatype and the same count as this rank. This operation is blocking for the GPU. If multiple ncclSend and ncclRecv operations need to progress concurrently to complete, they must be fused within a ncclGroupStart / ncclGroupEnd section.
- Args:
  - recvbuff (Pointer/object) – OUT: Data array to receive
  - count (int) – IN: Number of elements to receive
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - peer (int) – IN: Peer rank to receive from
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
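The send/recv matching requirement (same count and datatype on both sides, fused in a group so the transfers can progress concurrently) can be modeled on the host. The `ring_exchange` helper below is a hypothetical sketch of the matching logic only, not a real `hip.rccl` call:

```python
# Hypothetical model of matched ncclSend/ncclRecv pairs: rank r sends `count`
# elements to its right neighbour and receives from its left neighbour. In real
# RCCL these calls must be fused in ncclGroupStart/ncclGroupEnd to avoid
# blocking each other; here we only model the (src, dst) matching.
def ring_exchange(buffers, count):
    nranks = len(buffers)
    mailbox = {}
    for r in range(nranks):                      # the "ncclSend" side
        mailbox[(r, (r + 1) % nranks)] = buffers[r][:count]
    recv = []
    for r in range(nranks):                      # the "ncclRecv" side: matching peer/count
        recv.append(mailbox[((r - 1) % nranks, r)])
    return recv

out = ring_exchange([[0], [1], [2]], count=1)
```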
- hip.rccl.pncclRecv(recvbuff, unsigned long count, datatype, int peer, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - recvbuff (Pointer/object): (undocumented)
  - count (int): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - peer (int): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclGather(sendbuff, recvbuff, unsigned long sendcount, datatype, int root, comm, stream)#
Gather
Root device gathers sendcount values from other GPUs into recvbuff, receiving data from rank i at offset i*sendcount. Assumes recvcount is equal to nranks*sendcount, which means that recvbuff should have a size of at least nranks*sendcount elements. In-place operations will happen if sendbuff == recvbuff + rank * sendcount.
recvbuff may be NULL on ranks other than root.
- Args:
  - sendbuff (Pointer/object) – IN: Data array to send
  - recvbuff (Pointer/object) – OUT: Data array to receive into on root
  - sendcount (int) – IN: Number of elements to send per rank
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - root (int) – IN: Rank that receives data from all other ranks
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
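A host-side sketch of the gather layout, with the root-only recvbuff modeled as `None` on non-root ranks (the `gather` helper is hypothetical, not part of `hip.rccl`):

```python
# Hypothetical model of ncclGather: only root assembles the result, with rank
# i's sendcount elements placed at offset i*sendcount; other ranks may pass a
# NULL recvbuff, modeled here as None.
def gather(sendbuffs, sendcount, root):
    nranks = len(sendbuffs)
    recvbuff = []
    for i in range(nranks):
        recvbuff.extend(sendbuffs[i][:sendcount])
    return [recvbuff if r == root else None for r in range(nranks)]

out = gather([[1], [2], [3]], sendcount=1, root=0)
```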
- hip.rccl.pncclGather(sendbuff, recvbuff, unsigned long sendcount, datatype, int root, comm, stream)#
(No short description, might be part of a group.)
- hip.rccl.ncclScatter(sendbuff, recvbuff, unsigned long recvcount, datatype, int root, comm, stream)#
Scatter
Scatters data from root over the devices so that recvbuff on rank i will contain the i-th block of the data on root. Assumes sendcount is equal to nranks*recvcount, which means that sendbuff should have a size of at least nranks*recvcount elements. In-place operations will happen if recvbuff == sendbuff + rank * recvcount.
- Args:
  - sendbuff (Pointer/object) – IN: Data array to send (on root rank). May be NULL on other ranks.
  - recvbuff (Pointer/object) – OUT: Data array to receive partial subarray into
  - recvcount (int) – IN: Number of elements to receive per rank
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - root (int) – IN: Rank that scatters data to all other ranks
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
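The scatter partitioning is the mirror image of gather and is easy to sketch on the host (the `scatter` helper is a hypothetical illustration, not part of `hip.rccl`):

```python
# Hypothetical model of ncclScatter: rank i receives the i-th recvcount-sized
# block of root's sendbuff, which must hold at least nranks*recvcount elements.
def scatter(sendbuff, recvcount, nranks):
    assert len(sendbuff) >= nranks * recvcount
    return [sendbuff[i * recvcount:(i + 1) * recvcount] for i in range(nranks)]

out = scatter([1, 2, 3, 4, 5, 6], recvcount=2, nranks=3)
```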
- hip.rccl.pncclScatter(sendbuff, recvbuff, unsigned long recvcount, datatype, int root, comm, stream)#
(No short description, might be part of a group.)
- hip.rccl.ncclAllToAll(sendbuff, recvbuff, unsigned long count, datatype, comm, stream)#
All-To-All
Device i sends the j-th block of its data to device j, where it is placed as the i-th block. Each block for sending/receiving has count elements, which means that recvbuff and sendbuff should each have a size of nranks*count elements. In-place operation is NOT supported; it is the user's responsibility to ensure that sendbuff and recvbuff are distinct.
- Args:
  - sendbuff (Pointer/object) – IN: Data array to send (contains blocks for each other rank)
  - recvbuff (Pointer/object) – OUT: Data array to receive (contains blocks from each other rank)
  - count (int) – IN: Number of elements to send between each pair of ranks
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
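All-to-all is a block transpose: the j-th block of rank i's sendbuff becomes the i-th block of rank j's recvbuff. A hypothetical host-side sketch (the `all_to_all` helper is not part of `hip.rccl`):

```python
# Hypothetical model of ncclAllToAll semantics: rank i's j-th count-sized block
# ends up as the i-th block of rank j's recvbuff (a transpose of blocks).
def all_to_all(sendbuffs, count):
    nranks = len(sendbuffs)
    blocks = [[sb[j * count:(j + 1) * count] for j in range(nranks)] for sb in sendbuffs]
    recv = []
    for j in range(nranks):
        rb = []
        for i in range(nranks):
            rb.extend(blocks[i][j])  # block from rank i placed as the i-th block
        recv.append(rb)
    return recv

out = all_to_all([[0, 1], [10, 11]], count=1)  # 2 ranks, 1 element per block
```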
- hip.rccl.pncclAllToAll(sendbuff, recvbuff, unsigned long count, datatype, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - sendbuff (Pointer/object): (undocumented)
  - recvbuff (Pointer/object): (undocumented)
  - count (int): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclAllToAllv(sendbuff, sendcounts, sdispls, recvbuff, recvcounts, rdispls, datatype, comm, stream)#
All-To-Allv
Device (i) sends sendcounts[j] of data from offset sdispls[j] to device (j). At the same time, device (i) receives recvcounts[j] of data from device (j) to be placed at rdispls[j]. sendcounts, sdispls, recvcounts and rdispls are all measured in the units of datatype, not bytes. In-place operation will happen if sendbuff == recvbuff.
- Args:
  - sendbuff (Pointer/object) – IN: Data array to send (contains blocks for each other rank)
  - sendcounts (ListOfUnsignedLong/object) – IN: Array containing number of elements to send to each participating rank
  - sdispls (ListOfUnsignedLong/object) – IN: Array of offsets into sendbuff for each participating rank
  - recvbuff (Pointer/object) – OUT: Data array to receive (contains blocks from each other rank)
  - recvcounts (ListOfUnsignedLong/object) – IN: Array containing number of elements to receive from each participating rank
  - rdispls (ListOfUnsignedLong/object) – IN: Array of offsets into recvbuff for each participating rank
  - datatype (ncclDataType_t) – IN: Data buffer element datatype
  - comm (ncclComm/object) – IN: Communicator group object to execute on
  - stream (ihipStream_t/object) – IN: HIP stream to execute collective on
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
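The count/displacement bookkeeping (all measured in elements of datatype, with sendcounts[i][j] on the sender matching recvcounts[j][i] on the receiver) can be sketched on the host. The `all_to_all_v` helper below is a hypothetical model, not part of `hip.rccl`:

```python
# Hypothetical model of ncclAllToAllv: rank i sends sendcounts[i][j] elements
# starting at sdispls[i][j] to rank j, which stores them at rdispls[j][i].
# All counts and displacements are in elements of `datatype`, not bytes.
def all_to_all_v(sendbuffs, sendcounts, sdispls, recvcounts, rdispls):
    nranks = len(sendbuffs)
    # size each recvbuff to cover the furthest incoming chunk
    recv = [[0] * max(rdispls[j][i] + recvcounts[j][i] for i in range(nranks))
            for j in range(nranks)]
    for i in range(nranks):
        for j in range(nranks):
            n = sendcounts[i][j]
            assert n == recvcounts[j][i]  # counts must match pairwise
            chunk = sendbuffs[i][sdispls[i][j]:sdispls[i][j] + n]
            recv[j][rdispls[j][i]:rdispls[j][i] + n] = chunk
    return recv

out = all_to_all_v(
    sendbuffs=[[1, 2, 3], [4, 5]],
    sendcounts=[[1, 2], [1, 1]], sdispls=[[0, 1], [0, 1]],
    recvcounts=[[1, 1], [2, 1]], rdispls=[[0, 1], [0, 2]],
)
```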
- hip.rccl.pncclAllToAllv(sendbuff, sendcounts, sdispls, recvbuff, recvcounts, rdispls, datatype, comm, stream)#
(No short description, might be part of a group.)
- Args:
  - sendbuff (Pointer/object): (undocumented)
  - sendcounts (ListOfUnsignedLong/object): (undocumented)
  - sdispls (ListOfUnsignedLong/object): (undocumented)
  - recvbuff (Pointer/object): (undocumented)
  - recvcounts (ListOfUnsignedLong/object): (undocumented)
  - rdispls (ListOfUnsignedLong/object): (undocumented)
  - datatype (ncclDataType_t): (undocumented)
  - comm (ncclComm/object): (undocumented)
  - stream (ihipStream_t/object): (undocumented)
- hip.rccl.ncclGroupStart()#
Group Start
Start a group call. All calls to RCCL until ncclGroupEnd will be fused into a single RCCL operation. Nothing will be started on the HIP stream until ncclGroupEnd.
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
- hip.rccl.pncclGroupStart()#
(No short description, might be part of a group.)
- hip.rccl.ncclGroupEnd()#
Group End
End a group call. Start a fused RCCL operation consisting of all calls since ncclGroupStart. Operations on the HIP stream depending on the RCCL operations need to be called after ncclGroupEnd.
- Returns:
  A tuple of size 1 that contains (in that order): ncclResult_t: Result code. See rccl_result_code for more details.
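The deferral semantics of group calls (nothing launches on the stream until ncclGroupEnd, at which point all queued calls start as one fused operation) can be sketched with a small host-side queue. The `GroupedLauncher` class is a hypothetical illustration of the semantics, not part of `hip.rccl`:

```python
# Hypothetical model of ncclGroupStart/ncclGroupEnd: calls issued inside a
# group are only queued; they all launch together when the group ends.
class GroupedLauncher:
    def __init__(self):
        self.queue = []      # ops deferred between group_start and group_end
        self.launched = []   # batches that have actually started
        self.grouping = False

    def group_start(self):
        self.grouping = True

    def submit(self, op):
        if self.grouping:
            self.queue.append(op)        # deferred, like calls between Start/End
        else:
            self.launched.append([op])   # outside a group, launches immediately

    def group_end(self):
        self.launched.append(self.queue)  # all queued ops start as one fused op
        self.queue = []
        self.grouping = False

g = GroupedLauncher()
g.group_start()
g.submit("send->1")
g.submit("recv<-1")
assert g.launched == []  # nothing has started on the stream yet
g.group_end()
```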
- hip.rccl.pncclGroupEnd()#
(No short description, might be part of a group.)