hip.rccl#

Python bindings for RCCL, the ROCm Communication Collectives Library.

Attributes:
NCCL_MAJOR (int):

Macro constant.

NCCL_MINOR (int):

Macro constant.

NCCL_PATCH (int):

Macro constant.

NCCL_SUFFIX (bytes):

Macro constant.

NCCL_VERSION_CODE (int):

Macro constant.

RCCL_BFLOAT16 (int):

Macro constant.

RCCL_GATHER_SCATTER (int):

Macro constant.

RCCL_ALLTOALLV (int):

Macro constant.

NCCL_UNIQUE_ID_BYTES (int):

Macro constant.

ncclComm_t:

alias of ncclComm

ncclConfig_t:

alias of ncclConfig_v21700

class hip.rccl.ncclComm#

Bases: Pointer

Python wrapper for cdef class crccl.ncclComm.

If this type is initialized via its __init__ method, it allocates a member of the underlying C type and destroys it again if the wrapper type is deallocated.

This type also serves as adapter when appearing as argument type in a function signature. In this case, the type can further be initialized from a number of Python objects:

  • None:

    This will set the self._ptr attribute to NULL.

  • int:

    Interprets the integer value as pointer address and writes it to self._ptr. No ownership is transferred.

  • ctypes.c_void_p:

    Takes the pointer address pyobj.value and writes it to self._ptr. No ownership is transferred.

  • object that implements the CUDA Array Interface protocol:

    Takes the integer-valued pointer address, i.e. the first entry of the data tuple from pyobj’s member __cuda_array_interface__ and writes it to self._ptr.

  • object that implements the Python buffer protocol:

    If the object represents a simple contiguous array, writes the Py_buffer associated with pyobj to self._py_buffer, sets the self._py_buffer_acquired flag to True, and writes self._py_buffer.buf to the data pointer self._ptr.

  • hip._util.types.Pointer:

    Takes the pointer address pyobj._ptr and writes it to self._ptr. No ownership is transferred.

Type checks are performed in the above order.

C Attributes:
_ptr (C type void *, protected):

Stores a pointer to the data of the original Python object.

_is_ptr_owner (C type bint, protected):

If this wrapper is the owner of the underlying data.

_py_buffer (C type Py_buffer, protected):

Stores a Py_buffer acquired from the original Python object.

_py_buffer_acquired (C type bint, protected):

Whether a Py_buffer has been acquired from the original Python object.

static PROPERTIES()#
__getitem__(key, /)#

Return self[key].

__init__()#

Constructor.

Args:
pyobj (object):

See the class description Pointer for information about accepted types for pyobj. Defaults to None.

Raises:

TypeError: If the input object pyobj is not of the right type.

as_c_void_p(self)#

Returns the data’s address as ctypes.c_void_p.

Note:

Implemented as function to not collide with autogenerated property names.

createRef(self) Pointer#

Creates a reference to this pointer.

Returns a Pointer that stores the address of this Pointer’s data pointer.

Note:

No ownership information is transferred.

static fromObj(pyobj)#

Creates a ncclComm from a Python object.

Derives a ncclComm from the given Python object pyobj. In case pyobj is itself an ncclComm reference, this method returns it directly. No new ncclComm is created in this case.

is_ptr_null#

If data pointer is NULL.

hip.rccl.ncclComm_t#

alias of ncclComm

class hip.rccl.ncclUniqueId(*args, **kwargs)#

Bases: Pointer

Python wrapper for cdef class crccl.ncclUniqueId.

If this type is initialized via its __init__ method, it allocates a member of the underlying C type and destroys it again if the wrapper type is deallocated.

This type also serves as adapter when appearing as argument type in a function signature. In this case, the type can further be initialized from a number of Python objects:

  • None:

    This will set the self._ptr attribute to NULL.

  • int:

    Interprets the integer value as pointer address and writes it to self._ptr. No ownership is transferred.

  • ctypes.c_void_p:

    Takes the pointer address pyobj.value and writes it to self._ptr. No ownership is transferred.

  • object that implements the CUDA Array Interface protocol:

    Takes the integer-valued pointer address, i.e. the first entry of the data tuple from pyobj’s member __cuda_array_interface__ and writes it to self._ptr.

  • object that implements the Python buffer protocol:

    If the object represents a simple contiguous array, writes the Py_buffer associated with pyobj to self._py_buffer, sets the self._py_buffer_acquired flag to True, and writes self._py_buffer.buf to the data pointer self._ptr.

  • hip._util.types.Pointer:

    Takes the pointer address pyobj._ptr and writes it to self._ptr. No ownership is transferred.

Type checks are performed in the above order.

C Attributes:
_ptr (C type void *, protected):

Stores a pointer to the data of the original Python object.

_is_ptr_owner (C type bint, protected):

If this wrapper is the owner of the underlying data.

_py_buffer (C type Py_buffer, protected):

Stores a Py_buffer acquired from the original Python object.

_py_buffer_acquired (C type bint, protected):

Whether a Py_buffer has been acquired from the original Python object.

static PROPERTIES()#
__getitem__(key, /)#

Return self[key].

__init__()#

Constructor for type ncclUniqueId.

Args:
*args:

Positional arguments. Initialize all or a subset of the member variables according to their order of declaration.

**kwargs:

Can be used to initialize member variables at construction. Just pass an argument expression of the form <member>=<value> per member that you want to initialize.

as_c_void_p(self)#

Returns the data’s address as ctypes.c_void_p.

Note:

Implemented as function to not collide with autogenerated property names.

c_sizeof(self)#

Returns the size of the underlying C type in bytes.

Note:

Implemented as function to not collide with autogenerated property names.

createRef(self) Pointer#

Creates a reference to this pointer.

Returns a Pointer that stores the address of this Pointer’s data pointer.

Note:

No ownership information is transferred.

static fromObj(pyobj)#

Creates a ncclUniqueId from a Python object.

Derives a ncclUniqueId from the given Python object pyobj. In case pyobj is itself an ncclUniqueId reference, this method returns it directly. No new ncclUniqueId is created in this case.

get_internal(self, i)#

Get value of internal of (<crccl.ncclUniqueId*>self._ptr)[i].

internal#

Opaque array.

is_ptr_null#

If data pointer is NULL.

class hip.rccl.ncclResult_t(value)#

Bases: _ncclResult_t__Base

Result type

Attributes:
ncclSuccess:

No error

ncclUnhandledCudaError:

Unhandled HIP error

ncclSystemError:

Unhandled system error

ncclInternalError:

Internal Error - Please report to RCCL developers

ncclInvalidArgument:

Invalid argument

ncclInvalidUsage:

Invalid usage

ncclRemoteError:

Remote process exited or there was a network error

ncclInProgress:

RCCL operation in progress

ncclNumResults:

Number of result types

ncclSuccess = 0#
ncclUnhandledCudaError = 1#
ncclSystemError = 2#
ncclInternalError = 3#
ncclInvalidArgument = 4#
ncclInvalidUsage = 5#
ncclRemoteError = 6#
ncclInProgress = 7#
ncclNumResults = 8#
static ctypes_type()#

The type of the enum constants as ctypes type.

class hip.rccl.ncclConfig_v21700(*args, **kwargs)#

Bases: Pointer

Python wrapper for cdef class crccl.ncclConfig_v21700.

If this type is initialized via its __init__ method, it allocates a member of the underlying C type and destroys it again if the wrapper type is deallocated.

This type also serves as adapter when appearing as argument type in a function signature. In this case, the type can further be initialized from a number of Python objects:

  • None:

    This will set the self._ptr attribute to NULL.

  • int:

    Interprets the integer value as pointer address and writes it to self._ptr. No ownership is transferred.

  • ctypes.c_void_p:

    Takes the pointer address pyobj.value and writes it to self._ptr. No ownership is transferred.

  • object that implements the CUDA Array Interface protocol:

    Takes the integer-valued pointer address, i.e. the first entry of the data tuple from pyobj’s member __cuda_array_interface__ and writes it to self._ptr.

  • object that implements the Python buffer protocol:

    If the object represents a simple contiguous array, writes the Py_buffer associated with pyobj to self._py_buffer, sets the self._py_buffer_acquired flag to True, and writes self._py_buffer.buf to the data pointer self._ptr.

  • hip._util.types.Pointer:

    Takes the pointer address pyobj._ptr and writes it to self._ptr. No ownership is transferred.

Type checks are performed in the above order.

C Attributes:
_ptr (C type void *, protected):

Stores a pointer to the data of the original Python object.

_is_ptr_owner (C type bint, protected):

If this wrapper is the owner of the underlying data.

_py_buffer (C type Py_buffer, protected):

Stores a Py_buffer acquired from the original Python object.

_py_buffer_acquired (C type bint, protected):

Whether a Py_buffer has been acquired from the original Python object.

static PROPERTIES()#
__getitem__(key, /)#

Return self[key].

__init__()#

Constructor for type ncclConfig_v21700.

Args:
*args:

Positional arguments. Initialize all or a subset of the member variables according to their order of declaration.

**kwargs:

Can be used to initialize member variables at construction. Just pass an argument expression of the form <member>=<value> per member that you want to initialize.

as_c_void_p(self)#

Returns the data’s address as ctypes.c_void_p.

Note:

Implemented as function to not collide with autogenerated property names.

blocking#

Whether calls should block.

c_sizeof(self)#

Returns the size of the underlying C type in bytes.

Note:

Implemented as function to not collide with autogenerated property names.

cgaClusterSize#

Cooperative group array cluster size

createRef(self) Pointer#

Creates a reference to this pointer.

Returns a Pointer that stores the address of this Pointer’s data pointer.

Note:

No ownership information is transferred.

static fromObj(pyobj)#

Creates a ncclConfig_v21700 from a Python object.

Derives a ncclConfig_v21700 from the given Python object pyobj. In case pyobj is itself an ncclConfig_v21700 reference, this method returns it directly. No new ncclConfig_v21700 is created in this case.

get_blocking(self, i)#

Get value blocking of (<crccl.ncclConfig_v21700*>self._ptr)[i].

get_cgaClusterSize(self, i)#

Get value cgaClusterSize of (<crccl.ncclConfig_v21700*>self._ptr)[i].

get_magic(self, i)#

Get value magic of (<crccl.ncclConfig_v21700*>self._ptr)[i].

get_maxCTAs(self, i)#

Get value maxCTAs of (<crccl.ncclConfig_v21700*>self._ptr)[i].

get_minCTAs(self, i)#

Get value minCTAs of (<crccl.ncclConfig_v21700*>self._ptr)[i].

get_netName(self, i)#

Get value netName of (<crccl.ncclConfig_v21700*>self._ptr)[i].

get_size(self, i)#

Get value size of (<crccl.ncclConfig_v21700*>self._ptr)[i].

get_splitShare(self, i)#

Get value splitShare of (<crccl.ncclConfig_v21700*>self._ptr)[i].

get_version(self, i)#

Get value version of (<crccl.ncclConfig_v21700*>self._ptr)[i].

is_ptr_null#

If data pointer is NULL.

magic#

Should not be touched

maxCTAs#

Maximum number of cooperative thread arrays (blocks)

minCTAs#

Minimum number of cooperative thread arrays (blocks)

netName#

Force NCCL to use a specific network

set_blocking(self, i, int value)#

Set value blocking of (<crccl.ncclConfig_v21700*>self._ptr)[i].

set_cgaClusterSize(self, i, int value)#

Set value cgaClusterSize of (<crccl.ncclConfig_v21700*>self._ptr)[i].

set_magic(self, i, unsigned int value)#

Set value magic of (<crccl.ncclConfig_v21700*>self._ptr)[i].

set_maxCTAs(self, i, int value)#

Set value maxCTAs of (<crccl.ncclConfig_v21700*>self._ptr)[i].

set_minCTAs(self, i, int value)#

Set value minCTAs of (<crccl.ncclConfig_v21700*>self._ptr)[i].

set_netName(self, i, const char *value)#

Set value netName of (<crccl.ncclConfig_v21700*>self._ptr)[i].

set_size(self, i, unsigned long value)#

Set value size of (<crccl.ncclConfig_v21700*>self._ptr)[i].

set_splitShare(self, i, int value)#

Set value splitShare of (<crccl.ncclConfig_v21700*>self._ptr)[i].

set_version(self, i, unsigned int value)#

Set value version of (<crccl.ncclConfig_v21700*>self._ptr)[i].

size#

Should not be touched

splitShare#

Allow communicators to share resources

version#

Should not be touched

hip.rccl.ncclConfig_t#

alias of ncclConfig_v21700

hip.rccl.ncclGetVersion()#

Returns the RCCL_VERSION_CODE of RCCL.

This integer is coded with the MAJOR, MINOR and PATCH level of RCCL.

Returns:

A tuple of size 2 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

  • int:

    Pointer to where version will be stored
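The returned version code packs the major, minor, and patch levels into one integer. A minimal sketch of decoding it, assuming the encoding major*10000 + minor*100 + patch used by NCCL/RCCL releases since 2.9 (older releases used a different scheme):

```python
# Hedged sketch: decode an RCCL/NCCL version code into (major, minor, patch),
# assuming the major*10000 + minor*100 + patch encoding of NCCL >= 2.9.
def decode_version(code: int) -> tuple:
    major, rem = divmod(code, 10000)
    minor, patch = divmod(rem, 100)
    return (major, minor, patch)

# In practice the code would come from the binding, e.g.:
#   err, code = hip.rccl.ncclGetVersion()
```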

hip.rccl.pncclGetVersion()#

(No short description, might be part of a group.)

Returns:

A tuple of size 1 that contains (in that order):

  • version (int):

    (undocumented)

hip.rccl.ncclGetUniqueId(uniqueId)#

Generates an ID for ncclCommInitRank.

Generates an ID to be used in ncclCommInitRank. ncclGetUniqueId should be called once by a single rank and the ID should be distributed to all ranks in the communicator before using it as a parameter for ncclCommInitRank.

Args:
uniqueId (ncclUniqueId/object) – OUT:

Pointer to where uniqueId will be stored

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclGetUniqueId(uniqueId)#

(No short description, might be part of a group.)

Args:
uniqueId (ncclUniqueId/object):

(undocumented)

hip.rccl.ncclCommInitRankConfig(int nranks, commId, int rank, config)#

Create a new communicator with config.

Create a new communicator (multi thread/process version) with a configuration set by users. See rccl_config_type for more details. Each rank is associated with a HIP device, which has to be set before calling ncclCommInitRank.

Args:
nranks (int) – IN:

Total number of ranks participating in this communicator

commId (ncclUniqueId) – IN:

UniqueId required for initialization

rank (int) – IN:

Current rank to create communicator for. [0 to nranks-1]

config (ncclConfig_v21700/object) – IN:

Pointer to communicator configuration

Returns:

A tuple of size 2 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

  • ncclComm:

    Pointer to created communicator

hip.rccl.pncclCommInitRankConfig(int nranks, commId, int rank, config)#

(No short description, might be part of a group.)

Args:
nranks (int):

(undocumented)

commId (ncclUniqueId):

(undocumented)

rank (int):

(undocumented)

config (ncclConfig_v21700/object):

(undocumented)

Returns:

A tuple of size 1 that contains (in that order):

hip.rccl.ncclCommInitRank(int nranks, commId, int rank)#

Creates a new communicator (multi thread/process version).

Rank must be between 0 and nranks-1 and unique within a communicator clique. Each rank is associated with a HIP device, which has to be set before calling ncclCommInitRank. ncclCommInitRank implicitly synchronizes with other ranks, so it must either be called by different threads/processes or be used within ncclGroupStart/ncclGroupEnd.

Args:
nranks (int) – IN:

Total number of ranks participating in this communicator

commId (ncclUniqueId) – IN:

UniqueId required for initialization

rank (int) – IN:

Current rank to create communicator for

Returns:

A tuple of size 2 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

  • ncclComm:

    Pointer to created communicator
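A hedged sketch of the typical one-process-per-rank initialization flow. The actual RCCL calls are shown as comments because they need GPUs and an out-of-band transport; `broadcast_bytes` is a hypothetical stand-in for whatever channel (MPI, sockets, a shared file) distributes the uniqueId:

```python
# Hedged sketch: rank bookkeeping for ncclCommInitRank.
def check_rank(nranks: int, rank: int) -> int:
    # Ranks must be unique within the clique and lie in [0, nranks-1].
    if not 0 <= rank < nranks:
        raise ValueError(f"rank {rank} out of range for {nranks} ranks")
    return rank

# Per-process flow (comments only; requires GPUs):
# from hip import hip, rccl
# hip.hipSetDevice(rank)                   # bind this process to one device
# uid = rccl.ncclUniqueId()
# if rank == 0:
#     (err,) = rccl.ncclGetUniqueId(uid)   # create once on a single rank
#     broadcast_bytes(uid)                 # hypothetical distribution step
# err, comm = rccl.ncclCommInitRank(nranks, uid, check_rank(nranks, rank))
```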

hip.rccl.pncclCommInitRank(int nranks, commId, int rank)#

(No short description, might be part of a group.)

Args:
nranks (int):

(undocumented)

commId (ncclUniqueId):

(undocumented)

rank (int):

(undocumented)

Returns:

A tuple of size 1 that contains (in that order):

hip.rccl.ncclCommInitAll(comm, int ndev, devlist)#

Creates a clique of communicators (single process version).

This is a convenience function to create a single-process communicator clique. Returns an array of ndev newly initialized communicators in comm. comm should be pre-allocated with size at least ndev*sizeof(ncclComm_t). If devlist is NULL, the first ndev HIP devices are used. Order of devlist defines user-order of processors within the communicator.

Args:
comm (Pointer/object) – OUT:

Pointer to array of created communicators

ndev (int) – IN:

Total number of ranks participating in this communicator

devlist (ListOfInt/object) – IN:

Array of GPU device indices to create for

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
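Two small helpers sketching the documented preconditions of ncclCommInitAll: the devlist fallback ("the first ndev HIP devices") and the `ndev*sizeof(ncclComm_t)` pre-allocation requirement. ncclComm_t is an opaque pointer type, so a platform pointer size is assumed:

```python
import ctypes

# Hedged sketch: mirror the documented fallback when devlist is NULL/None.
def default_devlist(ndev, devlist=None):
    return list(range(ndev)) if devlist is None else list(devlist)

# Hedged sketch: bytes needed for the pre-allocated communicator array,
# assuming one opaque pointer slot per communicator.
def comm_array_nbytes(ndev):
    return ndev * ctypes.sizeof(ctypes.c_void_p)
```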

hip.rccl.pncclCommInitAll(comm, int ndev, devlist)#

(No short description, might be part of a group.)

Args:
comm (Pointer/object):

(undocumented)

ndev (int):

(undocumented)

devlist (ListOfInt/object):

(undocumented)

hip.rccl.ncclCommFinalize(comm)#

Finalize a communicator.

ncclCommFinalize flushes all issued communications and marks communicator state as ncclInProgress. The state will change to ncclSuccess when the communicator is globally quiescent and related resources are freed; then, calling ncclCommDestroy can locally free the rest of the resources (e.g. communicator itself) without blocking.

Args:
comm (ncclComm/object) – IN:

Communicator to finalize

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclCommFinalize(comm)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

hip.rccl.ncclCommDestroy(comm)#

Frees local resources associated with communicator object.

Destroy all local resources associated with the passed in communicator object

Args:
comm (ncclComm/object) – IN:

Communicator to destroy

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclCommDestroy(comm)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

hip.rccl.ncclCommAbort(comm)#

Abort any in-progress calls and destroy the communicator object.

Frees resources associated with communicator object and aborts any operations that might still be running on the device.

Args:
comm (ncclComm/object) – IN:

Communicator to abort and destroy

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclCommAbort(comm)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

hip.rccl.ncclCommSplit(comm, int color, int key, config)#

Create one or more communicators from an existing one.

Creates one or more communicators from an existing one. Ranks with the same color will end up in the same communicator. Within the new communicator, key will be used to order ranks. Passing NCCL_SPLIT_NOCOLOR as color indicates that the rank will not be part of any group and a NULL communicator will therefore be returned. If config is NULL, the new communicator will inherit the original communicator’s configuration.

Args:
comm (ncclComm/object) – IN:

Original communicator object for this rank

color (int) – IN:

Color to assign this rank

key (int) – IN:

Key used to order ranks within the same new communicator

config (ncclConfig_v21700/object) – IN:

Config file for new communicator. May be NULL to inherit from comm

Returns:

A tuple of size 2 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

  • ncclComm:

    Pointer to new communicator
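A common split pattern is grouping ranks into per-node subcommunicators. A minimal sketch of computing color and key for that pattern, assuming `ranks_per_node` consecutive global ranks per node:

```python
# Hedged sketch: color groups ranks into the same new communicator;
# key orders ranks within it.
def split_by_node(rank: int, ranks_per_node: int):
    color = rank // ranks_per_node   # one group per node
    key = rank % ranks_per_node      # local ordering inside the group
    return color, key

# A rank opts out of the split by passing NCCL_SPLIT_NOCOLOR as its color,
# in which case ncclCommSplit returns a NULL communicator for it.
```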

hip.rccl.pncclCommSplit(comm, int color, int key, config)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

color (int):

(undocumented)

key (int):

(undocumented)

config (ncclConfig_v21700/object):

(undocumented)

Returns:

A tuple of size 1 that contains (in that order):

hip.rccl.ncclGetErrorString(result)#

Returns a string for each result code.

Returns a human-readable string describing the given result code.

Args:
result (ncclResult_t) – IN:

Result code to get description for

Returns:

A tuple of size 1 that contains (in that order):

  • bytes: String containing description of result code.
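Since every call returns its result code as the first tuple element, a small checking helper is a natural companion. This sketch compares against the integer values documented for ncclResult_t above (0 = ncclSuccess, 7 = ncclInProgress); `describe` is an optional stand-in for hip.rccl.ncclGetErrorString:

```python
# Hedged sketch: turn non-success RCCL result codes into exceptions.
def rccl_check(result, describe=None):
    code = int(result)
    if code in (0, 7):               # ncclSuccess, ncclInProgress
        return code
    msg = describe(result) if describe else f"RCCL error {code}"
    raise RuntimeError(msg)
```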

hip.rccl.pncclGetErrorString(result)#

(No short description, might be part of a group.)

Args:
result (ncclResult_t):

(undocumented)

hip.rccl.ncclGetLastError(comm)#

Returns a message for the last error that occurred.

Returns a human-readable message of the last error that occurred.

Args:
comm (ncclComm/object) – IN:

is currently unused and can be set to NULL

Returns:

A tuple of size 1 that contains (in that order):

  • bytes: String containing the last result

hip.rccl.pncclGetLastError(comm)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

hip.rccl.ncclCommGetAsyncError(comm, asyncError)#

Checks whether the communicator has encountered any asynchronous errors.

Queries whether the provided communicator has encountered any asynchronous errors.

Args:
comm (ncclComm/object) – IN:

Communicator to query

asyncError (Pointer/object) – OUT:

Pointer to where result code will be stored

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclCommGetAsyncError(comm, asyncError)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

asyncError (Pointer/object):

(undocumented)

hip.rccl.ncclCommCount(comm)#

Gets the number of ranks in the communicator clique.

Returns the number of ranks in the communicator clique (as set during initialization)

Args:
comm (ncclComm/object) – IN:

Communicator to query

Returns:

A tuple of size 2 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

  • int:

    Pointer to where number of ranks will be stored

hip.rccl.pncclCommCount(comm)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

Returns:

A tuple of size 1 that contains (in that order):

  • count (int):

    (undocumented)

hip.rccl.ncclCommCuDevice(comm)#

Get the ROCm device index associated with a communicator.

Returns the ROCm device number associated with the provided communicator.

Args:
comm (ncclComm/object) – IN:

Communicator to query

Returns:

A tuple of size 2 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

  • int:

    Pointer to where the associated ROCm device index will be stored

hip.rccl.pncclCommCuDevice(comm)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

Returns:

A tuple of size 1 that contains (in that order):

  • device (int):

    (undocumented)

hip.rccl.ncclCommUserRank(comm)#

Get the rank associated with a communicator.

Returns the user-ordered “rank” associated with the provided communicator.

Args:
comm (ncclComm/object) – IN:

Communicator to query

Returns:

A tuple of size 2 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

  • int:

    Pointer to where the associated rank will be stored

hip.rccl.pncclCommUserRank(comm)#

(No short description, might be part of a group.)

Args:
comm (ncclComm/object):

(undocumented)

Returns:

A tuple of size 1 that contains (in that order):

  • rank (int):

    (undocumented)

class hip.rccl.ncclRedOp_dummy_t(value)#

Bases: _ncclRedOp_dummy_t__Base

Dummy reduction enumeration

Attributes:
ncclNumOps_dummy:

(undocumented)

ncclNumOps_dummy = 5#
static ctypes_type()#

The type of the enum constants as ctypes type.

class hip.rccl.ncclRedOp_t(value)#

Bases: _ncclRedOp_t__Base

Reduction operation selector

Attributes:
ncclSum:

Sum

ncclProd:

Product

ncclMax:

Max

ncclMin:

Min

ncclAvg:

Average

ncclNumOps:

Number of built-in reduction ops

ncclMaxRedOp:

Largest value for ncclRedOp_t

ncclSum = 0#
ncclProd = 1#
ncclMax = 2#
ncclMin = 3#
ncclAvg = 4#
ncclNumOps = 5#
ncclMaxRedOp = 2147483647#
static ctypes_type()#

The type of the enum constants as ctypes type.

class hip.rccl.ncclDataType_t(value)#

Bases: _ncclDataType_t__Base

Data types

Attributes:
ncclInt8:

(undocumented)

ncclChar:

(undocumented)

ncclUint8:

(undocumented)

ncclInt32:

(undocumented)

ncclInt:

(undocumented)

ncclUint32:

(undocumented)

ncclInt64:

(undocumented)

ncclUint64:

(undocumented)

ncclFloat16:

(undocumented)

ncclHalf:

(undocumented)

ncclFloat32:

(undocumented)

ncclFloat:

(undocumented)

ncclFloat64:

(undocumented)

ncclDouble:

(undocumented)

ncclBfloat16:

(undocumented)

ncclNumTypes:

(undocumented)

ncclInt8 = 0#
ncclChar = 0#
ncclUint8 = 1#
ncclInt32 = 2#
ncclInt = 2#
ncclUint32 = 3#
ncclInt64 = 4#
ncclUint64 = 5#
ncclFloat16 = 6#
ncclHalf = 6#
ncclFloat32 = 7#
ncclFloat = 7#
ncclFloat64 = 8#
ncclDouble = 8#
ncclBfloat16 = 9#
ncclNumTypes = 10#
static ctypes_type()#

The type of the enum constants as ctypes type.
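The documented ncclDataType_t values can be written down as a plain mapping, e.g. for translating array dtype names to RCCL enum values. A minimal sketch using the integer values listed above (the aliases ncclChar, ncclInt, ncclHalf, ncclFloat, and ncclDouble share a value with their canonical name):

```python
# Hedged sketch: canonical dtype names -> documented ncclDataType_t values.
NCCL_DTYPE = {
    "int8": 0, "uint8": 1, "int32": 2, "uint32": 3,
    "int64": 4, "uint64": 5, "float16": 6, "float32": 7,
    "float64": 8, "bfloat16": 9,
}
```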

class hip.rccl.ncclScalarResidence_t(value)#

Bases: _ncclScalarResidence_t__Base

Location and dereferencing logic for scalar arguments.

Attributes:
ncclScalarDevice:

Scalar is in device-visible memory

ncclScalarHostImmediate:

Scalar is in host-visible memory

ncclScalarDevice = 0#
ncclScalarHostImmediate = 1#
static ctypes_type()#

The type of the enum constants as ctypes type.

hip.rccl.ncclRedOpCreatePreMulSum(op, scalar, datatype, residence, comm)#

Create a custom pre-multiplier reduction operator.

Creates a new reduction operator which pre-multiplies input values by a given scalar locally before reducing them with peer values via summation. For use only with collectives launched against comm and datatype. The residence argument indicates how/when the memory pointed to by scalar will be dereferenced. Upon return, the newly created operator’s handle is stored in op.

Args:
op (Pointer/object) – OUT:

Pointer to where newly created custom reduction operator is to be stored

scalar (Pointer/object) – IN:

Pointer to scalar value.

datatype (ncclDataType_t) – IN:

Scalar value datatype

residence (ncclScalarResidence_t) – IN:

Memory type of the scalar value

comm (ncclComm/object) – IN:

Communicator to associate with this custom reduction operator

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclRedOpCreatePreMulSum(op, scalar, datatype, residence, comm)#

(No short description, might be part of a group.)

Args:
op (Pointer/object):

(undocumented)

scalar (Pointer/object):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

residence (ncclScalarResidence_t):

(undocumented)

comm (ncclComm/object):

(undocumented)

hip.rccl.ncclRedOpDestroy(op, comm)#

Destroy a custom reduction operator.

Destroys the reduction operator op. The operator must have been created by ncclRedOpCreatePreMulSum with the matching communicator comm. An operator may be destroyed as soon as the last RCCL function which is given that operator returns.

Args:
op (ncclRedOp_t) – IN:

Custom reduction operator is to be destroyed

comm (ncclComm/object) – IN:

Communicator associated with this reduction operator

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclRedOpDestroy(op, comm)#

(No short description, might be part of a group.)

Args:
op (ncclRedOp_t):

(undocumented)

comm (ncclComm/object):

(undocumented)

hip.rccl.ncclReduce(sendbuff, recvbuff, unsigned long count, datatype, op, int root, comm, stream)#

Reduce

Reduces data arrays of length count in sendbuff into recvbuff using op operation.

recvbuff may be NULL on all calls except for the root device. root is the rank (not the HIP device) where data will reside after the operation is complete.

In-place operation will happen if sendbuff == recvbuff.

Args:
sendbuff (Pointer/object) – IN:

Local device data buffer to be reduced

recvbuff (Pointer/object) – OUT:

Data buffer where result is stored (only for root rank). May be null for other ranks.

count (int) – IN:

Number of elements in every send buffer

datatype (ncclDataType_t) – IN:

Data buffer element datatype

op (ncclRedOp_t) – IN:

Reduction operator type

root (int) – IN:

Rank where result data array will be stored

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclReduce(sendbuff, recvbuff, unsigned long count, datatype, op, int root, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

count (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

op (ncclRedOp_t):

(undocumented)

root (int):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclBcast(buff, unsigned long count, datatype, int root, comm, stream)#

(Deprecated) Broadcast (in-place)

Copies count values from root to all other devices. root is the rank (not the HIP device) where data resides before the operation is started. This operation is implicitly in-place.

Args:
buff (Pointer/object) – IN,OUT:

Input array on root to be copied to other ranks. Output array for all ranks.

count (int) – IN:

Number of elements in data buffer

datatype (ncclDataType_t) – IN:

Data buffer element datatype

root (int) – IN:

Rank owning buffer to be copied to others

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclBcast(buff, unsigned long count, datatype, int root, comm, stream)#

(No short description, might be part of a group.)

Args:
buff (Pointer/object):

(undocumented)

count (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

root (int):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclBroadcast(sendbuff, recvbuff, unsigned long count, datatype, int root, comm, stream)#

Broadcast

Copies count values from sendbuff on root to recvbuff on all devices.

root is the rank (not the HIP device) where data resides before the operation is started. sendbuff may be NULL on ranks other than root.

In-place operation will happen if sendbuff == recvbuff.

Args:
sendbuff (Pointer/object) – IN:

Data array to copy (if root). May be NULL for other ranks

recvbuff (Pointer/object) – OUT:

Data array to store the received data

count (int) – IN:

Number of elements in data buffer

datatype (ncclDataType_t) – IN:

Data buffer element datatype

root (int) – IN:

Rank of broadcast root

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclBroadcast(sendbuff, recvbuff, unsigned long count, datatype, int root, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

count (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

root (int):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclAllReduce(sendbuff, recvbuff, unsigned long count, datatype, op, comm, stream)#

All-Reduce

Reduces data arrays of length count in sendbuff using op operation, and leaves identical copies of result on each recvbuff. In-place operation will happen if sendbuff == recvbuff.

Args:
sendbuff (Pointer/object) – IN:

Input data array to reduce

recvbuff (Pointer/object) – OUT:

Data array to store reduced result array

count (int) – IN:

Number of elements in data buffer

datatype (ncclDataType_t) – IN:

Data buffer element datatype

op (ncclRedOp_t) – IN:

Reduction operator

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
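The reduction semantics can be sketched in plain Python (a hypothetical model with the sum operator, standing in for `ncclSum`; not a hip.rccl call):

```python
# Pure-Python model of ncclAllReduce with the sum operator: every rank
# ends up with the identical element-wise reduction of all inputs.
def all_reduce_sum(send_bufs):
    reduced = [sum(vals) for vals in zip(*send_bufs)]
    return [list(reduced) for _ in send_bufs]

recv = all_reduce_sum([[1, 2], [10, 20], [100, 200]])
# each rank holds [111, 222]
```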

hip.rccl.pncclAllReduce(sendbuff, recvbuff, unsigned long count, datatype, op, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

count (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

op (ncclRedOp_t):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclReduceScatter(sendbuff, recvbuff, unsigned long recvcount, datatype, op, comm, stream)#

Reduce-Scatter

Reduces data in sendbuff using op operation and leaves reduced result scattered over the devices so that recvbuff on rank i will contain the i-th block of the result. Assumes sendcount is equal to nranks*recvcount, which means that sendbuff should have a size of at least nranks*recvcount elements. In-place operations will happen if recvbuff == sendbuff + rank * recvcount.

Args:
sendbuff (Pointer/object) – IN:

Input data array to reduce

recvbuff (Pointer/object) – OUT:

Data array to store reduced result subarray

recvcount (int) – IN:

Number of elements each rank receives

datatype (ncclDataType_t) – IN:

Data buffer element datatype

op (ncclRedOp_t) – IN:

Reduction operator

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
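A plain-Python sketch of the semantics (hypothetical helper, sum operator assumed), showing that rank i keeps only the i-th block of the reduced result:

```python
# Pure-Python model of ncclReduceScatter (sum): inputs of length
# nranks*recvcount are reduced element-wise, then rank r keeps block r.
def reduce_scatter_sum(send_bufs, recvcount):
    reduced = [sum(vals) for vals in zip(*send_bufs)]
    return [reduced[r * recvcount:(r + 1) * recvcount]
            for r in range(len(send_bufs))]

recv = reduce_scatter_sum([[1, 2, 3, 4], [10, 20, 30, 40]], recvcount=2)
# rank 0 holds [11, 22], rank 1 holds [33, 44]
```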

hip.rccl.pncclReduceScatter(sendbuff, recvbuff, unsigned long recvcount, datatype, op, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

recvcount (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

op (ncclRedOp_t):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclAllGather(sendbuff, recvbuff, unsigned long sendcount, datatype, comm, stream)#

All-Gather

Each device gathers sendcount values from other GPUs into recvbuff, receiving data from rank i at offset i*sendcount. Assumes recvcount is equal to nranks*sendcount, which means that recvbuff should have a size of at least nranks*sendcount elements. In-place operations will happen if sendbuff == recvbuff + rank * sendcount.

Args:
sendbuff (Pointer/object) – IN:

Input data array to send

recvbuff (Pointer/object) – OUT:

Data array to store the gathered result

sendcount (int) – IN:

Number of elements each rank sends

datatype (ncclDataType_t) – IN:

Data buffer element datatype

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
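The gather layout can be sketched in plain Python (hypothetical model, not a hip.rccl call); note that rank i's contribution lands at offset i*sendcount on every rank:

```python
# Pure-Python model of ncclAllGather: every rank receives the
# concatenation of all ranks' send buffers, rank i's data at
# offset i*sendcount.
def all_gather(send_bufs):
    gathered = [x for buf in send_bufs for x in buf]
    return [list(gathered) for _ in send_bufs]

recv = all_gather([[1, 2], [3, 4], [5, 6]])
# each rank holds [1, 2, 3, 4, 5, 6]
```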

hip.rccl.pncclAllGather(sendbuff, recvbuff, unsigned long sendcount, datatype, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

sendcount (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclSend(sendbuff, unsigned long count, datatype, int peer, comm, stream)#

Send

Send data from sendbuff to rank peer. Rank peer needs to call ncclRecv with the same datatype and the same count as this rank. This operation is blocking for the GPU. If multiple ncclSend and ncclRecv operations need to progress concurrently to complete, they must be fused within an ncclGroupStart / ncclGroupEnd section.

Args:
sendbuff (Pointer/object) – IN:

Data array to send

count (int) – IN:

Number of elements to send

datatype (ncclDataType_t) – IN:

Data buffer element datatype

peer (int) – IN:

Peer rank to send to

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclSend(sendbuff, unsigned long count, datatype, int peer, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

count (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

peer (int):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclRecv(recvbuff, unsigned long count, datatype, int peer, comm, stream)#

Receive

Receive data from rank peer into recvbuff. Rank peer needs to call ncclSend with the same datatype and the same count as this rank. This operation is blocking for the GPU. If multiple ncclSend and ncclRecv operations need to progress concurrently to complete, they must be fused within an ncclGroupStart / ncclGroupEnd section.

Args:
recvbuff (Pointer/object) – OUT:

Data array to receive

count (int) – IN:

Number of elements to receive

datatype (ncclDataType_t) – IN:

Data buffer element datatype

peer (int) – IN:

Peer rank to receive from

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
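The matching rule for point-to-point calls can be modeled in plain Python (a hypothetical sketch; the real calls move device memory and may block until the peer posts its side):

```python
# Pure-Python model of matched ncclSend/ncclRecv: a send completes only
# against a receive posted by the peer with the same count (and, in the
# real API, the same datatype).
def match_send_recv(send_posts, recv_posts):
    """send_posts/recv_posts: dicts mapping (src, dst) -> element count."""
    delivered = {}
    for key, count in send_posts.items():
        if recv_posts.get(key) != count:
            raise ValueError(f"unmatched send {key}: counts differ")
        delivered[key] = count
    return delivered

# rank 0 sends 4 elements to rank 1, which posts a matching receive
delivered = match_send_recv({(0, 1): 4}, {(0, 1): 4})
```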

hip.rccl.pncclRecv(recvbuff, unsigned long count, datatype, int peer, comm, stream)#

(No short description, might be part of a group.)

Args:
recvbuff (Pointer/object):

(undocumented)

count (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

peer (int):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclGather(sendbuff, recvbuff, unsigned long sendcount, datatype, int root, comm, stream)#

Gather

Root device gathers sendcount values from other GPUs into recvbuff, receiving data from rank i at offset i*sendcount. Assumes recvcount is equal to nranks*sendcount, which means that recvbuff should have a size of at least nranks*sendcount elements. In-place operations will happen if sendbuff == recvbuff + rank * sendcount.

recvbuff may be NULL on ranks other than root.

Args:
sendbuff (Pointer/object) – IN:

Data array to send

recvbuff (Pointer/object) – OUT:

Data array to receive into on root.

sendcount (int) – IN:

Number of elements to send per rank

datatype (ncclDataType_t) – IN:

Data buffer element datatype

root (int) – IN:

Rank that receives data from all other ranks

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
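The result layout can be sketched in plain Python (hypothetical model; only root's receive buffer is populated, matching the note that recvbuff may be NULL elsewhere):

```python
# Pure-Python model of ncclGather: root's recvbuff is the concatenation
# of all ranks' send buffers (rank i's data at offset i*sendcount);
# non-root ranks receive nothing.
def gather(send_bufs, root):
    recv = [None] * len(send_bufs)
    recv[root] = [x for buf in send_bufs for x in buf]
    return recv

recv = gather([[1, 2], [3, 4], [5, 6]], root=1)
# recv[1] == [1, 2, 3, 4, 5, 6]; the other entries stay None
```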

hip.rccl.pncclGather(sendbuff, recvbuff, unsigned long sendcount, datatype, int root, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

sendcount (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

root (int):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclScatter(sendbuff, recvbuff, unsigned long recvcount, datatype, int root, comm, stream)#

Scatter

Scatters data from sendbuff on root over the devices so that recvbuff on rank i will contain the i-th block of the data on root. Assumes sendcount is equal to nranks*recvcount, which means that sendbuff should have a size of at least nranks*recvcount elements. In-place operations will happen if recvbuff == sendbuff + rank * recvcount.

Args:
sendbuff (Pointer/object) – IN:

Data array to send (on root rank). May be NULL on other ranks.

recvbuff (Pointer/object) – OUT:

Data array to receive partial subarray into

recvcount (int) – IN:

Number of elements to receive per rank

datatype (ncclDataType_t) – IN:

Data buffer element datatype

root (int) – IN:

Rank that scatters data to all other ranks

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
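The block split can be sketched in plain Python (hypothetical model of the semantics, not a hip.rccl call):

```python
# Pure-Python model of ncclScatter: root's sendbuff of length
# nranks*recvcount is split into equal blocks; rank r receives block r.
def scatter(send_buf, recvcount, nranks):
    return [send_buf[r * recvcount:(r + 1) * recvcount]
            for r in range(nranks)]

recv = scatter([1, 2, 3, 4, 5, 6], recvcount=2, nranks=3)
# rank 0: [1, 2], rank 1: [3, 4], rank 2: [5, 6]
```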

hip.rccl.pncclScatter(sendbuff, recvbuff, unsigned long recvcount, datatype, int root, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

recvcount (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

root (int):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclAllToAll(sendbuff, recvbuff, unsigned long count, datatype, comm, stream)#

All-To-All

Device i sends its j-th block of data to device j, where it is placed as the i-th block. Each block for sending/receiving has count elements, which means that recvbuff and sendbuff should have a size of nranks*count elements. In-place operation is NOT supported. It is the user's responsibility to ensure that sendbuff and recvbuff are distinct.

Args:
sendbuff (Pointer/object) – IN:

Data array to send (contains blocks for each other rank)

recvbuff (Pointer/object) – OUT:

Data array to receive (contains blocks from each other rank)

count (int) – IN:

Number of elements to send between each pair of ranks

datatype (ncclDataType_t) – IN:

Data buffer element datatype

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
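The exchange is a block transpose, which can be sketched in plain Python (hypothetical model):

```python
# Pure-Python model of ncclAllToAll: rank i's j-th block of `count`
# elements ends up as rank j's i-th block.
def all_to_all(send_bufs, count):
    n = len(send_bufs)
    return [[x for i in range(n)
             for x in send_bufs[i][j * count:(j + 1) * count]]
            for j in range(n)]

recv = all_to_all([[0, 1], [10, 11]], count=1)
# rank 0 holds [0, 10], rank 1 holds [1, 11]
```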

hip.rccl.pncclAllToAll(sendbuff, recvbuff, unsigned long count, datatype, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

count (int):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclAllToAllv(sendbuff, sendcounts, sdispls, recvbuff, recvcounts, rdispls, datatype, comm, stream)#

All-To-Allv

Device i sends sendcounts[j] elements starting at offset sdispls[j] to device j. At the same time, device i receives recvcounts[j] elements from device j, which are placed at offset rdispls[j]. sendcounts, sdispls, recvcounts and rdispls are all measured in units of datatype, not bytes. In-place operation will happen if sendbuff == recvbuff.

Args:
sendbuff (Pointer/object) – IN:

Data array to send (contains blocks for each other rank)

sendcounts (ListOfUnsignedLong/object) – IN:

Array containing number of elements to send to each participating rank

sdispls (ListOfUnsignedLong/object) – IN:

Array of offsets into sendbuff for each participating rank

recvbuff (Pointer/object) – OUT:

Data array to receive (contains blocks from each other rank)

recvcounts (ListOfUnsignedLong/object) – IN:

Array containing number of elements to receive from each participating rank

rdispls (ListOfUnsignedLong/object) – IN:

Array of offsets into recvbuff for each participating rank

datatype (ncclDataType_t) – IN:

Data buffer element datatype

comm (ncclComm/object) – IN:

Communicator group object to execute on

stream (ihipStream_t/object) – IN:

HIP stream to execute collective on

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
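The variable-count exchange can be modeled in plain Python (a hypothetical sketch; here the counts and displacements are held as 2D lists indexed [rank][peer], whereas each real rank passes only its own arrays):

```python
# Pure-Python model of ncclAllToAllv: rank i sends sendcounts[i][j]
# elements from offset sdispls[i][j] to rank j, which stores them at
# offset rdispls[j][i]. Counts and offsets are in elements, not bytes.
def all_to_all_v(send_bufs, sendcounts, sdispls, recvcounts, rdispls,
                 recvsizes):
    n = len(send_bufs)
    recv = [[None] * recvsizes[j] for j in range(n)]
    for i in range(n):
        for j in range(n):
            # matched send/recv pairs must agree on the element count
            assert sendcounts[i][j] == recvcounts[j][i]
            lo = sdispls[i][j]
            block = send_bufs[i][lo:lo + sendcounts[i][j]]
            recv[j][rdispls[j][i]:rdispls[j][i] + len(block)] = block
    return recv

recv = all_to_all_v(
    send_bufs=[[1, 2, 3], [4]],
    sendcounts=[[2, 1], [1, 0]], sdispls=[[0, 2], [0, 1]],
    recvcounts=[[2, 1], [1, 0]], rdispls=[[0, 2], [0, 1]],
    recvsizes=[3, 1],
)
# rank 0 receives [1, 2, 4]; rank 1 receives [3]
```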

hip.rccl.pncclAllToAllv(sendbuff, sendcounts, sdispls, recvbuff, recvcounts, rdispls, datatype, comm, stream)#

(No short description, might be part of a group.)

Args:
sendbuff (Pointer/object):

(undocumented)

sendcounts (ListOfUnsignedLong/object):

(undocumented)

sdispls (ListOfUnsignedLong/object):

(undocumented)

recvbuff (Pointer/object):

(undocumented)

recvcounts (ListOfUnsignedLong/object):

(undocumented)

rdispls (ListOfUnsignedLong/object):

(undocumented)

datatype (ncclDataType_t):

(undocumented)

comm (ncclComm/object):

(undocumented)

stream (ihipStream_t/object):

(undocumented)

hip.rccl.ncclGroupStart()#

Group Start

Start a group call. All calls to RCCL until ncclGroupEnd will be fused into a single RCCL operation. Nothing will be started on the HIP stream until ncclGroupEnd.

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.

hip.rccl.pncclGroupStart()#

(No short description, might be part of a group.)

hip.rccl.ncclGroupEnd()#

Group End

End a group call. Starts a fused RCCL operation consisting of all calls since ncclGroupStart. Operations on the HIP stream that depend on the fused RCCL operations must be enqueued after ncclGroupEnd.

Returns:

A tuple of size 1 that contains (in that order):

  • ncclResult_t: Result code. See rccl_result_code for more details.
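The deferred-launch behavior of group calls can be modeled in plain Python (the `GroupedOps` class below is a hypothetical sketch of the semantics, not part of hip.rccl):

```python
# Pure-Python model of group semantics: calls between group_start and
# group_end are only recorded; nothing executes until group_end.
class GroupedOps:
    def __init__(self):
        self.pending, self.executed, self.grouping = [], [], False

    def group_start(self):
        self.grouping = True

    def submit(self, op):
        if self.grouping:
            self.pending.append(op)   # deferred, like ops inside a group
        else:
            self.executed.append(op)  # immediate, like a lone collective

    def group_end(self):
        self.executed.extend(self.pending)  # the fused operation launches
        self.pending, self.grouping = [], False

g = GroupedOps()
g.group_start()
g.submit("send(0->1)")
g.submit("recv(1->0)")
assert g.executed == []        # nothing started on the stream yet
g.group_end()
assert g.executed == ["send(0->1)", "recv(1->0)"]
```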

hip.rccl.pncclGroupEnd()#

(No short description, might be part of a group.)