Collective routines
ROCSHMEM_BARRIER_ALL
- __device__ void rocshmem_ctx_wg_barrier_all(rocshmem_ctx_t ctx)
- __device__ void rocshmem_wg_barrier_all()
- Parameters:
  ctx – Context with which to perform this operation (rocshmem_ctx_wg_barrier_all only).
- Returns:
  None.
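Description: This routine performs a collective barrier across all PEs in the system. The caller is blocked until every PE has reached the barrier. Unlike rocshmem_wg_team_sync, it also ensures the completion of previously issued remote memory updates.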
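The blocking behavior can be pictured with a host-only C++ model. This is not rocSHMEM code: std::thread objects stand in for PEs, and CountingBarrier and simulate_barrier_all are hypothetical helpers written for this sketch. It models only the synchronization point (no PE proceeds until all PEs have arrived), not the memory-completion guarantees of the real routine.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Host-only model of a barrier; hypothetical helper, not part of rocSHMEM.
class CountingBarrier {
 public:
  explicit CountingBarrier(int n) : n_(n) {}

  // Blocks the caller until all n participants have called this function.
  void arrive_and_wait() {
    std::unique_lock<std::mutex> lock(m_);
    const int gen = generation_;
    if (++waiting_ == n_) {
      // Last arrival: reset the count, start a new generation, release everyone.
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return generation_ != gen; });
    }
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  int n_;
  int waiting_ = 0;
  int generation_ = 0;
};

// Each thread plays one PE. Returns true if every simulated PE, after
// leaving the barrier, saw that all n_pes PEs had arrived.
bool simulate_barrier_all(int n_pes) {
  assert(n_pes > 0);
  CountingBarrier barrier(n_pes);
  std::atomic<int> arrived{0};
  std::atomic<bool> ok{true};
  std::vector<std::thread> pes;
  for (int pe = 0; pe < n_pes; ++pe) {
    pes.emplace_back([&] {
      arrived.fetch_add(1);        // work issued before the barrier
      barrier.arrive_and_wait();   // stands in for rocshmem_wg_barrier_all()
      if (arrived.load() != n_pes) ok = false;
    });
  }
  for (auto& t : pes) t.join();
  return ok.load();
}
```

The generation counter lets the same barrier object be reused across rounds, which repeated barrier calls would require; a mutex and condition variable keep the sketch C++11-compatible.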
ROCSHMEM_TEAM_SYNC
- __device__ void rocshmem_ctx_wg_team_sync(rocshmem_ctx_t ctx, rocshmem_team_t team)
- __device__ void rocshmem_wg_team_sync(rocshmem_team_t team)
- Parameters:
  ctx – Context with which to perform this operation (rocshmem_ctx_wg_team_sync only).
  team – Team with which to perform this operation.
- Returns:
  None.
Description: This routine registers the arrival of the calling PE at a barrier over the PEs in team. The caller is blocked until all PEs in team have arrived.
Unlike rocshmem_wg_barrier_all, rocshmem_wg_team_sync only ensures the completion and visibility of previously issued memory stores; it does not ensure the completion of remote memory updates issued via rocSHMEM routines.
ROCSHMEM_SYNC_ALL
- __device__ void rocshmem_ctx_wg_sync_all(rocshmem_ctx_t ctx)
- __device__ void rocshmem_wg_sync_all()
- Parameters:
  ctx – Context with which to perform this operation (rocshmem_ctx_wg_sync_all only).
- Returns:
  None.
Description: This routine behaves the same as rocshmem_wg_team_sync when called on the world team.
ROCSHMEM_ALLTOALL
- __device__ void rocshmem_ctx_TYPENAME_wg_alltoall(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems)
- Parameters:
  ctx – Context with which to perform this collective.
  team – The team participating in the collective.
  dest – Destination address. Must be an address on the symmetric heap.
  source – Source address. Must be an address on the symmetric heap.
  nelems – Number of elements of TYPE transferred per pair of PEs.
- Returns:
  None.
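Description: This routine exchanges a fixed amount of contiguous data between all pairs of PEs in team. Each PE sends a distinct block of nelems elements to every PE: block j of source on PE i is written to block i of dest on PE j, where i and j are PE numbers within team. Consequently, source and dest must each hold at least nelems * N elements, where N is the number of PEs in team. This function must be called as a work-group collective.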
Valid TYPENAME and TYPE values are listed in RMA Data Types.
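The data layout can be checked with a host-only C++ model, assuming standard OpenSHMEM alltoall semantics. Each PE's symmetric buffer is represented as a std::vector, and model_alltoall is a hypothetical helper for illustration, not a rocSHMEM function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of alltoall; hypothetical helper, not part of rocSHMEM.
// source[pe] is PE pe's source buffer; the return value holds each PE's dest.
template <typename T>
std::vector<std::vector<T>> model_alltoall(const std::vector<std::vector<T>>& source,
                                           int nelems) {
  const int n_pes = static_cast<int>(source.size());
  const std::size_t len = static_cast<std::size_t>(n_pes) * nelems;
  std::vector<std::vector<T>> dest(n_pes, std::vector<T>(len));
  for (int i = 0; i < n_pes; ++i) {      // sending PE
    assert(source[i].size() >= len);
    for (int j = 0; j < n_pes; ++j) {    // receiving PE
      for (int k = 0; k < nelems; ++k) {
        // Block j of PE i's source lands in block i of PE j's dest.
        dest[j][i * nelems + k] = source[i][j * nelems + k];
      }
    }
  }
  return dest;
}
```

For example, with two PEs and nelems = 2, PE 0's source {0, 1, 2, 3} and PE 1's source {10, 11, 12, 13} yield dest {0, 1, 10, 11} on PE 0 and {2, 3, 12, 13} on PE 1.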
ROCSHMEM_BROADCAST
- __device__ void rocshmem_ctx_TYPENAME_wg_broadcast(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems, int pe_root)
- Parameters:
  ctx – Context with which to perform this collective.
  team – The team participating in the collective.
  dest – Destination address. Must be an address on the symmetric heap.
  source – Source address. Must be an address on the symmetric heap.
  nelems – Number of elements of TYPE to broadcast.
  pe_root – Zero-based index, relative to team, of the root PE whose source data is broadcast.
- Returns:
  None.
Description: This routine copies nelems elements from source on the root PE (pe_root) to dest across the PEs in team. The caller is blocked until the broadcast completes.
Valid TYPENAME and TYPE values are listed in RMA Data Types.
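As a host-only illustration (not rocSHMEM code), model_broadcast is a hypothetical helper showing which data ends up in dest. The sketch treats pe_root as an index within the team and, as an assumption of this model, writes dest on the root PE as well; confirm the root's behavior against rocSHMEM before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of broadcast; hypothetical helper, not part of rocSHMEM.
// source[pe] is PE pe's source buffer; the return value holds each PE's dest.
template <typename T>
std::vector<std::vector<T>> model_broadcast(const std::vector<std::vector<T>>& source,
                                            int nelems, int pe_root) {
  const int n_pes = static_cast<int>(source.size());
  assert(pe_root >= 0 && pe_root < n_pes);
  assert(source[pe_root].size() >= static_cast<std::size_t>(nelems));
  std::vector<std::vector<T>> dest(n_pes);
  for (int pe = 0; pe < n_pes; ++pe) {
    // Only the root's source is read; other PEs' source buffers are ignored.
    // Assumption: the root's dest is also written.
    dest[pe].assign(source[pe_root].begin(), source[pe_root].begin() + nelems);
  }
  return dest;
}
```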
ROCSHMEM_FCOLLECT
- __device__ void rocshmem_ctx_TYPENAME_wg_fcollect(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems)
- Parameters:
  ctx – Context with which to perform this collective.
  team – The team participating in the collective.
  dest – Destination address. Must be an address on the symmetric heap.
  source – Source address. Must be an address on the symmetric heap.
  nelems – Number of elements of TYPE contributed by each PE.
- Returns:
  None.
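Description: This routine concatenates nelems elements from source on each PE, in order of PE number within team, into dest on every PE in team. dest must therefore hold at least nelems * N elements, where N is the number of PEs in team.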
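The concatenation order can be illustrated with a host-only C++ model. model_fcollect is a hypothetical helper, not a rocSHMEM function, and the sketch assumes each PE contributes exactly nelems elements placed in team PE order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of fcollect; hypothetical helper, not part of rocSHMEM.
// source[pe] is PE pe's source buffer; the return value holds each PE's dest.
template <typename T>
std::vector<std::vector<T>> model_fcollect(const std::vector<std::vector<T>>& source,
                                           int nelems) {
  const int n_pes = static_cast<int>(source.size());
  std::vector<T> gathered;
  gathered.reserve(static_cast<std::size_t>(n_pes) * nelems);
  for (int pe = 0; pe < n_pes; ++pe) {
    assert(source[pe].size() >= static_cast<std::size_t>(nelems));
    // PE pe's contribution occupies dest[pe * nelems, (pe + 1) * nelems).
    gathered.insert(gathered.end(), source[pe].begin(), source[pe].begin() + nelems);
  }
  // Every PE ends up with an identical copy of the concatenated array.
  return std::vector<std::vector<T>>(n_pes, gathered);
}
```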
ROCSHMEM_REDUCTION
- __device__ int rocshmem_ctx_TYPENAME_OPNAME_wg_reduce(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nreduce)
- Parameters:
  ctx – Context with which to perform this collective.
  team – The team participating in the collective.
  dest – Destination address. Must be an address on the symmetric heap.
  source – Source address. Must be an address on the symmetric heap.
  nreduce – Number of elements of TYPE in source and dest to reduce.
- Returns:
  Zero on successful local completion. Nonzero otherwise.
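Description: This routine performs an allreduce operation across the PEs in team: for each index k in [0, nreduce), it combines source[k] from every PE in team using OPNAME and writes the result to dest[k] on every PE in team.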
Valid TYPENAME, TYPE, and OPNAME values are listed in Reduction Types, Names and Operations.
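Applying the naming pattern above with TYPE int and OPNAME sum gives rocshmem_ctx_int_sum_wg_reduce. The element-wise semantics can be modeled on the host in C++; model_reduce is a hypothetical helper, not part of rocSHMEM, and it combines contributions in PE order. The order in which the real implementation combines values is not specified here, which matters for floating-point results.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Host-only model of an allreduce; hypothetical helper, not part of rocSHMEM.
// op plays the role of OPNAME (for example std::plus<T>() for sum).
template <typename T, typename Op>
std::vector<std::vector<T>> model_reduce(const std::vector<std::vector<T>>& source,
                                         int nreduce, Op op) {
  const int n_pes = static_cast<int>(source.size());
  assert(n_pes > 0);
  for (const auto& buf : source) assert(buf.size() >= static_cast<std::size_t>(nreduce));
  // Start from PE 0's values, then fold in each later PE element by element.
  std::vector<T> result(source[0].begin(), source[0].begin() + nreduce);
  for (int pe = 1; pe < n_pes; ++pe) {
    for (int k = 0; k < nreduce; ++k) {
      result[k] = op(result[k], source[pe][k]);
    }
  }
  // Allreduce: every PE receives the same reduced array in dest.
  return std::vector<std::vector<T>>(n_pes, result);
}
```

For instance, with sources {1, 5}, {2, 3} and {4, 1} on three PEs, a sum yields {7, 9} on every PE and a max yields {4, 5}.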
Supported reduction types and operations
| TYPE | TYPENAME | OPNAME | Supported |
|---|---|---|---|
| char | char | max, min, sum, prod | No |
| signed char | schar | max, min, sum, prod | No |
| short | short | max, min, sum, prod | Yes |
| int | int | max, min, sum, prod | Yes |
| long | long | max, min, sum, prod | Yes |
| long long | longlong | max, min, sum, prod | Yes |
| ptrdiff_t | ptrdiff | max, min, sum, prod | No |
| unsigned char | uchar | and, or, xor, max, min, sum, prod | No |
| unsigned short | ushort | and, or, xor, max, min, sum, prod | No |
| unsigned int | uint | and, or, xor, max, min, sum, prod | No |
| unsigned long | ulong | and, or, xor, max, min, sum, prod | No |
| unsigned long long | ulonglong | and, or, xor, max, min, sum, prod | No |
| int8_t | int8 | and, or, xor, max, min, sum, prod | No |
| int16_t | int16 | and, or, xor, max, min, sum, prod | No |
| int32_t | int32 | and, or, xor, max, min, sum, prod | No |
| int64_t | int64 | and, or, xor, max, min, sum, prod | No |
| uint8_t | uint8 | and, or, xor, max, min, sum, prod | No |
| uint16_t | uint16 | and, or, xor, max, min, sum, prod | No |
| uint32_t | uint32 | and, or, xor, max, min, sum, prod | No |
| uint64_t | uint64 | and, or, xor, max, min, sum, prod | No |
| size_t | size | and, or, xor, max, min, sum, prod | No |
| float | float | max, min, sum, prod | Yes |
| double | double | max, min, sum, prod | Yes |
| long double | longdouble | max, min, sum, prod | No |
| double _Complex | complexd | sum, prod | No |
| float _Complex | complexf | sum, prod | No |