HIP Cooperative groups API#

Cooperative kernel launches#

The following host-side functions are used for cooperative kernel launches.

Warning

doxygenfunction: Cannot find function “hipLaunchCooperativeKernel” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “hipLaunchCooperativeKernel” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “hipLaunchCooperativeKernelMultiDevice” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “hipModuleLaunchCooperativeKernel” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “hipModuleLaunchCooperativeKernelMultiDevice” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Cooperative groups classes#

The following cooperative groups classes can be used on the device side.

class thread_group#

The base type of all cooperative group types.

Holds the key properties of a constructed cooperative group types object, like the group type, its size, etc.

Note

Cooperative groups feature is implemented on Linux, under development on Microsoft Windows.

Subclassed by cooperative_groups::coalesced_group, cooperative_groups::grid_group, cooperative_groups::multi_grid_group, cooperative_groups::thread_block, cooperative_groups::tiled_group

class thread_block : public cooperative_groups::thread_group#

The workgroup (thread-block in CUDA terminology) cooperative group type.

Represents an intra-workgroup cooperative group type, where the participating threads within the group are the same threads that participated in the currently executing workgroup.

Note

This function is implemented on Linux and is under development on Microsoft Windows.

class grid_group : public cooperative_groups::thread_group#

The grid cooperative group type.

Represents an inter-workgroup cooperative group type, where the participating threads within the group spans across multiple workgroups running the (same) kernel on the same device.

Note

This is implemented on Linux and is under development on Microsoft Windows.

class multi_grid_group : public cooperative_groups::thread_group#

The multi-grid cooperative group type.

Represents an inter-device cooperative group type, where the participating threads within the group span across multiple devices, running the (same) kernel on these devices.

Note

The multi-grid cooperative group type is implemented on Linux, under development on Microsoft Windows.

template<unsigned int size, class ParentCGTy>
class thread_block_tile : public cooperative_groups::impl::thread_block_tile_internal<size, ParentCGTy>#

Group type - thread_block_tile.

Represents one tiled thread group in a wavefront. This group type also supports sub-wave level intrinsics.

Note

This type is implemented on Linux, under development on Microsoft Windows.

Public Functions

unsigned int thread_rank() const#

Rank of the calling thread within [0, size() ).

void sync()#

Synchronizes the threads in the group.

Causes all threads in the group to wait at this synchronization point, and for all shared and global memory accesses by the threads to complete, before running synchronization. This guarantees the visibility of accessed data for all threads in the group.

Note

There are potential read-after-write (RAW), write-after-read (WAR), or write-after-write (WAW) hazards, when threads in the group access the same addresses in shared or global memory. The data hazards can be avoided with synchronization of the group.

unsigned int meta_group_rank() const#

Returns the linear rank of the group within the set of tiles partitioned from a parent group (bounded by meta_group_size)

unsigned int meta_group_size() const#

Returns the number of groups created when the parent group was partitioned.

template<class T>
T shfl(T var, int srcRank) const#

Shuffle operation on group level.

Exchanging variables between threads without use of shared memory. Shuffle operation is a direct copy of var from srcRank thread ID of group.

Template Parameters:

T – The type can be a 32-bit integer or single-precision floating point.

Parameters:
  • var – [in] The source variable to copy. Only the srcRank thread ID of group is copied to other threads.

  • srcRank – [in] The source thread ID of the group for copy.

template<class T>
T shfl_down(T var, unsigned int lane_delta) const#

Shuffle down operation on group level.

Exchanging variables between threads without use of shared memory. Shuffle down operation is copy of var from thread with thread ID of group relative higher with lane_delta to caller thread ID.

Template Parameters:

T – The type can be a 32-bit integer or single-precision floating point.

Parameters:
  • var – [in] The source variable to copy.

  • lane_delta – [in] The lane_delta is the relative thread ID difference between caller thread ID and source of copy thread ID. sourceID = (threadID + lane_delta) % size()

template<class T>
T shfl_up(T var, unsigned int lane_delta) const#

Shuffle up operation on group level.

Exchanging variables between threads without use of shared memory. Shuffle up operation is copy of var from thread with thread ID of group relative lower with lane_delta to caller thread ID.

Template Parameters:

T – The type can be a 32-bit integer or single-precision floating point.

Parameters:
  • var – [in] The source variable to copy.

  • lane_delta – [in] The lane_delta is the relative thread ID difference between caller thread ID and source of copy thread ID. sourceID = (threadID - lane_delta) % size()

template<class T>
T shfl_xor(T var, unsigned int laneMask) const#

Shuffle xor operation on group level.

Exchanging variables between threads without use of shared memory. Shuffle xor operation is copy of var from thread with thread ID of group based on laneMask XOR of the caller thread ID.

Template Parameters:

T – The type can be a 32-bit integer or single-precision floating point.

Parameters:
  • var – [in] The source variable to copy.

  • laneMask – [in] The laneMask is the mask for XOR operation. sourceID = threadID ^ laneMask

unsigned long long ballot(int pred) const#

Ballot function on group level.

Returns a bit mask with the Nth bit set to one if the Nth thread predicate evaluates true.

Parameters:

pred – [in] The predicate to evaluate on group threads.

int any(int pred) const#

Any function on group level.

Returns non-zero if a predicate evaluates true for any threads.

Parameters:

pred – [in] The predicate to evaluate on group threads.

int all(int pred) const#

All function on group level.

Returns non-zero if a predicate evaluates true for all threads.

Parameters:

pred – [in] The predicate to evaluate on group threads.

template<typename T>
unsigned long long match_any(T value) const#

Match any function on group level.

Returns a bit mask containing a 1-bit for every participating thread if that thread has the same value in value as the caller thread.

Parameters:

value – [in] The value to examine on the current thread in group.

template<typename T>
unsigned long long match_all(T value, int &pred) const#

Match all function on group level.

Returns a bit mask containing a 1-bit for every participating thread if they all have the same value in value as the caller thread. The predicate pred is set to true if all participating threads have the same value in value.

Parameters:
  • value – [in] The value to examine on the current thread in group.

  • pred – [out] The predicate is set to true if all participating threads in the thread group have the same value.

class coalesced_group : public cooperative_groups::thread_group#

The coalesced_group cooperative group type.

Represents an active thread group in a wavefront. This group type also supports sub-wave level intrinsics.

Note

This is implemented on Linux and is under development on Microsoft Windows.

Cooperative groups construct functions#

The following functions are used to construct different group-type instances on the device side.

Warning

doxygenfunction: Cannot find function “cooperative_groups::this_multi_grid” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::this_grid” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::this_thread_block” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::coalesced_threads” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::tiled_partition” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::tiled_partition” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::binary_partition” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::binary_partition” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Cooperative groups exposed API functions#

The following functions are the exposed API for different group-type instances on the device side.

Warning

doxygenfunction: Cannot find function “cooperative_groups::group_size” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::thread_rank” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::is_valid” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml

Warning

doxygenfunction: Cannot find function “cooperative_groups::sync” in doxygen xml output for project “HIP 6.2.41133 Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/latest/docs/doxygen/xml