thread_block_tile< size, ParentCGTy > Class Template Reference
Group type - thread_block_tile.
#include <amd_hip_cooperative_groups.h>
Public Member Functions | |
operator thread_block_tile< size, void > () const | |
unsigned int | thread_rank () const |
Rank of the calling thread within [0, size() ). | |
void | sync () |
Synchronizes the threads in the group. | |
unsigned int | meta_group_rank () const |
unsigned int | meta_group_size () const |
Returns the number of groups created when the parent group was partitioned. | |
template<class T > | |
T | shfl (T var, int srcRank) const |
Shuffle operation on group level. | |
template<class T > | |
T | shfl_down (T var, unsigned int lane_delta) const |
Shuffle down operation on group level. | |
template<class T > | |
T | shfl_up (T var, unsigned int lane_delta) const |
Shuffle up operation on group level. | |
template<class T > | |
T | shfl_xor (T var, unsigned int laneMask) const |
Shuffle xor operation on group level. | |
unsigned long long | ballot (int pred) const |
Ballot function on group level. | |
int | any (int pred) const |
Any function on group level. | |
int | all (int pred) const |
All function on group level. | |
template<typename T > | |
unsigned long long | match_any (T value) const |
Match any function on group level. | |
template<typename T > | |
unsigned long long | match_all (T value, int &pred) const |
Match all function on group level. | |
Public Member Functions inherited from cooperative_groups::thread_block_tile_base< tileSize > | |
T | shfl (T var, int srcRank) const |
T | shfl_down (T var, unsigned int lane_delta) const |
T | shfl_up (T var, unsigned int lane_delta) const |
T | shfl_xor (T var, unsigned int laneMask) const |
unsigned long long | ballot (int pred) const |
int | any (int pred) const |
int | all (int pred) const |
unsigned long long | match_any (T value) const |
unsigned long long | match_all (T value, int &pred) const |
Public Member Functions inherited from cooperative_groups::tiled_group | |
unsigned int | size () const |
unsigned int | thread_rank () const |
Rank of the calling thread within [0, size() ). | |
void | sync () const |
Synchronizes the threads in the group. | |
Public Member Functions inherited from cooperative_groups::thread_group | |
uint32_t | size () const |
unsigned int | cg_type () const |
Returns the type of the group. | |
uint32_t | thread_rank () const |
Rank of the calling thread within [0, size() ). | |
bool | is_valid () const |
Returns true if the group has not violated any API constraints. | |
void | sync () const |
Synchronizes the threads in the group. | |
Protected Member Functions | |
thread_block_tile (const ParentCGTy &g) | |
Protected Member Functions inherited from cooperative_groups::impl::thread_block_tile_internal< size, ParentCGTy > | |
template<unsigned int tbtSize, class tbtParentT > | |
thread_block_tile_internal (const thread_block_tile_internal< tbtSize, tbtParentT > &g) | |
thread_block_tile_internal (const thread_block &g) | |
Protected Member Functions inherited from cooperative_groups::thread_block_tile_type< size, ParentCGTy > | |
thread_block_tile_type () | |
Protected Member Functions inherited from cooperative_groups::tiled_group | |
tiled_group (unsigned int tileSize) | |
Protected Member Functions inherited from cooperative_groups::thread_group | |
thread_group (internal::group_type type, uint32_t size=static_cast< uint64_t >(0), uint64_t mask=static_cast< uint64_t >(0)) | |
Additional Inherited Members | |
Static Public Member Functions inherited from cooperative_groups::thread_block_tile_type< size, ParentCGTy > | |
static void | sync () |
Static Public Member Functions inherited from cooperative_groups::thread_block_tile_base< tileSize > | |
static void | sync () |
Static Public Member Functions inherited from cooperative_groups::tile_base< size > | |
static constexpr unsigned int | thread_rank () |
Rank of the thread within this tile. | |
static unsigned int | size () |
Number of threads within this tile. | |
Static Public Member Functions inherited from cooperative_groups::parent_group_info< tileSize, ParentCGTy > | |
static unsigned int | meta_group_rank () |
static unsigned int | meta_group_size () |
Returns the number of groups created when the parent group was partitioned. | |
Protected Attributes inherited from cooperative_groups::thread_group | |
uint32_t | _type |
Type of the thread_group. | |
uint32_t | _size |
Total number of threads in the thread_group. | |
uint64_t | _mask |
struct cooperative_groups::thread_group::_coalesced_info | coalesced_info |
Static Protected Attributes inherited from cooperative_groups::tile_base< size > | |
static constexpr unsigned int | numThreads |
Detailed Description
class cooperative_groups::thread_block_tile< size, ParentCGTy >
Group type - thread_block_tile.
Represents one tiled thread group in a wavefront. This group type also supports sub-wave level intrinsics.
- Note
- This type is implemented on Linux and is under development on Microsoft Windows.
Constructor & Destructor Documentation
◆ thread_block_tile()
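cooperative_groups::thread_block_tile< size, ParentCGTy >::thread_block_tile | ( | const ParentCGTy & | g | ) | inline, protected |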
Member Function Documentation
◆ all()
int cooperative_groups::thread_block_tile< size, ParentCGTy >::all | ( | int | pred | ) | const |
All function on group level.
Returns non-zero if a predicate evaluates true for all threads.
- Parameters
-
pred [in] The predicate to evaluate on group threads.
◆ any()
int cooperative_groups::thread_block_tile< size, ParentCGTy >::any | ( | int | pred | ) | const |
Any function on group level.
Returns non-zero if a predicate evaluates true for any thread in the group.
- Parameters
-
pred [in] The predicate to evaluate on group threads.
◆ ballot()
unsigned long long cooperative_groups::thread_block_tile< size, ParentCGTy >::ballot | ( | int | pred | ) | const |
Ballot function on group level.
Returns a bit mask with the Nth bit set to one if the Nth thread predicate evaluates true.
- Parameters
-
pred [in] The predicate to evaluate on group threads.
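The relationship between ballot(), any() and all() can be illustrated with a small host-side model. This is only a sketch of the semantics described above, not the HIP implementation: the names model_ballot, model_any and model_all are hypothetical, and preds[i] stands for the predicate passed by the thread with rank i.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not a HIP API).
// preds[i] is the predicate supplied by the thread with rank i.
unsigned long long model_ballot(const std::vector<int>& preds) {
    assert(preds.size() <= 64);  // the returned mask has 64 bits
    unsigned long long mask = 0;
    for (std::size_t rank = 0; rank < preds.size(); ++rank) {
        if (preds[rank] != 0) {
            mask |= 1ULL << rank;  // bit N is set when thread N's predicate is true
        }
    }
    return mask;
}

// any(): non-zero when at least one bit of the ballot mask is set.
int model_any(const std::vector<int>& preds) {
    return model_ballot(preds) != 0 ? 1 : 0;
}

// all(): non-zero when every participating thread's bit is set.
int model_all(const std::vector<int>& preds) {
    const unsigned long long full =
        preds.size() == 64 ? ~0ULL : (1ULL << preds.size()) - 1ULL;
    return model_ballot(preds) == full ? 1 : 0;
}
```

For example, with predicates {1, 0, 1, 1} for ranks 0 to 3, the model sets bits 0, 2 and 3, so any() is non-zero while all() is zero.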
◆ match_all()
unsigned long long cooperative_groups::thread_block_tile< size, ParentCGTy >::match_all | ( | T | value, |
int & | pred | ||
) | const |
Match all function on group level.
Returns a bit mask containing a 1-bit for every participating thread if they all have the same value in value as the caller thread. The predicate pred is set to true if all participating threads have the same value in value.
- Parameters
-
value [in] The value to examine on the current thread in group.
pred [out] The predicate is set to true if all participating threads in the thread group have the same value.
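A minimal host-side sketch of the match_all() semantics follows. It is a model under stated assumptions rather than the HIP implementation: model_match_all is a hypothetical name, values[i] stands for the value passed by the thread with rank i, and returning 0 when the values differ is an assumption, since the text above only describes the case where all values agree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not a HIP API).
// values[i] is the value supplied by the thread with rank i.
unsigned long long model_match_all(const std::vector<int>& values, int& pred) {
    assert(!values.empty() && values.size() <= 64);
    for (std::size_t rank = 1; rank < values.size(); ++rank) {
        if (values[rank] != values[0]) {
            pred = 0;   // at least one thread disagrees
            return 0;   // assumption: an empty mask is returned on mismatch
        }
    }
    pred = 1;  // every participating thread holds the same value
    // One bit per participating thread.
    return values.size() == 64 ? ~0ULL : (1ULL << values.size()) - 1ULL;
}

// Small wrapper that exposes only the predicate, for convenience.
int model_match_all_pred(const std::vector<int>& values) {
    int pred = 0;
    model_match_all(values, pred);
    return pred;
}
```

With four threads all passing 5, the model returns a mask with bits 0 to 3 set and sets pred to 1. If one thread passes a different value, pred is 0.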
◆ match_any()
unsigned long long cooperative_groups::thread_block_tile< size, ParentCGTy >::match_any | ( | T | value | ) | const |
Match any function on group level.
Returns a bit mask containing a 1-bit for every participating thread if that thread has the same value in value as the caller thread.
- Parameters
-
value [in] The value to examine on the current thread in group.
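Unlike ballot(), the mask returned by match_any() depends on which thread is asking. The host-side sketch below models that behaviour. The name model_match_any and the explicit caller argument are hypothetical: on the device each thread is implicitly its own caller.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not a HIP API).
// values[i] is the value supplied by the thread with rank i;
// caller is the rank of the thread whose result is being computed.
unsigned long long model_match_any(const std::vector<int>& values,
                                   std::size_t caller) {
    assert(values.size() <= 64 && caller < values.size());
    unsigned long long mask = 0;
    for (std::size_t rank = 0; rank < values.size(); ++rank) {
        if (values[rank] == values[caller]) {
            mask |= 1ULL << rank;  // this thread holds the caller's value
        }
    }
    return mask;
}
```

For the values {7, 3, 7, 3, 7}, rank 0 sees a mask with bits 0, 2 and 4 set, while rank 1 sees bits 1 and 3. A thread's own bit is always set in its result.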
◆ meta_group_rank()
unsigned int cooperative_groups::thread_block_tile< size, ParentCGTy >::meta_group_rank | ( | ) | const |
Returns the linear rank of the group within the set of tiles partitioned from a parent group (bounded by meta_group_size).
◆ meta_group_size()
unsigned int cooperative_groups::thread_block_tile< size, ParentCGTy >::meta_group_size | ( | ) | const |
Returns the number of groups created when the parent group was partitioned.
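How meta_group_rank(), meta_group_size() and thread_rank() relate to one another can be sketched on the host. The model below assumes that tiles are carved out of the parent group in linear rank order and that the parent size is a multiple of the tile size. Both are assumptions made for illustration, and TilePosition and model_tile_position are hypothetical names.

```cpp
#include <cassert>

// Host-side model only (hypothetical types and names, not a HIP API).
struct TilePosition {
    unsigned int meta_group_rank;  // which tile the thread landed in
    unsigned int meta_group_size;  // how many tiles the parent was split into
    unsigned int thread_rank;      // rank of the thread inside its tile
};

// Assumes consecutive parent ranks fill tile 0, then tile 1, and so on.
TilePosition model_tile_position(unsigned int parent_size,
                                 unsigned int tile_size,
                                 unsigned int parent_rank) {
    assert(tile_size > 0 && parent_size % tile_size == 0);
    assert(parent_rank < parent_size);
    TilePosition pos;
    pos.meta_group_rank = parent_rank / tile_size;
    pos.meta_group_size = parent_size / tile_size;
    pos.thread_rank = parent_rank % tile_size;
    return pos;
}
```

Under these assumptions, a 256-thread parent split into tiles of 32 yields 8 tiles, and the thread with parent rank 70 is rank 6 of tile 2.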
◆ operator thread_block_tile< size, void >()
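cooperative_groups::thread_block_tile< size, ParentCGTy >::operator thread_block_tile< size, void > | ( | ) | const | inline |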
◆ shfl()
T cooperative_groups::thread_block_tile< size, ParentCGTy >::shfl | ( | T | var, |
int | srcRank | ||
) | const |
Shuffle operation on group level.
Exchanges variables between threads without using shared memory. The shuffle operation is a direct copy of var from the thread whose ID in the group is srcRank.
- Template Parameters
-
T The type can be a 32-bit integer or single-precision floating point.
- Parameters
-
var [in] The source variable to copy. Only the value held by the srcRank thread ID of the group is copied to the other threads.
srcRank [in] The source thread ID of the group for the copy.
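The broadcast behaviour of shfl() can be modelled on the host as follows. This is a sketch of the semantics described above, not device code: model_shfl is a hypothetical name, vars[i] stands for the var argument of the thread with rank i, and the returned vector holds the value each rank would receive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not a HIP API).
// vars[i] is the var argument supplied by the thread with rank i.
std::vector<int> model_shfl(const std::vector<int>& vars, int srcRank) {
    assert(srcRank >= 0 &&
           static_cast<std::size_t>(srcRank) < vars.size());
    // Every rank receives the value held by the srcRank thread.
    return std::vector<int>(vars.size(),
                            vars[static_cast<std::size_t>(srcRank)]);
}
```

With vars {10, 20, 30, 40} and srcRank 2, every rank in the model ends up holding 30.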
◆ shfl_down()
T cooperative_groups::thread_block_tile< size, ParentCGTy >::shfl_down | ( | T | var, |
unsigned int | lane_delta | ||
) | const |
Shuffle down operation on group level.
Exchanges variables between threads without using shared memory. The shuffle down operation copies var from the thread whose ID in the group is lane_delta higher than the caller's thread ID.
- Template Parameters
-
T The type can be a 32-bit integer or single-precision floating point.
- Parameters
-
var [in] The source variable to copy.
lane_delta [in] The relative thread ID difference between the caller thread ID and the thread ID of the copy source.
sourceID = (threadID + lane_delta) % size()
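A common use of shuffle down is a tree reduction in which the thread with rank 0 ends up holding the sum of the tile. The host-side sketch below follows the sourceID formula given above. The names model_shfl_down and model_tile_sum are hypothetical, and only rank 0's result is relied on, so the sketch does not depend on how lanes near the end of the tile are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not a HIP API).
// Implements sourceID = (threadID + lane_delta) % size() as documented.
std::vector<int> model_shfl_down(const std::vector<int>& vars,
                                 unsigned int lane_delta) {
    assert(!vars.empty());
    const std::size_t n = vars.size();
    std::vector<int> out(n);
    for (std::size_t rank = 0; rank < n; ++rank) {
        out[rank] = vars[(rank + lane_delta) % n];
    }
    return out;
}

// Tree reduction: halve the stride each step; rank 0 ends with the total.
int model_tile_sum(std::vector<int> vals) {
    const std::size_t n = vals.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // tile size must be a power of two
    for (std::size_t offset = n / 2; offset > 0; offset /= 2) {
        const std::vector<int> shifted =
            model_shfl_down(vals, static_cast<unsigned int>(offset));
        for (std::size_t rank = 0; rank < n; ++rank) {
            vals[rank] += shifted[rank];
        }
    }
    return vals[0];  // only rank 0's value is used
}
```

For the eight values 1 through 8, three steps (strides 4, 2 and 1) leave 36 in rank 0.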
◆ shfl_up()
T cooperative_groups::thread_block_tile< size, ParentCGTy >::shfl_up | ( | T | var, |
unsigned int | lane_delta | ||
) | const |
Shuffle up operation on group level.
Exchanges variables between threads without using shared memory. The shuffle up operation copies var from the thread whose ID in the group is lane_delta lower than the caller's thread ID.
- Template Parameters
-
T The type can be a 32-bit integer or single-precision floating point.
- Parameters
-
var [in] The source variable to copy.
lane_delta [in] The relative thread ID difference between the caller thread ID and the thread ID of the copy source.
sourceID = (threadID - lane_delta) % size()
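Shuffle up is the usual building block for an inclusive prefix sum across a tile. The host-side sketch below models that pattern with the sourceID formula given above. model_shfl_up and model_inclusive_scan are hypothetical names, and each rank ignores the shuffled value when its source would lie below rank 0, so the result does not depend on how those lanes are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not a HIP API).
// Implements sourceID = (threadID - lane_delta) % size() without
// letting the unsigned subtraction underflow.
std::vector<int> model_shfl_up(const std::vector<int>& vars,
                               unsigned int lane_delta) {
    assert(!vars.empty());
    const std::size_t n = vars.size();
    std::vector<int> out(n);
    for (std::size_t rank = 0; rank < n; ++rank) {
        out[rank] = vars[(rank + n - lane_delta % n) % n];
    }
    return out;
}

// Inclusive scan: double the stride each step and add the value from
// the rank that is `delta` positions lower, when such a rank exists.
std::vector<int> model_inclusive_scan(std::vector<int> vals) {
    const std::size_t n = vals.size();
    assert(n != 0);
    for (std::size_t delta = 1; delta < n; delta *= 2) {
        const std::vector<int> shifted =
            model_shfl_up(vals, static_cast<unsigned int>(delta));
        for (std::size_t rank = delta; rank < n; ++rank) {
            vals[rank] += shifted[rank];  // ranks below delta keep their value
        }
    }
    return vals;
}
```

For the input {1, 2, 3, 4}, the model produces the running sums {1, 3, 6, 10}.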
◆ shfl_xor()
T cooperative_groups::thread_block_tile< size, ParentCGTy >::shfl_xor | ( | T | var, |
unsigned int | laneMask | ||
) | const |
Shuffle xor operation on group level.
Exchanges variables between threads without using shared memory. The shuffle xor operation copies var from the thread whose ID in the group is the caller's thread ID XORed with laneMask.
- Template Parameters
-
T The type can be a 32-bit integer or single-precision floating point.
- Parameters
-
var [in] The source variable to copy.
laneMask [in] The mask for the XOR operation.
sourceID = threadID ^ laneMask
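Because XOR pairs each rank with exactly one partner, shuffle xor lends itself to butterfly reductions in which every rank ends up with the result. The host-side sketch below models a tile-wide maximum. model_shfl_xor and model_all_reduce_max are hypothetical names, and the tile size is assumed to be a power of two so that threadID ^ laneMask always stays inside the tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only (hypothetical helper, not a HIP API).
// Implements sourceID = threadID ^ laneMask.
std::vector<int> model_shfl_xor(const std::vector<int>& vars,
                                unsigned int laneMask) {
    const std::size_t n = vars.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // power-of-two tile size
    assert(laneMask < n);                  // keeps rank ^ laneMask below n
    std::vector<int> out(n);
    for (std::size_t rank = 0; rank < n; ++rank) {
        out[rank] = vars[rank ^ laneMask];
    }
    return out;
}

// Butterfly all-reduce: after log2(n) exchanges every rank holds the max.
std::vector<int> model_all_reduce_max(std::vector<int> vals) {
    const std::size_t n = vals.size();
    for (std::size_t mask = n / 2; mask > 0; mask /= 2) {
        const std::vector<int> partner =
            model_shfl_xor(vals, static_cast<unsigned int>(mask));
        for (std::size_t rank = 0; rank < n; ++rank) {
            vals[rank] = std::max(vals[rank], partner[rank]);
        }
    }
    return vals;
}
```

For the input {3, 9, 1, 7}, two exchange steps (masks 2 and 1) leave 9 in every rank.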
◆ sync()
void cooperative_groups::thread_block_tile< size, ParentCGTy >::sync | ( | ) |
Synchronizes the threads in the group.
Causes all threads in the group to wait at this synchronization point until every thread in the group has reached it and all of their shared and global memory accesses have completed. This guarantees the visibility of accessed data for all threads in the group.
- Note
- Read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) hazards can occur when threads in the group access the same addresses in shared or global memory. Synchronizing the group avoids these data hazards.
◆ thread_rank()
unsigned int cooperative_groups::thread_block_tile< size, ParentCGTy >::thread_rank | ( | ) | const |
Rank of the calling thread within [0, size() ).
The documentation for this class was generated from the following file:
- /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hip/checkouts/clr/hipamd/include/hip/amd_detail/amd_hip_cooperative_groups.h