Discontinuity#
-
template<class T, unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_discontinuity# The
block_discontinuity
class is a block level parallel primitive which provides methods for flagging items that are discontinued within an ordered set of items across threads in a block.- Overview
There are two types of flags:
Head flags.
Tail flags.
The above flags are used to differentiate items from their predecessors or successors.
E.g. Head flags are convenient for differentiating disjoint data segments as part of a segmented reduction/scan.
- Examples
In the examples discontinuity operation is performed on block of 128 threads, using type
int
.__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; ... int head_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage); ... }
- Template Parameters:
T – - the input type.
BlockSize – - the number of threads in a block.
Public Types
-
using storage_type = storage_type_#
Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive.
Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords
. It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.
Public Functions
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)# Tags
head_flags
that indicate discontinuities between items partitioned across the thread block, where the first item has no reference and is always flagged.- Storage reusage
Synchronization barrier should be placed before
storage
is reused or repurposed:__syncthreads()
orrocprim::syncthreads()
.- Example.
__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; ... int head_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage); ... }
- Template Parameters:
ItemsPerThread – - [inferred] the number of items to be processed by each thread.
Flag – - [inferred] the flag type.
FlagOp – - [inferred] type of binary function used for flagging.
- Parameters:
head_flags – [out] - array that contains the head flags.
input – [in] - array that data is loaded from.
flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
orbool (const T& a, const T& b, unsigned int b_index);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it.storage – [in] - reference to a temporary storage object of type storage_type.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)# This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not take a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)# Tags
head_flags
that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against atile_predecessor_item
.- Storage reusage
Synchronization barrier should be placed before
storage
is reused or repurposed:__syncthreads()
orrocprim::syncthreads()
.- Example.
__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_item = 0; if (threadIdx.x == 0) { tile_item = ... } ... int head_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads(head_flags, tile_item, input, flag_op_type(), storage); ... }
- Template Parameters:
ItemsPerThread – - [inferred] the number of items to be processed by each thread.
Flag – - [inferred] the flag type.
FlagOp – - [inferred] type of binary function used for flagging.
- Parameters:
head_flags – [out] - array that contains the head flags.
tile_predecessor_item – [in] - first tile item from thread to be compared against.
input – [in] - array that data is loaded from.
flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
orbool (const T& a, const T& b, unsigned int b_index);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it.storage – [in] - reference to a temporary storage object of type storage_type.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)# This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)# Tags
tail_flags
that indicate discontinuities between items partitioned across the thread block, where the last item has no reference and is always flagged.- Storage reusage
Synchronization barrier should be placed before
storage
is reused or repurposed:__syncthreads()
orrocprim::syncthreads()
.- Example.
__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; ... int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_tails(tail_flags, input, flag_op_type(), storage); ... }
- Template Parameters:
ItemsPerThread – - [inferred] the number of items to be processed by each thread.
Flag – - [inferred] the flag type.
FlagOp – - [inferred] type of binary function used for flagging.
- Parameters:
tail_flags – [out] - array that contains the tail flags.
input – [in] - array that data is loaded from.
flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
orbool (const T& a, const T& b, unsigned int b_index);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it.storage – [in] - reference to a temporary storage object of type storage_type.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)# This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)# Tags
tail_flags
that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against atile_successor_item
.- Storage reusage
Synchronization barrier should be placed before
storage
is reused or repurposed:__syncthreads()
orrocprim::syncthreads()
.- Example.
__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_item = 0; if (threadIdx.x == 0) { tile_item = ... } ... int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_tails(tail_flags, tile_item, input, flag_op_type(), storage); ... }
- Template Parameters:
ItemsPerThread – - [inferred] the number of items to be processed by each thread.
Flag – - [inferred] the flag type.
FlagOp – - [inferred] type of binary function used for flagging.
- Parameters:
tail_flags – [out] - array that contains the tail flags.
tile_successor_item – [in] - last tile item from thread to be compared against.
input – [in] - array that data is loaded from.
flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
orbool (const T& a, const T& b, unsigned int b_index);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it.storage – [in] - reference to a temporary storage object of type storage_type.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)# This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)# Tags both
head_flags
andtail_flags
that indicate discontinuities between items partitioned across the thread block.- Storage reusage
Synchronization barrier should be placed before
storage
is reused or repurposed:__syncthreads()
orrocprim::syncthreads()
.- Example.
__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; ... int head_flags[8]; int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, input, flag_op_type(), storage); ... }
- Template Parameters:
ItemsPerThread – - [inferred] the number of items to be processed by each thread.
Flag – - [inferred] the flag type.
FlagOp – - [inferred] type of binary function used for flagging.
- Parameters:
head_flags – [out] - array that contains the head flags.
tail_flags – [out] - array that contains the tail flags.
input – [in] - array that data is loaded from.
flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
orbool (const T& a, const T& b, unsigned int b_index);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it.storage – [in] - reference to a temporary storage object of type storage_type.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)# This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)# Tags both
head_flags
andtail_flags
that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against atile_successor_item
.- Storage reusage
Synchronization barrier should be placed before
storage
is reused or repurposed:__syncthreads()
orrocprim::syncthreads()
.- Example.
__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_item = 0; if (threadIdx.x == 0) { tile_item = ... } ... int head_flags[8]; int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, tile_item, input, flag_op_type(), storage); ... }
- Template Parameters:
ItemsPerThread – - [inferred] the number of items to be processed by each thread.
Flag – - [inferred] the flag type.
FlagOp – - [inferred] type of binary function used for flagging.
- Parameters:
head_flags – [out] - array that contains the head flags.
tail_flags – [out] - array that contains the tail flags.
tile_successor_item – [in] - last tile item from thread to be compared against.
input – [in] - array that data is loaded from.
flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
orbool (const T& a, const T& b, unsigned int b_index);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it.storage – [in] - reference to a temporary storage object of type storage_type.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)# This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)# Tags both
head_flags
andtail_flags
that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against atile_predecessor_item
.- Storage reusage
Synchronization barrier should be placed before
storage
is reused or repurposed:__syncthreads()
orrocprim::syncthreads()
.- Example.
__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_item = 0; if (threadIdx.x == 0) { tile_item = ... } ... int head_flags[8]; int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads_and_tails(head_flags, tile_item, tail_flags, input, flag_op_type(), storage); ... }
- Template Parameters:
ItemsPerThread – - [inferred] the number of items to be processed by each thread.
Flag – - [inferred] the flag type.
FlagOp – - [inferred] type of binary function used for flagging.
- Parameters:
head_flags – [out] - array that contains the head flags.
tile_predecessor_item – [in] - first tile item from thread to be compared against.
tail_flags – [out] - array that contains the tail flags.
input – [in] - array that data is loaded from.
flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
orbool (const T& a, const T& b, unsigned int b_index);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it.storage – [in] - reference to a temporary storage object of type storage_type.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)# This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)# Tags both
head_flags
andtail_flags
that indicate discontinuities between items partitioned across the thread block, where the first and last items of the first and last thread is compared against atile_predecessor_item
and atile_successor_item
.- Storage reusage
Synchronization barrier should be placed before
storage
is reused or repurposed:__syncthreads()
orrocprim::syncthreads()
.- Example.
__global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_predecessor_item = 0; int tile_successor_item = 0; if (threadIdx.x == 0) { tile_predecessor_item = ... tile_successor_item = ... } ... int head_flags[8]; int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads_and_tails(head_flags, tile_predecessor_item, tail_flags, tile_successor_item, input, flag_op_type(), storage); ... }
- Template Parameters:
ItemsPerThread – - [inferred] the number of items to be processed by each thread.
Flag – - [inferred] the flag type.
FlagOp – - [inferred] type of binary function used for flagging.
- Parameters:
head_flags – [out] - array that contains the head flags.
tile_predecessor_item – [in] - first tile item from thread to be compared against.
tail_flags – [out] - array that contains the tail flags.
tile_successor_item – [in] - last tile item from thread to be compared against.
input – [in] - array that data is loaded from.
flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
orbool (const T& a, const T& b, unsigned int b_index);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it.storage – [in] - reference to a temporary storage object of type storage_type.
-
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)# This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.