Discontinuity#
- 
template<class T, unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
 class block_discontinuity#
- The - block_discontinuityclass is a block level parallel primitive which provides methods for flagging items that are discontinued within an ordered set of items across threads in a block.- Overview
- There are two types of flags: - Head flags. 
- Tail flags. 
 
- The above flags are used to differentiate items from their predecessors or successors. 
- E.g. Head flags are convenient for differentiating disjoint data segments as part of a segmented reduction/scan. 
 
- Examples
- In the examples discontinuity operation is performed on block of 128 threads, using type - int.- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; ... int head_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- T – - the input type. 
- BlockSize – - the number of threads in a block. 
 
 - Public Types - 
using storage_type = storage_type_#
- Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive. - Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords - . It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.
 - Public Functions - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#
- Tags - head_flagsthat indicate discontinuities between items partitioned across the thread block, where the first item has no reference and is always flagged.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Example.
- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; ... int head_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- ItemsPerThread – - [inferred] the number of items to be processed by each thread. 
- Flag – - [inferred] the flag type. 
- FlagOp – - [inferred] type of binary function used for flagging. 
 
- Parameters:
- head_flags – [out] - array that contains the head flags. 
- input – [in] - array that data is loaded from. 
- flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);or- bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
- storage – [in] - reference to a temporary storage object of type storage_type. 
 
 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)#
- This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not take a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function. 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#
- Tags - head_flagsthat indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a- tile_predecessor_item.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Example.
- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_item = 0; if (threadIdx.x == 0) { tile_item = ... } ... int head_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads(head_flags, tile_item, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- ItemsPerThread – - [inferred] the number of items to be processed by each thread. 
- Flag – - [inferred] the flag type. 
- FlagOp – - [inferred] type of binary function used for flagging. 
 
- Parameters:
- head_flags – [out] - array that contains the head flags. 
- tile_predecessor_item – [in] - first tile item from thread to be compared against. 
- input – [in] - array that data is loaded from. 
- flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);or- bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
- storage – [in] - reference to a temporary storage object of type storage_type. 
 
 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)#
- This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function. 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#
- Tags - tail_flagsthat indicate discontinuities between items partitioned across the thread block, where the last item has no reference and is always flagged.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Example.
- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; ... int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_tails(tail_flags, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- ItemsPerThread – - [inferred] the number of items to be processed by each thread. 
- Flag – - [inferred] the flag type. 
- FlagOp – - [inferred] type of binary function used for flagging. 
 
- Parameters:
- tail_flags – [out] - array that contains the tail flags. 
- input – [in] - array that data is loaded from. 
- flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);or- bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
- storage – [in] - reference to a temporary storage object of type storage_type. 
 
 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)#
- This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function. 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#
- Tags - tail_flagsthat indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a- tile_successor_item.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Example.
- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_item = 0; if (threadIdx.x == 0) { tile_item = ... } ... int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_tails(tail_flags, tile_item, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- ItemsPerThread – - [inferred] the number of items to be processed by each thread. 
- Flag – - [inferred] the flag type. 
- FlagOp – - [inferred] type of binary function used for flagging. 
 
- Parameters:
- tail_flags – [out] - array that contains the tail flags. 
- tile_successor_item – [in] - last tile item from thread to be compared against. 
- input – [in] - array that data is loaded from. 
- flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);or- bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
- storage – [in] - reference to a temporary storage object of type storage_type. 
 
 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)#
- This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function. 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#
- Tags both - head_flagsand- tail_flagsthat indicate discontinuities between items partitioned across the thread block.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Example.
- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; ... int head_flags[8]; int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- ItemsPerThread – - [inferred] the number of items to be processed by each thread. 
- Flag – - [inferred] the flag type. 
- FlagOp – - [inferred] type of binary function used for flagging. 
 
- Parameters:
- head_flags – [out] - array that contains the head flags. 
- tail_flags – [out] - array that contains the tail flags. 
- input – [in] - array that data is loaded from. 
- flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);or- bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
- storage – [in] - reference to a temporary storage object of type storage_type. 
 
 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)#
- This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function. 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#
- Tags both - head_flagsand- tail_flagsthat indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a- tile_successor_item.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Example.
- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_item = 0; if (threadIdx.x == 0) { tile_item = ... } ... int head_flags[8]; int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, tile_item, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- ItemsPerThread – - [inferred] the number of items to be processed by each thread. 
- Flag – - [inferred] the flag type. 
- FlagOp – - [inferred] type of binary function used for flagging. 
 
- Parameters:
- head_flags – [out] - array that contains the head flags. 
- tail_flags – [out] - array that contains the tail flags. 
- tile_successor_item – [in] - last tile item from thread to be compared against. 
- input – [in] - array that data is loaded from. 
- flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);or- bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
- storage – [in] - reference to a temporary storage object of type storage_type. 
 
 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)#
- This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function. 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#
- Tags both - head_flagsand- tail_flagsthat indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a- tile_predecessor_item.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Example.
- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_item = 0; if (threadIdx.x == 0) { tile_item = ... } ... int head_flags[8]; int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads_and_tails(head_flags, tile_item, tail_flags, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- ItemsPerThread – - [inferred] the number of items to be processed by each thread. 
- Flag – - [inferred] the flag type. 
- FlagOp – - [inferred] type of binary function used for flagging. 
 
- Parameters:
- head_flags – [out] - array that contains the head flags. 
- tile_predecessor_item – [in] - first tile item from thread to be compared against. 
- tail_flags – [out] - array that contains the tail flags. 
- input – [in] - array that data is loaded from. 
- flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);or- bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
- storage – [in] - reference to a temporary storage object of type storage_type. 
 
 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)#
- This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function. 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#
- Tags both - head_flagsand- tail_flagsthat indicate discontinuities between items partitioned across the thread block, where the first and last items of the first and last thread is compared against a- tile_predecessor_itemand a- tile_successor_item.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Example.
- __global__ void example_kernel(...) { // specialize discontinuity for int and a block of 128 threads using block_discontinuity_int = rocprim::block_discontinuity<int, 128>; // allocate storage in shared memory __shared__ block_discontinuity_int::storage_type storage; // segment of consecutive items to be used int input[8]; int tile_predecessor_item = 0; int tile_successor_item = 0; if (threadIdx.x == 0) { tile_predecessor_item = ... tile_successor_item = ... } ... int head_flags[8]; int tail_flags[8]; block_discontinuity_int b_discontinuity; using flag_op_type = typename rocprim::greater<int>; b_discontinuity.flag_heads_and_tails(head_flags, tile_predecessor_item, tail_flags, tile_successor_item, input, flag_op_type(), storage); ... } 
 - Template Parameters:
- ItemsPerThread – - [inferred] the number of items to be processed by each thread. 
- Flag – - [inferred] the flag type. 
- FlagOp – - [inferred] type of binary function used for flagging. 
 
- Parameters:
- head_flags – [out] - array that contains the head flags. 
- tile_predecessor_item – [in] - first tile item from thread to be compared against. 
- tail_flags – [out] - array that contains the tail flags. 
- tile_successor_item – [in] - last tile item from thread to be compared against. 
- input – [in] - array that data is loaded from. 
- flag_op – [in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);or- bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
- storage – [in] - reference to a temporary storage object of type storage_type. 
 
 
 - 
template<unsigned int ItemsPerThread, class Flag, class FlagOp>
 __device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)#
- This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.