Discontinuity#

template<class T, unsigned int BlockSizeX, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_discontinuity#

The block_discontinuity class is a block level parallel primitive which provides methods for flagging items that are discontinued within an ordered set of items across threads in a block.

Overview

  • There are two types of flags:

    • Head flags.

    • Tail flags.

  • The above flags are used to differentiate items from their predecessors or successors.

  • E.g. Head flags are convenient for differentiating disjoint data segments as part of a segmented reduction/scan.

Examples

In the examples discontinuity operation is performed on block of 128 threads, using type int.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    ...
    int head_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage);
    ...
}

Template Parameters:
  • T – - the input type.

  • BlockSize – - the number of threads in a block.

Public Types

using storage_type = storage_type_#

Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive.

Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords __shared__. It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.

Public Functions

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#

Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item has no reference and is always flagged.

Storage reusage

Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().

Example.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    ...
    int head_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_heads(head_flags, input, flag_op_type(), storage);
    ...
}

Template Parameters:
  • ItemsPerThread – - [inferred] the number of items to be processed by each thread.

  • Flag – - [inferred] the flag type.

  • FlagOp – - [inferred] type of binary function used for flagging.

Parameters:
  • head_flags[out] - array that contains the head flags.

  • input[in] - array that data is loaded from.

  • flag_op[in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.

  • storage[in] - reference to a temporary storage object of type storage_type.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)#

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not take a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#

Tags head_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item.

Storage reusage

Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().

Example.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    int tile_item = 0;
    if (threadIdx.x == 0)
    {
        tile_item = ...
    }
    ...
    int head_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_heads(head_flags, tile_item, input, flag_op_type(),
                               storage);
    ...
}

Template Parameters:
  • ItemsPerThread – - [inferred] the number of items to be processed by each thread.

  • Flag – - [inferred] the flag type.

  • FlagOp – - [inferred] type of binary function used for flagging.

Parameters:
  • head_flags[out] - array that contains the head flags.

  • tile_predecessor_item[in] - first tile item from thread to be compared against.

  • input[in] - array that data is loaded from.

  • flag_op[in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.

  • storage[in] - reference to a temporary storage object of type storage_type.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)#

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#

Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item has no reference and is always flagged.

Storage reusage

Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().

Example.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    ...
    int tail_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_tails(tail_flags, input, flag_op_type(), storage);
    ...
}

Template Parameters:
  • ItemsPerThread – - [inferred] the number of items to be processed by each thread.

  • Flag – - [inferred] the flag type.

  • FlagOp – - [inferred] type of binary function used for flagging.

Parameters:
  • tail_flags[out] - array that contains the tail flags.

  • input[in] - array that data is loaded from.

  • flag_op[in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.

  • storage[in] - reference to a temporary storage object of type storage_type.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)#

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#

Tags tail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item.

Storage reusage

Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().

Example.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    int tile_item = 0;
    if (threadIdx.x == 0)
    {
        tile_item = ...
    }
    ...
    int tail_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_tails(tail_flags, tile_item, input, flag_op_type(),
                               storage);
    ...
}

Template Parameters:
  • ItemsPerThread – - [inferred] the number of items to be processed by each thread.

  • Flag – - [inferred] the flag type.

  • FlagOp – - [inferred] type of binary function used for flagging.

Parameters:
  • tail_flags[out] - array that contains the tail flags.

  • tile_successor_item[in] - last tile item from thread to be compared against.

  • input[in] - array that data is loaded from.

  • flag_op[in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.

  • storage[in] - reference to a temporary storage object of type storage_type.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_tails(Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)#

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block.

Storage reusage

Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().

Example.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    ...
    int head_flags[8];
    int tail_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, input,
                                         flag_op_type(), storage);
    ...
}

Template Parameters:
  • ItemsPerThread – - [inferred] the number of items to be processed by each thread.

  • Flag – - [inferred] the flag type.

  • FlagOp – - [inferred] type of binary function used for flagging.

Parameters:
  • head_flags[out] - array that contains the head flags.

  • tail_flags[out] - array that contains the tail flags.

  • input[in] - array that data is loaded from.

  • flag_op[in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.

  • storage[in] - reference to a temporary storage object of type storage_type.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)#

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the last item of the last thread is compared against a tile_successor_item.

Storage reusage

Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().

Example.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    int tile_item = 0;
    if (threadIdx.x == 0)
    {
        tile_item = ...
    }
    ...
    int head_flags[8];
    int tail_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_heads_and_tails(head_flags, tail_flags, tile_item,
                                         input, flag_op_type(),
                                         storage);
    ...
}

Template Parameters:
  • ItemsPerThread – - [inferred] the number of items to be processed by each thread.

  • Flag – - [inferred] the flag type.

  • FlagOp – - [inferred] type of binary function used for flagging.

Parameters:
  • head_flags[out] - array that contains the head flags.

  • tail_flags[out] - array that contains the tail flags.

  • tile_successor_item[in] - last tile item from thread to be compared against.

  • input[in] - array that data is loaded from.

  • flag_op[in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.

  • storage[in] - reference to a temporary storage object of type storage_type.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)#

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first item of the first thread is compared against a tile_predecessor_item.

Storage reusage

Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().

Example.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    int tile_item = 0;
    if (threadIdx.x == 0)
    {
        tile_item = ...
    }
    ...
    int head_flags[8];
    int tail_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_heads_and_tails(head_flags, tile_item, tail_flags,
                                         input, flag_op_type(),
                                         storage);
    ...
}

Template Parameters:
  • ItemsPerThread – - [inferred] the number of items to be processed by each thread.

  • Flag – - [inferred] the flag type.

  • FlagOp – - [inferred] type of binary function used for flagging.

Parameters:
  • head_flags[out] - array that contains the head flags.

  • tile_predecessor_item[in] - first tile item from thread to be compared against.

  • tail_flags[out] - array that contains the tail flags.

  • input[in] - array that data is loaded from.

  • flag_op[in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.

  • storage[in] - reference to a temporary storage object of type storage_type.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], const T (&input)[ItemsPerThread], FlagOp flag_op)#

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op, storage_type &storage)#

Tags both head_flags andtail_flags that indicate discontinuities between items partitioned across the thread block, where the first and last items of the first and last thread is compared against a tile_predecessor_item and a tile_successor_item.

Storage reusage

Synchronization barrier should be placed before storage is reused or repurposed: __syncthreads() or rocprim::syncthreads().

Example.

__global__ void example_kernel(...)
{
    // specialize discontinuity for int and a block of 128 threads
    using block_discontinuity_int = rocprim::block_discontinuity<int, 128>;
    // allocate storage in shared memory
    __shared__ block_discontinuity_int::storage_type storage;

    // segment of consecutive items to be used
    int input[8];
    int tile_predecessor_item = 0;
    int tile_successor_item = 0;
    if (threadIdx.x == 0)
    {
        tile_predecessor_item = ...
        tile_successor_item = ...
    }
    ...
    int head_flags[8];
    int tail_flags[8];
    block_discontinuity_int b_discontinuity;
    using flag_op_type = typename rocprim::greater<int>;
    b_discontinuity.flag_heads_and_tails(head_flags, tile_predecessor_item,
                                         tail_flags, tile_successor_item,
                                         input, flag_op_type(),
                                         storage);
    ...
}

Template Parameters:
  • ItemsPerThread – - [inferred] the number of items to be processed by each thread.

  • Flag – - [inferred] the flag type.

  • FlagOp – - [inferred] type of binary function used for flagging.

Parameters:
  • head_flags[out] - array that contains the head flags.

  • tile_predecessor_item[in] - first tile item from thread to be compared against.

  • tail_flags[out] - array that contains the tail flags.

  • tile_successor_item[in] - last tile item from thread to be compared against.

  • input[in] - array that data is loaded from.

  • flag_op[in] - binary operation function object that will be used for flagging. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b); or bool (const T& a, const T& b, unsigned int b_index);. The signature does not need to have const &, but function object must not modify the objects passed to it.

  • storage[in] - reference to a temporary storage object of type storage_type.

template<unsigned int ItemsPerThread, class Flag, class FlagOp>
__device__ inline void flag_heads_and_tails(Flag (&head_flags)[ItemsPerThread], T tile_predecessor_item, Flag (&tail_flags)[ItemsPerThread], T tile_successor_item, const T (&input)[ItemsPerThread], FlagOp flag_op)#

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. This overload does not accept a reference to temporary storage, instead it is declared as part of the function itself. Note that this does NOT decrease the shared memory requirements of a kernel using this function.