Scan#
Class#
- 
template<class T, unsigned int BlockSizeX, block_scan_algorithm Algorithm = block_scan_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
 class block_scan#
- The block_scan class is a block level parallel primitive which provides methods for performing inclusive and exclusive scan operations of items partitioned across threads in a block. - Overview
- Supports non-commutative scan operators. However, a scan operator should be associative. When used with non-associative functions the results may be non-deterministic and/or vary in precision. 
- Computation can more efficient when: - ItemsPerThreadis greater than one,
- Tis an arithmetic type,
- scan operation is simple addition operator, and 
- the number of threads in the block is a multiple of the hardware warp size (see rocprim::device_warp_size()). 
 
- block_scan has two alternative implementations: - block_scan_algorithm::using_warp_scanand block_scan_algorithm::reduce_then_scan.
 
- Examples
- In the examples scan operation is performed on block of 192 threads, each provides one - intvalue, result is returned using the same variable as for input.- __global__ void example_kernel(...) { // specialize warp_scan for int and logical warp of 192 threads using block_scan_int = rocprim::block_scan<int, 192>; // allocate storage in shared memory __shared__ block_scan_int::storage_type storage; int value = ...; // execute inclusive scan block_scan_int().inclusive_scan( value, // input value, // output storage ); ... } 
 - Template Parameters:
- T – - the input/output type. 
- BlockSizeX – - the number of threads in a block’s x dimension. 
- Algorithm – - selected scan algorithm, block_scan_algorithm::default_algorithm by default. 
- BlockSizeY – - the number of threads in a block’s y dimension, defaults to 1. 
- BlockSizeZ – - the number of threads in a block’s z dimension, defaults to 1. 
 
 - Public Types - 
using storage_type = typename base_type::storage_type#
- Struct used to allocate a temporary memory that is required for thread communication during operations provided by related parallel primitive. - Depending on the implemention the operations exposed by parallel primitive may require a temporary storage for thread communication. The storage should be allocated using keywords - . It can be aliased to an externally allocated memory, or be a part of a union type with other storage types to increase shared memory reusability.
 - Public Functions - 
template<class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T input, T &output, storage_type &storage, BinaryFunction scan_op = BinaryFunction())#
- Performs inclusive scan across threads in a block. - Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present inclusive min scan operations performed on a block of 256 threads, each provides one - floatvalue.- __global__ void example_kernel(...) // blockDim.x = 256 { // specialize block_scan for float and block of 256 threads using block_scan_f = rocprim::block_scan<float, 256>; // allocate storage in shared memory for the block __shared__ block_scan_float::storage_type storage; float input = ...; float output; // execute inclusive min scan block_scan_float().inclusive_scan( input, output, storage, rocprim::minimum<float>() ); ... } - If the - inputvalues across threads in a block are- {1, -2, 3, -4, ..., 255, -256}, then- outputvalues in will be- {1, -2, -2, -4, ..., -254, -256}.
 - Template Parameters:
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T input, T &output, BinaryFunction scan_op = BinaryFunction())#
- Performs inclusive scan across threads in a block. - This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. - This overload does not accept storage argument. Required shared memory is allocated by the method itself. 
 - Template Parameters:
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T input, T &output, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction())#
- Performs inclusive scan and reduction across threads in a block. - Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present inclusive min scan operations performed on a block of 256 threads, each provides one - floatvalue.- __global__ void example_kernel(...) // blockDim.x = 256 { // specialize block_scan for float and block of 256 threads using block_scan_f = rocprim::block_scan<float, 256>; // allocate storage in shared memory for the block __shared__ block_scan_float::storage_type storage; float input = ...; float output; float reduction; // execute inclusive min scan block_scan_float().inclusive_scan( input, output, reduction, storage, rocprim::minimum<float>() ); ... } - If the - inputvalues across threads in a block are- {1, -2, 3, -4, ..., 255, -256}, then- outputvalues in will be- {1, -2, -2, -4, ..., -254, -256}, and the- reductionwill be- -256.
 - Template Parameters:
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- reduction – [out] - result of reducing of all - inputvalues in a block.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T input, T &output, T &reduction, BinaryFunction scan_op = BinaryFunction())#
- Performs inclusive scan and reduction across threads in a block. - This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. - This overload does not accept storage argument. Required shared memory is allocated by the method itself. 
 - Template Parameters:
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- reduction – [out] - result of reducing of all - inputvalues in a block.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class PrefixCallback, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T input, T &output, storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op)#
- Performs inclusive scan across threads in a block, and uses - prefix_callback_opto generate prefix value for the whole block.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present inclusive prefix sum operations performed on a block of 256 threads, each thread provides one - intvalue.- struct my_block_prefix { int prefix; __device__ my_block_prefix(int prefix) : prefix(prefix) {} __device__ int operator()(int block_reduction) { int old_prefix = prefix; prefix = prefix + block_reduction; return old_prefix; } }; __global__ void example_kernel(...) // blockDim.x = 256 { // specialize block_scan for int and block of 256 threads using block_scan_f = rocprim::block_scan<int, 256>; // allocate storage in shared memory for the block __shared__ block_scan_int::storage_type storage; // init prefix functor my_block_prefix prefix_callback(10); int input; int output; // execute inclusive prefix sum block_scan_int().inclusive_scan( input, output, storage, prefix_callback, rocprim::plus<int>() ); ... } - If the - inputvalues across threads in a block are- {1, 1, 1, ..., 1}, then- outputvalues in will be- {11, 12, 13, ..., 266}, and the- prefixwill be- 266.
 - Template Parameters:
- PrefixCallback – - type of the unary function object used for generating block-wide prefix value for the scan operation. 
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- prefix_callback_op – [inout] - function object for generating block prefix value. The signature of the - prefix_callback_opshould be equivalent to the following:- T f(const T &block_reduction);. The signature does not need to have- const &, but function object must not modify the objects passed to it. The object will be called by the first warp of the block with block reduction of- inputvalues as input argument. The result of the first thread will be used as the block-wide prefix.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], storage_type &storage, BinaryFunction scan_op = BinaryFunction())#
- Performs inclusive scan across threads in a block. - Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present inclusive maximum scan operations performed on a block of 128 threads, each provides two - longvalue.- __global__ void example_kernel(...) // blockDim.x = 128 { // specialize block_scan for long and block of 128 threads using block_scan_f = rocprim::block_scan<long, 128>; // allocate storage in shared memory for the block __shared__ block_scan_long::storage_type storage; long input[2] = ...; long output[2]; // execute inclusive min scan block_scan_long().inclusive_scan( input, output, storage, rocprim::maximum<long>() ); ... } - If the - inputvalues across threads in a block are- {-1, 2, -3, 4, ..., -255, 256}, then- outputvalues in will be- {-1, 2, 2, 4, ..., 254, 256}.
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], BinaryFunction scan_op = BinaryFunction())#
- Performs inclusive scan across threads in a block. - This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. - This overload does not accept storage argument. Required shared memory is allocated by the method itself. 
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction())#
- Performs inclusive scan and reduction across threads in a block. - Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present inclusive maximum scan operations performed on a block of 128 threads, each provides two - longvalue.- __global__ void example_kernel(...) // blockDim.x = 128 { // specialize block_scan for long and block of 128 threads using block_scan_f = rocprim::block_scan<long, 128>; // allocate storage in shared memory for the block __shared__ block_scan_long::storage_type storage; long input[2] = ...; long output[2]; long reduction; // execute inclusive min scan block_scan_long().inclusive_scan( input, output, reduction, storage, rocprim::maximum<long>() ); ... } - If the - inputvalues across threads in a block are- {-1, 2, -3, 4, ..., -255, 256}, then- outputvalues in will be- {-1, 2, 2, 4, ..., 254, 256}and the- reductionwill be- 256.
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- reduction – [out] - result of reducing of all - inputvalues in a block.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void inclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], T &reduction, BinaryFunction scan_op = BinaryFunction())#
- Performs inclusive scan and reduction across threads in a block. - This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. - This overload does not accept storage argument. Required shared memory is allocated by the method itself. 
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- reduction – [out] - result of reducing of all - inputvalues in a block.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class PrefixCallback, class BinaryFunction>
 __device__ inline void inclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op)#
- Performs inclusive scan across threads in a block, and uses - prefix_callback_opto generate prefix value for the whole block.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present inclusive prefix sum operations performed on a block of 128 threads, each thread provides two - intvalue.- struct my_block_prefix { int prefix; __device__ my_block_prefix(int prefix) : prefix(prefix) {} __device__ int operator()(int block_reduction) { int old_prefix = prefix; prefix = prefix + block_reduction; return old_prefix; } }; __global__ void example_kernel(...) // blockDim.x = 128 { // specialize block_scan for int and block of 128 threads using block_scan_f = rocprim::block_scan<int, 128>; // allocate storage in shared memory for the block __shared__ block_scan_int::storage_type storage; // init prefix functor my_block_prefix prefix_callback(10); int input[2] = ...; int output[2]; // execute inclusive prefix sum block_scan_int().inclusive_scan( input, output, storage, prefix_callback, rocprim::plus<int>() ); ... } - If the - inputvalues across threads in a block are- {1, 1, 1, ..., 1}, then- outputvalues in will be- {11, 12, 13, ..., 266}, and the- prefixwill be- 266.
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- PrefixCallback – - type of the unary function object used for generating block-wide prefix value for the scan operation. 
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- prefix_callback_op – [inout] - function object for generating block prefix value. The signature of the - prefix_callback_opshould be equivalent to the following:- T f(const T &block_reduction);. The signature does not need to have- const &, but function object must not modify the objects passed to it. The object will be called by the first warp of the block with block reduction of- inputvalues as input argument. The result of the first thread will be used as the block-wide prefix.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T input, T &output, T init, storage_type &storage, BinaryFunction scan_op = BinaryFunction())#
- Performs exclusive scan across threads in a block. - Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present exclusive min scan operations performed on a block of 256 threads, each provides one - floatvalue.- __global__ void example_kernel(...) // blockDim.x = 256 { // specialize block_scan for float and block of 256 threads using block_scan_f = rocprim::block_scan<float, 256>; // allocate storage in shared memory for the block __shared__ block_scan_float::storage_type storage; float init = ...; float input = ...; float output; // execute exclusive min scan block_scan_float().exclusive_scan( input, output, init, storage, rocprim::minimum<float>() ); ... } - If the - inputvalues across threads in a block are- {1, -2, 3, -4, ..., 255, -256}and- initis- 0, then- outputvalues in will be- {0, 0, -2, -2, -4, ..., -254, -254}.
 - Template Parameters:
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- init – [in] - initial value used to start the exclusive scan. Should be the same for all threads in a block. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T input, T &output, T init, BinaryFunction scan_op = BinaryFunction())#
- Performs exclusive scan across threads in a block. - This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. - This overload does not accept storage argument. Required shared memory is allocated by the method itself. 
 - Template Parameters:
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- init – [in] - initial value used to start the exclusive scan. Should be the same for all threads in a block. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T input, T &output, T init, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction())#
- Performs exclusive scan and reduction across threads in a block. - Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present exclusive min scan operations performed on a block of 256 threads, each provides one - floatvalue.- __global__ void example_kernel(...) // blockDim.x = 256 { // specialize block_scan for float and block of 256 threads using block_scan_f = rocprim::block_scan<float, 256>; // allocate storage in shared memory for the block __shared__ block_scan_float::storage_type storage; float init = 0; float input = ...; float output; float reduction; // execute exclusive min scan block_scan_float().exclusive_scan( input, output, init, reduction, storage, rocprim::minimum<float>() ); ... } - If the - inputvalues across threads in a block are- {1, -2, 3, -4, ..., 255, -256}and- initis- 0, then- outputvalues in will be- {0, 0, -2, -2, -4, ..., -254, -254}and the- reductionwill be- -256.
 - Template Parameters:
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- init – [in] - initial value used to start the exclusive scan. Should be the same for all threads in a block. 
- reduction – [out] - result of reducing of all - inputvalues in a block.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T input, T &output, T init, T &reduction, BinaryFunction scan_op = BinaryFunction())#
- Performs exclusive scan and reduction across threads in a block. - This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. - This overload does not accept storage argument. Required shared memory is allocated by the method itself. 
 - Template Parameters:
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- init – [in] - initial value used to start the exclusive scan. Should be the same for all threads in a block. 
- reduction – [out] - result of reducing of all - inputvalues in a block.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<class PrefixCallback, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T input, T &output, storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op)#
- Performs exclusive scan across threads in a block, and uses - prefix_callback_opto generate prefix value for the whole block.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present exclusive prefix sum operations performed on a block of 256 threads, each thread provides one - intvalue.- struct my_block_prefix { int prefix; __device__ my_block_prefix(int prefix) : prefix(prefix) {} __device__ int operator()(int block_reduction) { int old_prefix = prefix; prefix = prefix + block_reduction; return old_prefix; } }; __global__ void example_kernel(...) // blockDim.x = 256 { // specialize block_scan for int and block of 256 threads using block_scan_f = rocprim::block_scan<int, 256>; // allocate storage in shared memory for the block __shared__ block_scan_int::storage_type storage; // init prefix functor my_block_prefix prefix_callback(10); int input; int output; // execute exclusive prefix sum block_scan_int().exclusive_scan( input, output, storage, prefix_callback, rocprim::plus<int>() ); ... } - If the - inputvalues across threads in a block are- {1, 1, 1, ..., 1}, then- outputvalues in will be- {10, 11, 12, 13, ..., 265}, and the- prefixwill be- 266.
 - Template Parameters:
- PrefixCallback – - type of the unary function object used for generating block-wide prefix value for the scan operation. 
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - thread input value. 
- output – [out] - reference to a thread output value. May be aliased with - input.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- prefix_callback_op – [inout] - function object for generating block prefix value. The signature of the - prefix_callback_opshould be equivalent to the following:- T f(const T &block_reduction);. The signature does not need to have- const &, but function object must not modify the objects passed to it. The object will be called by the first warp of the block with block reduction of- inputvalues as input argument. The result of the first thread will be used as the block-wide prefix.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], T init, storage_type &storage, BinaryFunction scan_op = BinaryFunction())#
- Performs exclusive scan across threads in a block. - Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present exclusive maximum scan operations performed on a block of 128 threads, each provides two - longvalue.- __global__ void example_kernel(...) // blockDim.x = 128 { // specialize block_scan for long and block of 128 threads using block_scan_f = rocprim::block_scan<long, 128>; // allocate storage in shared memory for the block __shared__ block_scan_long::storage_type storage; long init = ...; long input[2] = ...; long output[2]; // execute exclusive min scan block_scan_long().exclusive_scan( input, output, init, storage, rocprim::maximum<long>() ); ... } - If the - inputvalues across threads in a block are- {-1, 2, -3, 4, ..., -255, 256}and- initis 0, then- outputvalues in will be- {0, 0, 2, 2, 4, ..., 254, 254}.
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- init – [in] - initial value used to start the exclusive scan. Should be the same for all threads in a block. 
- storage – [in] - reference to a temporary storage object of type storage_type. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], T init, BinaryFunction scan_op = BinaryFunction())#
- Performs exclusive scan across threads in a block. - This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. - This overload does not accept storage argument. Required shared memory is allocated by the method itself. 
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- init – [in] - initial value used to start the exclusive scan. Should be the same for all threads in a block. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], T init, T &reduction, storage_type &storage, BinaryFunction scan_op = BinaryFunction())#
- Performs exclusive scan and reduction across threads in a block. - Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present exclusive maximum scan operations performed on a block of 128 threads, each provides two - longvalue.- __global__ void example_kernel(...) // blockDim.x = 128 { // specialize block_scan for long and block of 128 threads using block_scan_f = rocprim::block_scan<long, 128>; // allocate storage in shared memory for the block __shared__ block_scan_long::storage_type storage; long init = ...; long input[2] = ...; long output[2]; long reduction; // execute exclusive min scan block_scan_long().exclusive_scan( input, output, init, reduction, storage, rocprim::maximum<long>() ); ... } - If the - inputvalues across threads in a block are- {-1, 2, -3, 4, ..., -255, 256}and- initis 0, then- outputvalues in will be- {0, 0, 2, 2, 4, ..., 254, 254}and the- reductionwill be- 256.
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- init – [in] - initial value used to start the exclusive scan. Should be the same for all threads in a block. 
- reduction – [out] - result of reducing of all - inputvalues in a block.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class BinaryFunction = ::rocprim::plus<T>>
 __device__ inline void exclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], T init, T &reduction, BinaryFunction scan_op = BinaryFunction())#
- Performs exclusive scan and reduction across threads in a block. - This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. - This overload does not accept storage argument. Required shared memory is allocated by the method itself. 
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- init – [in] - initial value used to start the exclusive scan. Should be the same for all threads in a block. 
- reduction – [out] - result of reducing of all - inputvalues in a block.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 - 
template<unsigned int ItemsPerThread, class PrefixCallback, class BinaryFunction>
 __device__ inline void exclusive_scan(T (&input)[ItemsPerThread], T (&output)[ItemsPerThread], storage_type &storage, PrefixCallback &prefix_callback_op, BinaryFunction scan_op)#
- Performs exclusive scan across threads in a block, and uses - prefix_callback_opto generate prefix value for the whole block.- Storage reusage
- Synchronization barrier should be placed before - storageis reused or repurposed:- __syncthreads()or- rocprim::syncthreads().
- Examples
- The examples present exclusive prefix sum operations performed on a block of 128 threads, each thread provides two - intvalue.- struct my_block_prefix { int prefix; __device__ my_block_prefix(int prefix) : prefix(prefix) {} __device__ int operator()(int block_reduction) { int old_prefix = prefix; prefix = prefix + block_reduction; return old_prefix; } }; __global__ void example_kernel(...) // blockDim.x = 128 { // specialize block_scan for int and block of 128 threads using block_scan_f = rocprim::block_scan<int, 128>; // allocate storage in shared memory for the block __shared__ block_scan_int::storage_type storage; // init prefix functor my_block_prefix prefix_callback(10); int input[2] = ...; int output[2]; // execute exclusive prefix sum block_scan_int().exclusive_scan( input, output, storage, prefix_callback, rocprim::plus<int>() ); ... } - If the - inputvalues across threads in a block are- {1, 1, 1, ..., 1}, then- outputvalues in will be- {10, 11, 12, 13, ..., 265}, and the- prefixwill be- 266.
 - Template Parameters:
- ItemsPerThread – - number of items in the - inputarray.
- PrefixCallback – - type of the unary function object used for generating block-wide prefix value for the scan operation. 
- BinaryFunction – - type of binary function used for scan. Default type is rocprim::plus<T>. 
 
- Parameters:
- input – [in] - reference to an array containing thread input values. 
- output – [out] - reference to a thread output array. May be aliased with - input.
- storage – [in] - reference to a temporary storage object of type storage_type. 
- prefix_callback_op – [inout] - function object for generating block prefix value. The signature of the - prefix_callback_opshould be equivalent to the following:- T f(const T &block_reduction);. The signature does not need to have- const &, but function object must not modify the objects passed to it. The object will be called by the first warp of the block with block reduction of- inputvalues as input argument. The result of the first thread will be used as the block-wide prefix.
- scan_op – [in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: - T f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it.
 
 
 
Algorithms#
- 
enum class rocprim::block_scan_algorithm#
- Available algorithms for block_scan primitive. - Values: - 
enumerator reduce_then_scan#
- An algorithm which limits calculations to a single hardware warp. 
 - 
enumerator default_algorithm#
- Default block_scan algorithm. 
 
- 
enumerator reduce_then_scan#