Scan
Configuring the kernel
scan
template<unsigned int BlockSize, unsigned int ItemsPerThread, bool UseLookback, ::rocprim::block_load_method BlockLoadMethod, ::rocprim::block_store_method BlockStoreMethod, ::rocprim::block_scan_algorithm BlockScanMethod, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct scan_config
Configuration of device-level scan primitives.
- Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
UseLookback – whether to use the lookback scan algorithm or the reduce-then-scan algorithm.
BlockLoadMethod – method for loading input values.
BlockStoreMethod – method for storing values.
BlockScanMethod – algorithm for block scan.
SizeLimit – limit on the number of items for a single scan kernel launch.
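To make the interplay of the first two parameters concrete, the sketch below assumes that one block processes BlockSize * ItemsPerThread items, so the number of blocks for an input is the ceiling of size over that product. The names tile_shape and blocks_needed are hypothetical helpers written for this illustration, not rocPRIM names, and the actual kernels may partition the work differently.

```cpp
#include <cstddef>

// Hypothetical helper, not part of rocPRIM: mirrors the first two
// scan_config parameters so the arithmetic can be checked on the host.
template <unsigned int BlockSize, unsigned int ItemsPerThread>
struct tile_shape
{
    // Assumption of this sketch: one block handles
    // BlockSize * ItemsPerThread items.
    static constexpr std::size_t items_per_block =
        static_cast<std::size_t>(BlockSize) * ItemsPerThread;
};

// Number of blocks needed to cover `size` items (ceiling division).
template <class Shape>
constexpr std::size_t blocks_needed(std::size_t size)
{
    return (size + Shape::items_per_block - 1) / Shape::items_per_block;
}
```

With BlockSize = 256 and ItemsPerThread = 4, a block covers 1024 items under this assumption, so 1025 items would need two blocks.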
scan_by_key
struct scan_by_key_config: no description was generated for this struct in this documentation build.
scan
inclusive
template<class Config = default_config, class InputIterator, class OutputIterator, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<InputIterator>::value_type>>
inline hipError_t rocprim::inclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, const size_t size, BinaryFunction scan_op = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)
Parallel inclusive scan primitive for device level.
The inclusive_scan function performs a device-wide inclusive prefix scan operation using the binary scan_op operator.
- Overview
Supports non-commutative scan operators. However, a scan operator should be associative. When used with non-associative functions, the results may be non-deterministic and/or vary in precision.
Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.
Ranges specified by input and output must have at least size elements.
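The associativity remark is easiest to see with floating-point addition, which is commutative but not associative. The sketch below is a host-only illustration (the function names are made up for this example): it groups the same three additions in two ways, much as a parallel scan may regroup them compared with a sequential pass.

```cpp
// Left-to-right grouping: (a + b) + c
inline double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

// Right-to-left grouping: a + (b + c)
inline double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

For a = 1e20, b = -1e20, c = 1.0, the first grouping yields 1.0 while the second yields 0.0, because 1.0 is lost when added to -1e20. A scan with such an operator can therefore return different values depending on how the work is split.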
- Example
In this example a device-level inclusive sum operation is performed on an array of integer values (shorts are scanned into ints).

    #include <rocprim/rocprim.hpp>

    // Prepare input and output (declare pointers, allocate device memory etc.)
    size_t input_size;    // e.g., 8
    short * input;        // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
    int * output;         // empty array of 8 elements

    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::inclusive_scan(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, output, input_size, rocprim::plus<int>()
    );

    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

    // perform scan
    rocprim::inclusive_scan(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, output, input_size, rocprim::plus<int>()
    );
    // output: [1, 3, 6, 10, 15, 21, 28, 36]
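When validating device results, a sequential host-side reference is often handy. The following is a minimal sketch of one, not the rocPRIM implementation; reference_inclusive_scan is a hypothetical name, and accumulating in the output type is a choice made for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Sequential host-side reference for an inclusive scan.
// Hypothetical helper for checking results; not a rocPRIM function.
template <class Out, class In, class BinaryFunction>
std::vector<Out> reference_inclusive_scan(const std::vector<In>& input,
                                          BinaryFunction scan_op)
{
    std::vector<Out> output(input.size());
    if (input.empty())
    {
        return output;
    }
    // The first output equals the first input.
    Out running = static_cast<Out>(input[0]);
    output[0] = running;
    // Each later output combines the running result with the next input.
    for (std::size_t i = 1; i < input.size(); ++i)
    {
        running = scan_op(running, static_cast<Out>(input[i]));
        output[i] = running;
    }
    return output;
}
```

Called as reference_inclusive_scan<int>(input, op) with the short values [1, 2, 3, 4, 5, 6, 7, 8] and an addition operator, it returns [1, 3, 6, 10, 15, 21, 28, 36].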
- Template Parameters:
Config – [optional] configuration of the primitive. It can be scan_config or a custom class with the same members.
InputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
BinaryFunction – type of binary function used for scan. Default type is rocprim::plus<T>, where T is the value_type of InputIterator.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.
storage_size – [inout] reference to the size (in bytes) of temporary_storage.
input – [in] iterator to the first element in the range to scan.
output – [out] iterator to the first element in the output range. It can be the same as input.
size – [in] number of elements in the input range.
scan_op – [in] binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The default value is BinaryFunction().
stream – [in] [optional] HIP stream object. The default is 0 (default stream).
debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
- Returns:
hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.
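The size-query convention (null temporary_storage first, then a second call with allocated storage) can be exercised without a GPU. The sketch below uses a hypothetical host stand-in, host_inclusive_sum, that follows the same calling pattern; it returns a plain int (0 for success) in place of hipError_t, and the storage size it reports is arbitrary and chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical host stand-in that follows the same two-call convention.
// Returns 0 on success and 1 on error (standing in for hipError_t).
inline int host_inclusive_sum(void* temporary_storage, std::size_t& storage_size,
                              const int* input, int* output, std::size_t size)
{
    if (temporary_storage == nullptr)
    {
        // Size query: report the bytes needed and do no other work.
        storage_size = size * sizeof(int);
        return 0;
    }
    if (storage_size < size * sizeof(int))
    {
        return 1; // storage too small
    }
    // Scan into the scratch buffer first, so output may alias input.
    int* scratch = static_cast<int*>(temporary_storage);
    int running = 0;
    for (std::size_t i = 0; i < size; ++i)
    {
        running += input[i];
        scratch[i] = running;
    }
    for (std::size_t i = 0; i < size; ++i)
    {
        output[i] = scratch[i];
    }
    return 0;
}

// Typical usage: query the size, allocate, run, release.
inline int run_two_phase(const int* input, int* output, std::size_t size)
{
    std::size_t bytes = 0;
    int status = host_inclusive_sum(nullptr, bytes, input, output, size);
    if (status != 0)
    {
        return status;
    }
    void* temp = std::malloc(bytes > 0 ? bytes : 1);
    if (temp == nullptr)
    {
        return 1;
    }
    status = host_inclusive_sum(temp, bytes, input, output, size);
    std::free(temp);
    return status;
}
```

Checking the status after both calls, as run_two_phase does, mirrors how the hipError_t return value would be checked against hipSuccess in device code.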
exclusive
template<class Config = default_config, class InputIterator, class OutputIterator, class InitValueType, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<InputIterator>::value_type>>
inline hipError_t rocprim::exclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, const InitValueType initial_value, const size_t size, BinaryFunction scan_op = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)
Parallel exclusive scan primitive for device level.
The exclusive_scan function performs a device-wide exclusive prefix scan operation using the binary scan_op operator.
- Overview
Supports non-commutative scan operators. However, a scan operator should be associative. When used with non-associative functions, the results may be non-deterministic and/or vary in precision.
Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.
Ranges specified by input and output must have at least size elements.
- Example
In this example a device-level exclusive min-scan operation is performed on an array of integer values (shorts are scanned into ints) using a custom operator.

    #include <rocprim/rocprim.hpp>

    // custom scan function
    auto min_op =
        [] __device__ (int a, int b) -> int
        {
            return a < b ? a : b;
        };

    // Prepare input and output (declare pointers, allocate device memory etc.)
    size_t input_size;    // e.g., 8
    short * input;        // e.g., [4, 7, 6, 2, 5, 1, 3, 8]
    int * output;         // empty array of 8 elements
    int start_value;      // e.g., 9

    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::exclusive_scan(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, output, start_value, input_size, min_op
    );

    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

    // perform scan
    rocprim::exclusive_scan(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, output, start_value, input_size, min_op
    );
    // output: [9, 4, 4, 4, 2, 2, 1, 1]
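An exclusive scan writes the running result before combining it with the current element, so the first output is the initial value and the last input never appears in the output. The following host-side sketch (reference_exclusive_scan is a hypothetical helper, not a rocPRIM function) makes that ordering explicit.

```cpp
#include <cstddef>
#include <vector>

// Sequential host-side reference for an exclusive scan.
// Hypothetical helper for checking results; not a rocPRIM function.
template <class Out, class In, class Init, class BinaryFunction>
std::vector<Out> reference_exclusive_scan(const std::vector<In>& input,
                                          Init initial_value,
                                          BinaryFunction scan_op)
{
    std::vector<Out> output(input.size());
    Out running = static_cast<Out>(initial_value);
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        // Write the result of everything before element i ...
        output[i] = running;
        // ... then fold element i into the running result.
        running = scan_op(running, static_cast<Out>(input[i]));
    }
    return output;
}
```

For the input [4, 7, 6, 2, 5, 1, 3, 8], an initial value of 9, and a minimum operator, this reference returns [9, 4, 4, 4, 2, 2, 1, 1].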
- Template Parameters:
Config – [optional] configuration of the primitive. It can be scan_config or a custom class with the same members.
InputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
InitValueType – type of the initial value.
BinaryFunction – type of binary function used for scan. Default type is rocprim::plus<T>, where T is the value_type of InputIterator.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.
storage_size – [inout] reference to the size (in bytes) of temporary_storage.
input – [in] iterator to the first element in the range to scan.
output – [out] iterator to the first element in the output range. It can be the same as input.
initial_value – [in] initial value to start the scan. A rocprim::future_value may be passed to use a value that will be computed later.
size – [in] number of elements in the input range.
scan_op – [in] binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The default value is BinaryFunction().
stream – [in] [optional] HIP stream object. The default is 0 (default stream).
debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
- Returns:
hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.
segmented, inclusive
template<class Config = default_config, class InputIterator, class OutputIterator, class OffsetIterator, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<InputIterator>::value_type>>
inline hipError_t rocprim::segmented_inclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, BinaryFunction scan_op = BinaryFunction(), hipStream_t stream = 0, bool debug_synchronous = false)
Parallel segmented inclusive scan primitive for device level.
The segmented_inclusive_scan function performs a device-wide inclusive scan operation across multiple sequences from input using the binary scan_op operator.
- Overview
Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.
Ranges specified by input and output must have at least size elements.
Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.
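The shared offsets layout described above can be built from per-segment lengths with an exclusive prefix sum followed by the total. The sketch below shows this on the host; offsets_from_lengths is a hypothetical helper introduced for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: converts per-segment lengths into one offsets
// array of lengths.size() + 1 elements. Segment s then spans
// [offsets[s], offsets[s + 1]).
inline std::vector<int> offsets_from_lengths(const std::vector<int>& lengths)
{
    std::vector<int> offsets(lengths.size() + 1);
    int running = 0;
    for (std::size_t s = 0; s < lengths.size(); ++s)
    {
        offsets[s] = running;   // start of segment s
        running += lengths[s];
    }
    offsets[lengths.size()] = running; // end of the last segment
    return offsets;
}
```

Segment lengths [2, 2, 4] give offsets [0, 2, 4, 8]. Passing offsets as begin_offsets and offsets + 1 as end_offsets then describes three back-to-back segments.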
- Example
In this example a device-level segmented inclusive min-scan operation is performed on an array of integer values (shorts are scanned into ints) using a custom operator.

    #include <rocprim/rocprim.hpp>

    // custom scan function
    auto min_op =
        [] __device__ (int a, int b) -> int
        {
            return a < b ? a : b;
        };

    // Prepare input and output (declare pointers, allocate device memory etc.)
    short * input;        // e.g., [4, 7, 6, 2, 5, 1, 3, 8]
    int * output;         // empty array of 8 elements
    size_t segments;      // e.g., 3
    int * offsets;        // e.g. [0, 2, 4, 8]

    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::segmented_inclusive_scan(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, output, segments, offsets, offsets + 1, min_op
    );

    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

    // perform scan
    rocprim::segmented_inclusive_scan(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, output, segments, offsets, offsets + 1, min_op
    );
    // output: [4, 4, 6, 2, 5, 1, 1, 1]
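A segmented inclusive scan behaves like an independent inclusive scan over each segment. The host-side sketch below expresses that directly; reference_segmented_inclusive_scan is a hypothetical helper, and it assumes a single offsets array of segments + 1 elements shared between begin and end offsets.

```cpp
#include <cstddef>
#include <vector>

// Sequential host-side reference for a segmented inclusive scan.
// Hypothetical helper; assumes `offsets` holds segments + 1 entries and
// segment s spans [offsets[s], offsets[s + 1]).
template <class Out, class In, class BinaryFunction>
std::vector<Out> reference_segmented_inclusive_scan(const std::vector<In>& input,
                                                    const std::vector<int>& offsets,
                                                    BinaryFunction scan_op)
{
    std::vector<Out> output(input.size());
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s)
    {
        const std::size_t begin = static_cast<std::size_t>(offsets[s]);
        const std::size_t end   = static_cast<std::size_t>(offsets[s + 1]);
        if (begin >= end)
        {
            continue; // empty segment: nothing to write
        }
        // The running result restarts at the first element of each segment.
        Out running = static_cast<Out>(input[begin]);
        output[begin] = running;
        for (std::size_t i = begin + 1; i < end; ++i)
        {
            running = scan_op(running, static_cast<Out>(input[i]));
            output[i] = running;
        }
    }
    return output;
}
```

With input [4, 7, 6, 2, 5, 1, 3, 8], offsets [0, 2, 4, 8], and a minimum operator, the reference returns [4, 4, 6, 2, 5, 1, 1, 1].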
- Template Parameters:
Config – [optional] configuration of the primitive. It can be scan_config or a custom class with the same members.
InputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ RandomAccessIterator concept. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ RandomAccessIterator concept. It can be a simple pointer type.
OffsetIterator – random-access iterator type of segment offsets. Must meet the requirements of a C++ RandomAccessIterator concept. It can be a simple pointer type.
BinaryFunction – type of binary function used for scan operation. Default type is rocprim::plus<T>, where T is the value_type of InputIterator.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.
storage_size – [inout] reference to the size (in bytes) of temporary_storage.
input – [in] iterator to the first element in the range to scan.
output – [out] iterator to the first element in the output range.
segments – [in] number of segments in the input range.
begin_offsets – [in] iterator to the first element in the range of beginning offsets.
end_offsets – [in] iterator to the first element in the range of ending offsets.
scan_op – [in] binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The default value is BinaryFunction().
stream – [in] [optional] HIP stream object. The default is 0 (default stream).
debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
- Returns:
hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.
segmented, exclusive
template<class Config = default_config, class InputIterator, class OutputIterator, class OffsetIterator, class InitValueType, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<InputIterator>::value_type>>
inline hipError_t rocprim::segmented_exclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, const InitValueType initial_value, BinaryFunction scan_op = BinaryFunction(), hipStream_t stream = 0, bool debug_synchronous = false)
Parallel segmented exclusive scan primitive for device level.
The segmented_exclusive_scan function performs a device-wide exclusive scan operation across multiple sequences from input using the binary scan_op operator.
- Overview
Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.
Ranges specified by input and output must have at least size elements.
Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.
- Example
In this example a device-level segmented exclusive min-scan operation is performed on an array of integer values (shorts are scanned into ints) using a custom operator.

    #include <rocprim/rocprim.hpp>

    // custom scan function
    auto min_op =
        [] __device__ (int a, int b) -> int
        {
            return a < b ? a : b;
        };

    // Prepare input and output (declare pointers, allocate device memory etc.)
    int start_value;      // e.g., 9
    short * input;        // e.g., [4, 7, 6, 2, 5, 1, 3, 8]
    int * output;         // empty array of 8 elements
    size_t segments;      // e.g., 3
    int * offsets;        // e.g. [0, 2, 4, 8]

    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::segmented_exclusive_scan(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, output, segments, offsets, offsets + 1,
        start_value, min_op
    );

    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

    // perform scan
    rocprim::segmented_exclusive_scan(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, output, segments, offsets, offsets + 1,
        start_value, min_op
    );
    // output: [9, 4, 9, 6, 9, 5, 1, 1]
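In the segmented exclusive variant, the initial value is written at the start of every segment and the running result restarts there. A host-side sketch under the same shared-offsets assumption follows; reference_segmented_exclusive_scan is a hypothetical helper, not a rocPRIM function.

```cpp
#include <cstddef>
#include <vector>

// Sequential host-side reference for a segmented exclusive scan.
// Hypothetical helper; assumes `offsets` holds segments + 1 entries and
// segment s spans [offsets[s], offsets[s + 1]).
template <class Out, class In, class Init, class BinaryFunction>
std::vector<Out> reference_segmented_exclusive_scan(const std::vector<In>& input,
                                                    const std::vector<int>& offsets,
                                                    Init initial_value,
                                                    BinaryFunction scan_op)
{
    std::vector<Out> output(input.size());
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s)
    {
        const std::size_t begin = static_cast<std::size_t>(offsets[s]);
        const std::size_t end   = static_cast<std::size_t>(offsets[s + 1]);
        // Every segment starts again from the initial value.
        Out running = static_cast<Out>(initial_value);
        for (std::size_t i = begin; i < end; ++i)
        {
            output[i] = running;
            running = scan_op(running, static_cast<Out>(input[i]));
        }
    }
    return output;
}
```

With input [4, 7, 6, 2, 5, 1, 3, 8], offsets [0, 2, 4, 8], an initial value of 9, and a minimum operator, the reference returns [9, 4, 9, 6, 9, 5, 1, 1].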
- Template Parameters:
Config – [optional] configuration of the primitive. It can be scan_config or a custom class with the same members.
InputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ RandomAccessIterator concept. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ RandomAccessIterator concept. It can be a simple pointer type.
OffsetIterator – random-access iterator type of segment offsets. Must meet the requirements of a C++ RandomAccessIterator concept. It can be a simple pointer type.
InitValueType – type of the initial value.
BinaryFunction – type of binary function used for scan operation. Default type is rocprim::plus<T>, where T is the value_type of InputIterator.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.
storage_size – [inout] reference to the size (in bytes) of temporary_storage.
input – [in] iterator to the first element in the range to scan.
output – [out] iterator to the first element in the output range.
segments – [in] number of segments in the input range.
begin_offsets – [in] iterator to the first element in the range of beginning offsets.
end_offsets – [in] iterator to the first element in the range of ending offsets.
initial_value – [in] initial value to start the scan.
scan_op – [in] binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The default value is BinaryFunction().
stream – [in] [optional] HIP stream object. The default is 0 (default stream).
debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
- Returns:
hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.
scan_by_key
inclusive
template<class Config = default_config, class KeysInputIterator, class ValuesInputIterator, class ValuesOutputIterator, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<ValuesInputIterator>::value_type>, class KeyCompareFunction = ::rocprim::equal_to<typename std::iterator_traits<KeysInputIterator>::value_type>>
hipError_t inclusive_scan_by_key(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction scan_op = BinaryFunction(), KeyCompareFunction key_compare_op = KeyCompareFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)
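The signature above carries no description on this page, so the following host-side sketch rests on an assumption: that scan-by-key follows the usual convention in which runs of consecutive keys that compare equal under key_compare_op form segments, and the scan restarts at each new run. The helper reference_inclusive_scan_by_key is hypothetical and is not the rocPRIM implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential host-side reference for an inclusive scan by key.
// Assumption: consecutive keys that compare equal belong to one segment.
template <class Key, class Value, class BinaryFunction, class KeyCompareFunction>
std::vector<Value> reference_inclusive_scan_by_key(const std::vector<Key>& keys,
                                                   const std::vector<Value>& values,
                                                   BinaryFunction scan_op,
                                                   KeyCompareFunction key_compare_op)
{
    assert(keys.size() == values.size());
    std::vector<Value> output(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        // A new segment starts at index 0 or wherever the key changes.
        const bool starts_segment = (i == 0) || !key_compare_op(keys[i - 1], keys[i]);
        output[i] = starts_segment ? values[i] : scan_op(output[i - 1], values[i]);
    }
    return output;
}
```

Under that assumption, keys [1, 1, 2, 2, 2, 3] with values [1, 2, 3, 4, 5, 6] and an addition operator give [1, 3, 3, 7, 12, 6].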
exclusive
template<class Config = default_config, class KeysInputIterator, class ValuesInputIterator, class ValuesOutputIterator, class InitialValueType, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<ValuesInputIterator>::value_type>, class KeyCompareFunction = ::rocprim::equal_to<typename std::iterator_traits<KeysInputIterator>::value_type>>
hipError_t exclusive_scan_by_key(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, ValuesInputIterator values_input, ValuesOutputIterator values_output, const InitialValueType initial_value, const size_t size, BinaryFunction scan_op = BinaryFunction(), KeyCompareFunction key_compare_op = KeyCompareFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)
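As with the inclusive variant, this page gives only the signature, so the sketch below assumes the common scan-by-key convention: each run of consecutive equal keys is a segment, and every segment starts again from initial_value. The helper reference_exclusive_scan_by_key is hypothetical and serves only as a host-side check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential host-side reference for an exclusive scan by key.
// Assumption: consecutive keys that compare equal belong to one segment,
// and each segment restarts from initial_value.
template <class Key, class Value, class BinaryFunction, class KeyCompareFunction>
std::vector<Value> reference_exclusive_scan_by_key(const std::vector<Key>& keys,
                                                   const std::vector<Value>& values,
                                                   Value initial_value,
                                                   BinaryFunction scan_op,
                                                   KeyCompareFunction key_compare_op)
{
    assert(keys.size() == values.size());
    std::vector<Value> output(values.size());
    Value running = initial_value;
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        const bool starts_segment = (i == 0) || !key_compare_op(keys[i - 1], keys[i]);
        if (starts_segment)
        {
            running = initial_value; // restart at each new key run
        }
        output[i] = running;
        running = scan_op(running, values[i]);
    }
    return output;
}
```

Under that assumption, keys [1, 1, 2, 2, 2, 3] with values [1, 2, 3, 4, 5, 6], an initial value of 10, and an addition operator give [10, 11, 10, 13, 17, 10].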