Scan

Configuring the kernel

scan

template<unsigned int BlockSize, unsigned int ItemsPerThread, ::rocprim::block_load_method BlockLoadMethod, ::rocprim::block_store_method BlockStoreMethod, ::rocprim::block_scan_algorithm BlockScanMethod, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct scan_config : public rocprim::detail::scan_config_params

Configuration of device-level scan primitives.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread.

  • BlockLoadMethod – method for loading input values.

  • BlockStoreMethod – method for storing values.

  • BlockScanMethod – algorithm for block scan.

  • SizeLimit – limit on the number of items for a single scan kernel launch.

Subclassed by rocprim::detail::default_scan_config< arch, value_type, enable >
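As a rough illustration of how these parameters interact, the following host-side C++ sketch models the launch arithmetic. It is a hypothetical model, not rocPRIM's internal code: one block processes BlockSize * ItemsPerThread items, and the sketch assumes that SizeLimit is rounded down to a whole number of blocks per launch (with at least one block). The names scan_launch_plan and plan_scan_launches are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical launch plan derived from scan_config-like parameters.
struct scan_launch_plan
{
    std::size_t items_per_block;   // BlockSize * ItemsPerThread
    std::size_t blocks;            // blocks needed to cover the whole input
    std::size_t blocks_per_launch; // assumed: SizeLimit rounded down to whole blocks
    std::size_t launches;          // kernel launches needed for the whole input
};

scan_launch_plan plan_scan_launches(std::size_t size,
                                    unsigned int block_size,
                                    unsigned int items_per_thread,
                                    std::size_t size_limit)
{
    assert(block_size > 0 && items_per_thread > 0);
    const std::size_t items_per_block =
        static_cast<std::size_t>(block_size) * items_per_thread;
    // Ceiling division: a partially filled last block still needs a block.
    const std::size_t blocks = (size + items_per_block - 1) / items_per_block;
    // Assumption of this model: at least one block per launch.
    const std::size_t blocks_per_launch =
        std::max<std::size_t>(size_limit / items_per_block, 1);
    const std::size_t launches =
        (blocks + blocks_per_launch - 1) / blocks_per_launch;
    return {items_per_block, blocks, blocks_per_launch, launches};
}
```

For example, with BlockSize = 256 and ItemsPerThread = 8, a block covers 2048 items. Under this model, 10000 items with SizeLimit = 4096 need 5 blocks spread over 3 launches, while the default SizeLimit (the maximum unsigned int) keeps typical inputs in a single launch.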

scan_by_key

template<unsigned int BlockSize, unsigned int ItemsPerThread, ::rocprim::block_load_method BlockLoadMethod, ::rocprim::block_store_method BlockStoreMethod, ::rocprim::block_scan_algorithm BlockScanMethod, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct scan_by_key_config : public rocprim::detail::scan_by_key_config_params

Configuration of device-level scan-by-key operation.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread.

  • BlockLoadMethod – method for loading input values.

  • BlockStoreMethod – method for storing values.

  • BlockScanMethod – algorithm for block scan.

  • SizeLimit – limit on the number of items for a single scan kernel launch.

Subclassed by rocprim::detail::default_scan_by_key_config< arch, key_type, value_type, enable >

scan

inclusive

template<class Config = default_config, class InputIterator, class OutputIterator, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<InputIterator>::value_type>, class AccType = typename std::iterator_traits<InputIterator>::value_type>
inline hipError_t rocprim::inclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, const size_t size, BinaryFunction scan_op = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)

Parallel inclusive scan primitive for device level.

The inclusive_scan function performs a device-wide inclusive prefix scan operation using the binary scan_op operator.
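The semantics can be pinned down with a sequential host-side reference. The sketch below is plain C++, not rocPRIM's implementation (which runs in parallel on the device); the function name reference_inclusive_scan is hypothetical, and its Acc template parameter stands in for AccType.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sequential reference: output[0] = input[0],
// output[i] = scan_op(output[i - 1], input[i]) for i > 0.
// Every partial result is held in an Acc, mirroring the role of AccType.
template<class Acc, class T, class BinaryOp = std::plus<Acc>>
std::vector<Acc> reference_inclusive_scan(const std::vector<T>& input,
                                          BinaryOp scan_op = BinaryOp())
{
    std::vector<Acc> output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        const Acc value = static_cast<Acc>(input[i]);
        output.push_back(i == 0 ? value : scan_op(output.back(), value));
    }
    return output;
}
```

Scanning the shorts {30000, 30000} with Acc = int yields {30000, 60000}; with a short accumulator the second partial sum would not fit, which is the situation the AccType parameter addresses.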

Overview

  • Supports non-commutative scan operators. However, a scan operator should be associative.

  • When used with non-associative functions (e.g. floating point arithmetic operations):

    • the results may be non-deterministic and/or vary in precision,

    • and bit-wise reproducibility is not guaranteed, that is, results from multiple runs using the same input values on the same device may not be bit-wise identical.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Ranges specified by input and output must have at least size elements.

  • By default, the input type is used for accumulation. A custom type can be specified using the AccType template parameter, as shown in the example below.
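The reproducibility caveat for floating-point operators follows from floating-point addition not being associative. This minimal host-side illustration (plain C++, function names hypothetical, assuming a typical IEEE 754 platform that evaluates float arithmetic in single precision) adds the same three float operands with two different groupings.

```cpp
#include <cassert>

// Computes (a + b) + c, rounding the intermediate sum to float.
float sum_left_to_right(float a, float b, float c)
{
    const float ab = a + b;
    return ab + c;
}

// Computes a + (b + c), rounding the intermediate sum to float.
float sum_right_to_left(float a, float b, float c)
{
    const float bc = b + c;
    return a + bc;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, the first grouping gives 1.0f while the second gives 0.0f, because -1e8f + 1.0f rounds back to -1e8f (adjacent floats near 1e8 are 8 apart). A parallel scan may group partial results differently than a sequential loop does, so results are not guaranteed to be bit-wise identical.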

Example

In this example a device-level inclusive sum operation is performed on an array of integer values (shorts are scanned into ints).

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;    // e.g., 8
short * input;        // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int * output;         // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::inclusive_scan(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, rocprim::plus<int>()
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform scan
rocprim::inclusive_scan(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, rocprim::plus<int>()
);
// output: [1, 3, 6, 10, 15, 21, 28, 36]

The same example as above, but now a custom accumulator type is specified.

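#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;    // e.g., 8
short * input;        // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int * output;         // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;

// Use the AccType template parameter (the last one) to accumulate in int.
// Pass the same template arguments to both calls, so that the storage size
// is queried for the same accumulator type that is used for the scan.
// Get required size of the temporary storage
rocprim::inclusive_scan<rocprim::default_config, short*, int*, rocprim::plus<int>, int>(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, rocprim::plus<int>()
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform scan with an int accumulator
rocprim::inclusive_scan<rocprim::default_config, short*, int*, rocprim::plus<int>, int>(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, rocprim::plus<int>()
);
// output: [1, 3, 6, 10, 15, 21, 28, 36]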

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be scan_config or a class derived from it.

  • InputIterator – random-access iterator type of the input range. Must meet the requirements of the C++ InputIterator concept. It can be a simple pointer type.

  • OutputIterator – random-access iterator type of the output range. Must meet the requirements of the C++ OutputIterator concept. It can be a simple pointer type.

  • BinaryFunction – type of binary function used for scan. Default type is rocprim::plus<T>, where T is the value_type of InputIterator.

  • AccType – accumulator type used to propagate the scanned values. Default type is the value type of InputIterator.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range to scan.

  • output[out] - iterator to the first element in the output range. It can be the same as input.

  • size[in] - number of elements in the input range.

  • scan_op[in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. Default is BinaryFunction().

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.

exclusive

template<class Config = default_config, class InputIterator, class OutputIterator, class InitValueType, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<InputIterator>::value_type>, class AccType = detail::input_type_t<InitValueType>>
inline hipError_t rocprim::exclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, const InitValueType initial_value, const size_t size, BinaryFunction scan_op = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)

Parallel exclusive scan primitive for device level.

The exclusive_scan function performs a device-wide exclusive prefix scan operation using the binary scan_op operator.
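A sequential host-side reference makes the exclusive semantics concrete: each output element combines the initial value with all preceding inputs, excluding the current one. This is a plain C++ sketch, not the library's implementation; the name reference_exclusive_scan is hypothetical, and its accumulator type defaults to the initial value's type to mirror the documented AccType default.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Sequential reference: output[0] = initial_value,
// output[i] = scan_op(output[i - 1], input[i - 1]) for i > 0.
template<class InitValue, class T, class Acc = InitValue,
         class BinaryOp = std::plus<Acc>>
std::vector<Acc> reference_exclusive_scan(const std::vector<T>& input,
                                          InitValue initial_value,
                                          BinaryOp scan_op = BinaryOp())
{
    std::vector<Acc> output;
    output.reserve(input.size());
    Acc running = static_cast<Acc>(initial_value);
    for (const T& value : input)
    {
        output.push_back(running); // the current input is not included yet
        running = scan_op(running, static_cast<Acc>(value));
    }
    return output;
}
```

With the min operator, input {4, 7, 6, 2, 5, 1, 3, 8} and initial value 9, it produces {9, 4, 4, 4, 2, 2, 1, 1}, matching the device example below. Passing a double initial value such as 0.5 makes the sketch accumulate int inputs in double.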

Overview

  • Supports non-commutative scan operators. However, a scan operator should be associative.

  • When used with non-associative functions (e.g. floating point arithmetic operations):

    • the results may be non-deterministic and/or vary in precision,

    • and bit-wise reproducibility is not guaranteed, that is, results from multiple runs using the same input values on the same device may not be bit-wise identical.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Ranges specified by input and output must have at least size elements.

Example

In this example, a device-level exclusive min-scan operation is performed on an array of integer values (shorts are scanned into ints) using a custom operator.

#include <rocprim/rocprim.hpp>

// custom scan function
auto min_op =
    [] __device__ (int a, int b) -> int
    {
        return a < b ? a : b;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;    // e.g., 8
short * input;        // e.g., [4, 7, 6, 2, 5, 1, 3, 8]
int * output;         // empty array of 8 elements
int start_value;      // e.g., 9

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::exclusive_scan(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, start_value, input_size, min_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform scan
rocprim::exclusive_scan(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, start_value, input_size, min_op
);
// output: [9, 4, 4, 4, 2, 2, 1, 1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be scan_config or a class derived from it.

  • InputIterator – random-access iterator type of the input range. Must meet the requirements of the C++ InputIterator concept. It can be a simple pointer type.

  • OutputIterator – random-access iterator type of the output range. Must meet the requirements of the C++ OutputIterator concept. It can be a simple pointer type.

  • InitValueType – type of the initial value.

  • BinaryFunction – type of binary function used for scan. Default type is rocprim::plus<T>, where T is the value_type of InputIterator.

  • AccType – accumulator type used to propagate the scanned values. Default type is InitValueType, unless it is a rocprim::future_value, in which case it is the value type wrapped by the future_value.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range to scan.

  • output[out] - iterator to the first element in the output range. It can be the same as input.

  • initial_value[in] - initial value to start the scan. A rocprim::future_value may be passed to use a value that will be computed later.

  • size[in] - number of elements in the input range.

  • scan_op[in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] - [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.

Returns:

hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.

segmented, inclusive

template<class Config = default_config, class InputIterator, class OutputIterator, class OffsetIterator, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<InputIterator>::value_type>>
inline hipError_t rocprim::segmented_inclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, BinaryFunction scan_op = BinaryFunction(), hipStream_t stream = 0, bool debug_synchronous = false)

Parallel segmented inclusive scan primitive for device level.

The segmented_inclusive_scan function performs a device-wide inclusive scan across multiple segments of input using the binary scan_op operator. Each segment is scanned independently of the others.
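For reference, the following sequential host-side sketch (plain C++, hypothetical name reference_segmented_inclusive_scan, not rocPRIM code) shows the intended semantics. As in the example below, segment s is taken to cover the half-open index range [begin_offsets[s], end_offsets[s]), and each segment is scanned on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Scans each segment [begin_offsets[s], end_offsets[s]) independently.
// In this sketch, elements outside every segment are left untouched.
template<class Acc, class T, class OffsetIt, class BinaryOp = std::plus<Acc>>
void reference_segmented_inclusive_scan(const std::vector<T>& input,
                                        std::vector<Acc>& output,
                                        std::size_t segments,
                                        OffsetIt begin_offsets,
                                        OffsetIt end_offsets,
                                        BinaryOp scan_op = BinaryOp())
{
    assert(output.size() == input.size());
    for (std::size_t s = 0; s < segments; ++s)
    {
        const std::size_t first = static_cast<std::size_t>(begin_offsets[s]);
        const std::size_t last = static_cast<std::size_t>(end_offsets[s]);
        assert(first <= last && last <= input.size());
        for (std::size_t i = first; i < last; ++i)
        {
            const Acc value = static_cast<Acc>(input[i]);
            // The first element of a segment starts a new running result.
            output[i] = (i == first) ? value : scan_op(output[i - 1], value);
        }
    }
}
```

Passing offsets.begin() and offsets.begin() + 1 reproduces the shared-offsets pattern described in the overview.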

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Ranges specified by input and output must cover every index addressed by the segments, that is, [begin_offsets[i], end_offsets[i]) for every segment i.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. Both may refer to a single sequence, offsets, of at least segments + 1 elements: pass offsets as begin_offsets and offsets + 1 as end_offsets.
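When the segment lengths are known instead of the offsets, the shared offsets sequence can be built on the host with a running sum; this is itself an exclusive sum-scan of the lengths with one extra trailing element. The helper name offsets_from_lengths is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Builds segments + 1 offsets from per-segment lengths:
// offsets[0] = 0 and offsets[s + 1] = offsets[s] + lengths[s].
std::vector<int> offsets_from_lengths(const std::vector<int>& lengths)
{
    std::vector<int> offsets(lengths.size() + 1, 0);
    // Running sum of the lengths, written starting at offsets[1].
    std::partial_sum(lengths.begin(), lengths.end(), offsets.begin() + 1);
    return offsets;
}
```

For lengths {2, 2, 4} this produces {0, 2, 4, 8}, the offsets used in the example below.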

Example

In this example, a device-level segmented inclusive min-scan operation is performed on an array of integer values (shorts are scanned into ints) using a custom operator.

#include <rocprim/rocprim.hpp>

// custom scan function
auto min_op =
    [] __device__ (int a, int b) -> int
    {
        return a < b ? a : b;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
short * input;        // e.g., [4, 7, 6, 2, 5, 1, 3, 8]
int   * output;       // empty array of 8 elements
size_t segments;      // e.g., 3
int * offsets;        // e.g. [0, 2, 4, 8]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_inclusive_scan(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, segments, offsets, offsets + 1, min_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform scan
rocprim::segmented_inclusive_scan(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, segments, offsets, offsets + 1, min_op
);
// output: [4, 4, 6, 2, 5, 1, 1, 1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be scan_config or a class derived from it.

  • InputIterator – random-access iterator type of the input range. Must meet the requirements of the C++ RandomAccessIterator concept. It can be a simple pointer type.

  • OutputIterator – random-access iterator type of the output range. Must meet the requirements of the C++ RandomAccessIterator concept. It can be a simple pointer type.

  • OffsetIterator – random-access iterator type of segment offsets. Must meet the requirements of the C++ RandomAccessIterator concept. It can be a simple pointer type.

  • BinaryFunction – type of binary function used for scan operation. Default type is rocprim::plus<T>, where T is the value_type of InputIterator.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range to scan.

  • output[out] - iterator to the first element in the output range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • scan_op[in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] - [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.

Returns:

hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.

segmented, exclusive

template<class Config = default_config, class InputIterator, class OutputIterator, class OffsetIterator, class InitValueType, class BinaryFunction = ::rocprim::plus<typename std::iterator_traits<InputIterator>::value_type>>
inline hipError_t rocprim::segmented_exclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, const InitValueType initial_value, BinaryFunction scan_op = BinaryFunction(), hipStream_t stream = 0, bool debug_synchronous = false)

Parallel segmented exclusive scan primitive for device level.

The segmented_exclusive_scan function performs a device-wide exclusive scan across multiple segments of input using the binary scan_op operator. Each segment is scanned independently, starting from initial_value.
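A sequential host-side sketch of the segmented exclusive semantics follows (plain C++, hypothetical name reference_segmented_exclusive_scan, not rocPRIM code). As in the example below, every segment restarts from initial_value, and segment s is assumed to cover [begin_offsets[s], end_offsets[s]).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Exclusive scan of each segment [begin_offsets[s], end_offsets[s]).
template<class Acc, class T, class OffsetIt, class InitValue,
         class BinaryOp = std::plus<Acc>>
void reference_segmented_exclusive_scan(const std::vector<T>& input,
                                        std::vector<Acc>& output,
                                        std::size_t segments,
                                        OffsetIt begin_offsets,
                                        OffsetIt end_offsets,
                                        InitValue initial_value,
                                        BinaryOp scan_op = BinaryOp())
{
    assert(output.size() == input.size());
    for (std::size_t s = 0; s < segments; ++s)
    {
        const std::size_t first = static_cast<std::size_t>(begin_offsets[s]);
        const std::size_t last = static_cast<std::size_t>(end_offsets[s]);
        assert(first <= last && last <= input.size());
        // Every segment restarts from initial_value.
        Acc running = static_cast<Acc>(initial_value);
        for (std::size_t i = first; i < last; ++i)
        {
            output[i] = running;
            running = scan_op(running, static_cast<Acc>(input[i]));
        }
    }
}
```

With the min operator, offsets {0, 2, 4, 8} and initial value 9, it yields {9, 4, 9, 6, 9, 5, 1, 1}.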

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Ranges specified by input and output must cover every index addressed by the segments, that is, [begin_offsets[i], end_offsets[i]) for every segment i.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. Both may refer to a single sequence, offsets, of at least segments + 1 elements: pass offsets as begin_offsets and offsets + 1 as end_offsets.

Example

In this example, a device-level segmented exclusive min-scan operation is performed on an array of integer values (shorts are scanned into ints) using a custom operator.

#include <rocprim/rocprim.hpp>

// custom scan function
auto min_op =
    [] __device__ (int a, int b) -> int
    {
        return a < b ? a : b;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
int start_value;      // e.g., 9
short * input;        // e.g., [4, 7, 6, 2, 5, 1, 3, 8]
int   * output;       // empty array of 8 elements
size_t segments;      // e.g., 3
int * offsets;        // e.g. [0, 2, 4, 8]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_exclusive_scan(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, segments, offsets, offsets + 1,
    start_value, min_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform scan
rocprim::segmented_exclusive_scan(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, segments, offsets, offsets + 1,
    start_value, min_op
);
// output: [9, 4, 9, 6, 9, 5, 1, 1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be scan_config or a class derived from it.

  • InputIterator – random-access iterator type of the input range. Must meet the requirements of the C++ RandomAccessIterator concept. It can be a simple pointer type.

  • OutputIterator – random-access iterator type of the output range. Must meet the requirements of the C++ RandomAccessIterator concept. It can be a simple pointer type.

  • OffsetIterator – random-access iterator type of segment offsets. Must meet the requirements of the C++ RandomAccessIterator concept. It can be a simple pointer type.

  • InitValueType – type of the initial value.

  • BinaryFunction – type of binary function used for scan operation. Default type is rocprim::plus<T>, where T is the value_type of InputIterator.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range to scan.

  • output[out] - iterator to the first element in the output range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • initial_value[in] - initial value to start the scan of each segment.

  • scan_op[in] - binary operation function object that will be used for scan. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] - [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.

Returns:

hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.

scan_by_key

inclusive

template<typename Config = default_config, typename KeysInputIterator, typename ValuesInputIterator, typename ValuesOutputIterator, typename BinaryFunction = ::rocprim::plus<typename std::iterator_traits<ValuesInputIterator>::value_type>, typename KeyCompareFunction = ::rocprim::equal_to<typename std::iterator_traits<KeysInputIterator>::value_type>, typename AccType = typename std::iterator_traits<ValuesInputIterator>::value_type>
inline hipError_t rocprim::inclusive_scan_by_key(void *const temporary_storage, size_t &storage_size, const KeysInputIterator keys_input, const ValuesInputIterator values_input, const ValuesOutputIterator values_output, const size_t size, const BinaryFunction scan_op = BinaryFunction(), const KeyCompareFunction key_compare_op = KeyCompareFunction(), const hipStream_t stream = 0, const bool debug_synchronous = false)

Parallel inclusive scan-by-key primitive for device level.

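The inclusive_scan_by_key function performs a device-wide inclusive prefix scan-by-key operation using the binary scan_op operator: values are scanned separately within each run of consecutive keys that key_compare_op considers equal.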
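The following sequential host-side sketch (plain C++, not rocPRIM's implementation; the name reference_inclusive_scan_by_key is hypothetical) shows how key_compare_op splits the values: a new scan starts whenever a key is not equal to the preceding one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Inclusive scan that restarts whenever key_compare_op(previous key, key) is false.
template<class Acc, class Key, class T,
         class BinaryOp = std::plus<Acc>, class KeyCompare = std::equal_to<Key>>
std::vector<Acc> reference_inclusive_scan_by_key(const std::vector<Key>& keys,
                                                 const std::vector<T>& values,
                                                 BinaryOp scan_op = BinaryOp(),
                                                 KeyCompare key_compare_op = KeyCompare())
{
    assert(keys.size() == values.size());
    std::vector<Acc> output;
    output.reserve(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        const Acc value = static_cast<Acc>(values[i]);
        const bool same_run = i > 0 && key_compare_op(keys[i - 1], keys[i]);
        output.push_back(same_run ? scan_op(output.back(), value) : value);
    }
    return output;
}
```

For keys {1, 1, 2, 2, 3, 3, 3, 5} and values {1, ..., 8} it yields {1, 3, 3, 7, 5, 11, 18, 8}. A custom comparison, for example one that treats keys with the same value of key / 10 as equal, changes where the runs break.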

Overview

  • Supports non-commutative scan operators. However, a scan operator should be associative.

  • When used with non-associative functions (e.g. floating point arithmetic operations):

    • the results may be non-deterministic and/or vary in precision,

    • and bit-wise reproducibility is not guaranteed, that is, results from multiple runs using the same input values on the same device may not be bit-wise identical.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Ranges specified by keys_input, values_input, and values_output must have at least size elements.

Example

In this example a device-level inclusive sum-by-key operation is performed on an array of integer values (shorts are scanned into ints).

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t size;           // e.g., 8
int *   keys_input;    // e.g., [1, 1, 2, 2, 3, 3, 3, 5]
short * values_input;  // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int *   values_output; // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::inclusive_scan_by_key(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, values_input,
    values_output, size,
    rocprim::plus<int>()
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform scan-by-key
rocprim::inclusive_scan_by_key(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, values_input,
    values_output, size,
    rocprim::plus<int>()
);
// values_output: [1, 3, 3, 7, 5, 11, 18, 8]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be scan_by_key_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range of keys. It can be a simple pointer type.

  • ValuesInputIterator – random-access iterator type of the input range of values. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the output range of values. It can be a simple pointer type.

  • BinaryFunction – type of binary function used for scan. Default type is rocprim::plus<T>, where T is the value_type of ValuesInputIterator.

  • KeyCompareFunction – type of binary function used to determine key equality. Default type is rocprim::equal_to<T>, where T is the value_type of KeysInputIterator.

  • AccType – accumulator type used to propagate the scanned values. Default type is the value type of ValuesInputIterator.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - iterator to the first element in the range of keys.

  • values_input[in] - iterator to the first element in the range of values to scan.

  • values_output[out] - iterator to the first element in the output value range.

  • size[in] - number of elements in the input range.

  • scan_op[in] - binary operation function object that will be used for scanning input values. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. Default is BinaryFunction().

  • key_compare_op[in] - binary operation function object that will be used to determine key equality. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. Default is KeyCompareFunction().

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.

exclusive

template<typename Config = default_config, typename KeysInputIterator, typename ValuesInputIterator, typename ValuesOutputIterator, typename InitialValueType, typename BinaryFunction = ::rocprim::plus<typename std::iterator_traits<ValuesInputIterator>::value_type>, typename KeyCompareFunction = ::rocprim::equal_to<typename std::iterator_traits<KeysInputIterator>::value_type>, typename AccType = detail::input_type_t<InitialValueType>>
inline hipError_t rocprim::exclusive_scan_by_key(void *const temporary_storage, size_t &storage_size, const KeysInputIterator keys_input, const ValuesInputIterator values_input, const ValuesOutputIterator values_output, const InitialValueType initial_value, const size_t size, const BinaryFunction scan_op = BinaryFunction(), const KeyCompareFunction key_compare_op = KeyCompareFunction(), const hipStream_t stream = 0, const bool debug_synchronous = false)

Parallel exclusive scan-by-key primitive for device level.

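The exclusive_scan_by_key function performs a device-wide exclusive prefix scan-by-key operation using the binary scan_op operator: each run of consecutive keys that key_compare_op considers equal is scanned separately, starting from initial_value.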
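A sequential host-side sketch of exclusive scan-by-key (plain C++, hypothetical name reference_exclusive_scan_by_key, not the library's implementation): each run of consecutive equal keys restarts from initial_value, and every output element excludes its own input value. The accumulator type defaults to the initial value's type, mirroring the documented AccType default.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Exclusive scan that restarts from initial_value at every new run of keys.
template<class Key, class T, class InitValue, class Acc = InitValue,
         class BinaryOp = std::plus<Acc>, class KeyCompare = std::equal_to<Key>>
std::vector<Acc> reference_exclusive_scan_by_key(const std::vector<Key>& keys,
                                                 const std::vector<T>& values,
                                                 InitValue initial_value,
                                                 BinaryOp scan_op = BinaryOp(),
                                                 KeyCompare key_compare_op = KeyCompare())
{
    assert(keys.size() == values.size());
    std::vector<Acc> output;
    output.reserve(values.size());
    Acc running = static_cast<Acc>(initial_value);
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        // A key that differs from the previous one starts a new run.
        if (i > 0 && !key_compare_op(keys[i - 1], keys[i]))
        {
            running = static_cast<Acc>(initial_value);
        }
        output.push_back(running);
        running = scan_op(running, static_cast<Acc>(values[i]));
    }
    return output;
}
```

With keys {1, 1, 1, 2, 2, 3, 3, 4}, values {1, ..., 8} and initial value 9 it returns {9, 10, 12, 9, 13, 9, 15, 9}, the same result as the device example below.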

Overview

  • Supports non-commutative scan operators. However, a scan operator should be associative.

  • When used with non-associative functions (e.g. floating point arithmetic operations):

    • the results may be non-deterministic and/or vary in precision,

    • and bit-wise reproducibility is not guaranteed, that is, results from multiple runs using the same input values on the same device may not be bit-wise identical.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Ranges specified by keys_input, values_input, and values_output must have at least size elements.

Example

In this example, a device-level exclusive sum-by-key operation is performed on an array of integer values (shorts are scanned into ints).

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t size;           // e.g., 8
int *   keys_input;    // e.g., [1, 1, 1, 2, 2, 3, 3, 4]
short * values_input;  // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int start_value;       // e.g., 9
int *   values_output; // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::exclusive_scan_by_key(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, values_input,
    values_output, start_value,
    size, rocprim::plus<int>()
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform scan-by-key
rocprim::exclusive_scan_by_key(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, values_input,
    values_output, start_value,
    size, rocprim::plus<int>()
);
// values_output: [9, 10, 12, 9, 13, 9, 15, 9]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be scan_by_key_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range of keys. It can be a simple pointer type.

  • ValuesInputIterator – random-access iterator type of the input range of values. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the output range of values. It can be a simple pointer type.

  • InitialValueType – type of the initial value.

  • BinaryFunction – type of binary function used for scan. Default type is rocprim::plus<T>, where T is the value_type of ValuesInputIterator.

  • KeyCompareFunction – type of binary function used to determine key equality. Default type is rocprim::equal_to<T>, where T is the value_type of KeysInputIterator.

  • AccType – accumulator type used to propagate the scanned values. Default type is InitialValueType, unless it is a rocprim::future_value, in which case it is the value type wrapped by the future_value.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the scan operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - iterator to the first element in the range of keys.

  • values_input[in] - iterator to the first element in the range of values to scan.

  • values_output[out] - iterator to the first element in the output value range.

  • initial_value[in] - initial value to start the scan. A rocprim::future_value may be passed to use a value that will be computed later.

  • size[in] - number of elements in the input range.

  • scan_op[in] - binary operation function object that will be used for scanning input values. The signature of the function should be equivalent to the following: T f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. Default is BinaryFunction().

  • key_compare_op[in] - binary operation function object that will be used to determine key equality. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. Default is KeyCompareFunction().

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful scan; otherwise a HIP runtime error of type hipError_t.