Partition#

partition#

template<class Config = default_config, class InputIterator, class OutputIterator, class SelectedCountOutputIterator, class UnaryPredicate, bool UsingOrderedBlockId = false>
inline hipError_t rocprim::partition(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, UnaryPredicate predicate, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel select primitive for device level using selection predicate.

Performs a device-wide partition using selection predicate. Partition copies the values from input to output in such a way that all values for which the predicate returns true precede the elements for which it returns false.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Range specified by selected_count_output must have at least 1 element.

  • Relative order is preserved for the elements for which the predicate returns true. Other elements are copied in reverse order.

Example

In this example a device-level partition operation is performed on an array of integer values, even values are copied before odd values.

#include <rocprim/rocprim.hpp>

auto predicate =
    [] (int a) -> bool
    {
        return (a%2) == 0;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;     // e.g., 8
int * input;           // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int * output;          // empty array of 8 elements
size_t * output_count; // empty array of 1 element

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::partition(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    output, output_count,
    input_size,
    predicate
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform partition
rocprim::partition(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    output, output_count,
    input_size,
    predicate
);
// output: [2, 4, 6, 8, 7, 5, 3, 1]
// output_count: 4

Template Parameters:
  • Config – [optional] Configuration of the primitive, must be default_config or select_config.

  • InputIterator – random-access iterator type of the input range. It can be a simple pointer type.

  • OutputIterator – random-access iterator type of the output range. It can be a simple pointer type.

  • SelectedCountOutputIterator – random-access iterator type of the selected_count_output value. It can be a simple pointer type.

  • UnaryPredicate – type of a unary selection predicate.

  • UsingOrderedBlockId – If true, uses an atomic counter to assign block id instead of natural blockIdx-based ordering. Can increase performance on MI3xx architectures when using streams. The default is false.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the select operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • input[in] iterator to the first element in the range to select values from.

  • output[out] iterator to the first element in the output range.

  • selected_count_output[out] iterator to the total number of selected values (length of output).

  • size[in] number of elements in the input range.

  • predicate[in] unary function object which returns true if the element should be ordered before other elements. The signature of the function should be equivalent to the following: bool f(const T &a);. The signature does not need to have const &, but function object must not modify the object passed to it.

  • stream[in] [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.

partition_two_way#

template<class Config = default_config, class InputIterator, class SelectedOutputIterator, class RejectedOutputIterator, class SelectedCountOutputIterator, class Predicate, bool UsingOrderedBlockId = false>
inline hipError_t rocprim::partition_two_way(void *temporary_storage, size_t &storage_size, InputIterator input, SelectedOutputIterator output_selected, RejectedOutputIterator output_rejected, SelectedCountOutputIterator selected_count_output, const size_t size, Predicate predicate, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Two-way parallel select primitive for device level using selection predicate.

Performs a device-wide partition using selection predicate. Partition copies the values from input to output_selected and output_rejected for all values for which the predicate returns true and false respectively.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • The number of elements written to output_selected is equal to the number elements in the input for which predicate returns true.

  • The number of elements written to output_rejected is equal to the number elements in the input for which predicate returns false.

  • Range specified by selected_count_output must have at least 1 element.

  • Relative order is preserved.

Example

In this example a device-level two-way partition operation is performed on an array of integer values, even values are copied into the selected output and odd values are copied into rejected output.

The full example is on GitHub.

#include <rocprim/rocprim.hpp>

auto predicate =
    [] (int a) -> bool
    {
        return (a%2) == 0;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;     // e.g., 8
int * input;           // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int * selected_output; // empty array of at least 4 elements
int * rejected_output; // empty array of at least 4 elements
size_t * output_count; // empty array of 1 element

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::partition_two_way(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    selected_output,
    rejected_output,
    selected_output_count,
    input_size,
    predicate
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform partition
rocprim::partition_two_way(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    selected_output,
    rejected_output,
    selected_output_count,
    input_size,
    predicate
);
// output_selected: [2, 4, 6, 8]
// output_rejected: [1, 3, 5, 7]
// selected_output_count: 4

Template Parameters:
  • Config – [optional] Configuration of the primitive, must be default_config or select_config.

  • InputIterator – random-access iterator type of the input range. It can be a simple pointer type.

  • SelectedOutputIterator – random-access iterator type of the selected output range. It can be a simple pointer type.

  • RejectedOutputIterator – random-access iterator type of the rejected output range. It can be a simple pointer type.

  • SelectedCountOutputIterator – random-access iterator type of the selected_count_output value. It can be a simple pointer type.

  • Predicate – type of the selection predicate.

  • UsingOrderedBlockId – If true, uses an atomic counter to assign block id instead of natural blockIdx-based ordering. Can increase performance on MI3xx architectures when using streams. The default is false.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the select operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • input[in] iterator to the first element in the range to select values from.

  • output_selected[out] iterator to the first element in the selected output range.

  • output_rejected[out] iterator to the first element in the rejected output range.

  • selected_count_output[out] iterator to the total number of selected values (length of output_selected ).

  • size[in] number of elements in the input range.

  • predicate[in] the unary selection predicate to select values into the select and reject outputs.

  • stream[in] [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.

partition_three_way#

template<class Config = default_config, typename InputIterator, typename FirstOutputIterator, typename SecondOutputIterator, typename UnselectedOutputIterator, typename SelectedCountOutputIterator, typename FirstUnaryPredicate, typename SecondUnaryPredicate, bool UsingOrderedBlockId = false>
inline hipError_t rocprim::partition_three_way(void *temporary_storage, size_t &storage_size, InputIterator input, FirstOutputIterator output_first_part, SecondOutputIterator output_second_part, UnselectedOutputIterator output_unselected, SelectedCountOutputIterator selected_count_output, const size_t size, FirstUnaryPredicate select_first_part_op, SecondUnaryPredicate select_second_part_op, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel select primitive for device level using two selection predicates.

Performs a device-wide three-way partition using two selection predicates. Partition copies the values from input to either output_first_part or output_second_part or output_unselected according to the following criteria: The value is copied to output_first_part if the predicate select_first_part_op invoked with the value returns true. It is copied to output_second_part if select_first_part_op returns false and select_second_part_op returns true, and it is copied to output_unselected otherwise.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Range specified by selected_count_output must have at least 2 elements.

  • Relative order is preserved for the elements.

  • The number of elements written to output_first_part is equal to the number of elements in the input for which select_first_part_op returned true.

  • The number of elements written to output_second_part is equal to the number of elements in the input for which select_first_part_op returned false and select_second_part_op returned true.

  • The number of elements written to output_unselected is equal to the number of input elements minus the number of elements written to output_first_part minus the number of elements written to output_second_part.

Example

In this example a device-level three-way partition operation is performed on an array of integer values, even values are copied to the first partition, odd and 3-divisible values are copied to the second partition, and the rest of the values are copied to the unselected partition

#include <rocprim/rocprim.hpp>

auto first_predicate =
    [] (int a) -> bool
    {
        return (a%2) == 0;
    };
auto second_predicate =
    [] (int a) -> bool
    {
        return (a%3) == 0;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
int * input;                // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int * output_first_part;    // array of 8 elements
int * output_second_part;   // array of 8 elements
int * output_unselected;    // array of 8 elements
size_t * output_count;      // array of 2 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::partition_three_way(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    output_first_part, output_second_part, output_unselected,
    output_count,
    input_size,
    first_predicate,
    second_predicate
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform partition
rocprim::partition_three_way(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    output_first_part, output_second_part, output_unselected,
    output_count,
    input_size,
    first_predicate,
    second_predicate
);
// elements denoted by '*' were not modified
// output_first_part:  [2, 4, 6, 8, *, *, *, *]
// output_second_part: [3, *, *, *, *, *, *, *]
// output_unselected:  [1, 5, 7, *, *, *, *, *]
// output_count:       [4, 1]

Template Parameters:
  • Config – [optional] Configuration of the primitive, must be default_config or select_config.

  • InputIterator – random-access iterator type of the input range. It can be a simple pointer type.

  • FirstOutputIterator – random-access iterator type of the first output range. It can be a simple pointer type.

  • SecondOutputIterator – random-access iterator type of the second output range. It can be a simple pointer type.

  • UnselectedOutputIterator – random-access iterator type of the unselected output range. It can be a simple pointer type.

  • SelectedCountOutputIterator – random-access iterator type of the selected_count_output value. It can be a simple pointer type.

  • FirstUnaryPredicate – type of the first unary selection predicate.

  • SecondUnaryPredicate – type of the second unary selection predicate.

  • UsingOrderedBlockId – If true, uses an atomic counter to assign block id instead of natural blockIdx-based ordering. Can increase performance on MI3xx architectures when using streams. The default is false.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the select operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • input[in] iterator to the first element in the range to select values from.

  • output_first_part[out] iterator to the first element in the first output range.

  • output_second_part[out] iterator to the first element in the second output range.

  • output_unselected[out] iterator to the first element in the unselected output range.

  • selected_count_output[out] iterator to the total number of selected values in output_first_part and output_second_part respectively.

  • size[in] number of elements in the input range.

  • select_first_part_op[in] unary function object which returns true if the element should be in output_first_part range The signature of the function should be equivalent to the following: bool f(const T &a);. The signature does not need to have const &, but function object must not modify the object passed to it.

  • select_second_part_op[in] unary function object which returns true if the element should be in output_second_part range (given that select_first_part_op returned false) The signature of the function should be equivalent to the following: bool f(const T &a);. The signature does not need to have const &, but function object must not modify the object passed to it.

  • stream[in] [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.