Partition#

partition#

template<class Config = default_config, class InputIterator, class OutputIterator, class SelectedCountOutputIterator, class UnaryPredicate>
inline hipError_t rocprim::partition(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, UnaryPredicate predicate, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel select primitive for device level using selection predicate.

Performs a device-wide partition using selection predicate. Partition copies the values from input to output in such a way that all values for which the predicate returns true precede the elements for which it returns false.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Range specified by selected_count_output must have at least 1 element.

  • Relative order is preserved for the elements for which the predicate returns true. Other elements are copied in reverse order.

Example

In this example a device-level partition operation is performed on an array of integer values, even values are copied before odd values.

#include <rocprim/rocprim.hpp>

auto predicate =
    [] __device__ (int a) -> bool
    {
        return (a%2) == 0;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;     // e.g., 8
int * input;           // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int * output;          // empty array of 8 elements
size_t * output_count; // empty array of 1 element

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::partition(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    output, output_count,
    input_size,
    predicate
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform partition
rocprim::partition(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    output, output_count,
    input_size,
    predicate
);
// output: [2, 4, 6, 8, 7, 5, 3, 1]
// output_count: 4

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be select_config or a class derived from it.

  • InputIterator – - random-access iterator type of the input range. It can be a simple pointer type.

  • OutputIterator – - random-access iterator type of the output range. It can be a simple pointer type.

  • SelectedCountOutputIterator – - random-access iterator type of the selected_count_output value. It can be a simple pointer type.

  • UnaryPredicate – - type of a unary selection predicate.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the select operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range to select values from.

  • output[out] - iterator to the first element in the output range.

  • selected_count_output[out] - iterator to the total number of selected values (length of output).

  • size[in] - number of elements in the input range.

  • predicate[in] - unary function object which returns true if the element should be ordered before other elements. The signature of the function should be equivalent to the following: bool f(const T &a);. The signature does not need to have const &, but function object must not modify the object passed to it.

  • stream[in] - [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.

partition_three_way#

template<class Config = default_config, typename InputIterator, typename FirstOutputIterator, typename SecondOutputIterator, typename UnselectedOutputIterator, typename SelectedCountOutputIterator, typename FirstUnaryPredicate, typename SecondUnaryPredicate>
inline hipError_t rocprim::partition_three_way(void *temporary_storage, size_t &storage_size, InputIterator input, FirstOutputIterator output_first_part, SecondOutputIterator output_second_part, UnselectedOutputIterator output_unselected, SelectedCountOutputIterator selected_count_output, const size_t size, FirstUnaryPredicate select_first_part_op, SecondUnaryPredicate select_second_part_op, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel select primitive for device level using two selection predicates.

Performs a device-wide three-way partition using two selection predicates. Partition copies the values from input to either output_first_part or output_second_part or output_unselected according to the following criteria: The value is copied to output_first_part if the predicate select_first_part_op invoked with the value returns true. It is copied to output_second_part if select_first_part_op returns false and select_second_part_op returns true, and it is copied to output_unselected otherwise.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Range specified by selected_count_output must have at least 2 elements.

  • Relative order is preserved for the elements.

  • The number of elements written to output_first_part is equal to the number of elements in the input for which select_first_part_op returned true.

  • The number of elements written to output_second_part is equal to the number of elements in the input for which select_first_part_op returned false and select_second_part_op returned true.

  • The number of elements written to output_unselected is equal to the number of input elements minus the number of elements written to output_first_part minus the number of elements written to output_second_part.

Example

In this example a device-level three-way partition operation is performed on an array of integer values, even values are copied to the first partition, odd and 3-divisible values are copied to the second partition, and the rest of the values are copied to the unselected partition

#include <rocprim/rocprim.hpp>

auto first_predicate =
    [] __device__ (int a) -> bool
    {
        return (a%2) == 0;
    };
auto second_predicate =
    [] __device__ (int a) -> bool
    {
        return (a%3) == 0;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
int * input;                // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int * output_first_part;    // array of 8 elements
int * output_second_part;   // array of 8 elements
int * output_unselected;    // array of 8 elements
size_t * output_count;      // array of 2 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::partition_three_way(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    output_first_part, output_second_part, output_unselected,
    output_count,
    input_size,
    first_predicate,
    second_predicate
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform partition
rocprim::partition_three_way(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input,
    output_first_part, output_second_part, output_unselected,
    output_count,
    input_size,
    first_predicate,
    second_predicate
);
// elements denoted by '*' were not modified
// output_first_part:  [2, 4, 6, 8, *, *, *, *]
// output_second_part: [3, *, *, *, *, *, *, *]
// output_unselected:  [1, 5, 7, *, *, *, *, *]
// output_count:       [4, 1]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be select_config or a class derived from it.

  • InputIterator – - random-access iterator type of the input range. It can be a simple pointer type.

  • FirstOutputIterator – - random-access iterator type of the first output range. It can be a simple pointer type.

  • SecondOutputIterator – - random-access iterator type of the second output range. It can be a simple pointer type.

  • UnselectedOutputIterator – - random-access iterator type of the unselected output range. It can be a simple pointer type.

  • SelectedCountOutputIterator – - random-access iterator type of the selected_count_output value. It can be a simple pointer type.

  • FirstUnaryPredicate – - type of the first unary selection predicate.

  • SecondUnaryPredicate – - type of the second unary selection predicate.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the select operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range to select values from.

  • output_first_part[out] - iterator to the first element in the first output range.

  • output_second_part[out] - iterator to the first element in the second output range.

  • output_unselected[out] - iterator to the first element in the unselected output range.

  • selected_count_output[out] - iterator to the total number of selected values in output_first_part and output_second_part respectively.

  • size[in] - number of elements in the input range.

  • select_first_part_op[in] - unary function object which returns true if the element should be in output_first_part range The signature of the function should be equivalent to the following: bool f(const T &a);. The signature does not need to have const &, but function object must not modify the object passed to it.

  • select_second_part_op[in] - unary function object which returns true if the element should be in output_second_part range (given that select_first_part_op returned false) The signature of the function should be equivalent to the following: bool f(const T &a);. The signature does not need to have const &, but function object must not modify the object passed to it.

  • stream[in] - [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.