Nth Element#

Configuring the kernel#

template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int StopRecursionSize, unsigned int NumberOfBuckets, block_radix_rank_algorithm RadixRankAlgorithm>
struct nth_element_config : public rocprim::detail::nth_element_config_params#

Configuration of device-level nth_element.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread.

  • StopRecursionSize – the size from where recursion is stopped to do a block sort

  • NumberOfBuckets – the number of buckets that are used in the algorithm

  • RadixRankAlgorithm – algorithm for radix rank

nth_element#

template<class Config = default_config, class KeysIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysIterator>::value_type>>
inline hipError_t rocprim::nth_element(void *temporary_storage, size_t &storage_size, KeysIterator keys, size_t nth, size_t size, BinaryFunction compare_function = BinaryFunction(), hipStream_t stream = 0, bool debug_synchronous = false)#

Rearrange elements smaller than the n-th before and bigger than n-th after the n-th element.

The element at index n is set to the element that would be at the n-th position if the input was sorted. Additionally the other elements are rearranged such that for all values of i in [keys_output, keys_output + n) and all values of j in [keys_output + n, keys_output + size): comp(*i, *j) is false. Smaller elements than the n-th will be arranged before, and bigger ones after the n-th element.

Overview

  • The contents of the inputs are not altered by the function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Accepts custom compare_functions for nth_element across the device.

  • Streams in graph capture mode are not supported

Example

In this example a device-level nth_element is performed where input keys are represented by an array of unsigned integers.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
size_t nth;                 // e.g., 4
unsigned int * keys;  // e.g., [ 6, 3, 5, 4, 1, 8, 2, 7 ]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::nth_element(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, nth, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform nth_element
rocprim::nth_element(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, nth, input_size
);
// possible keys:   [ 1, 3, 4, 2, 5, 8, 7, 6 ]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be nth_element_config.

  • KeysIterator – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • CompareFunction – [inferred] Type of binary function that accepts two arguments of the type KeysIterator and returns a value convertible to bool. Default type is rocprim::less<>.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the nth_element rearrangement.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] iterator to the input range.

  • nth[in] The index of the nth_element in the input range.

  • size[in] number of element in the input range.

  • compare_function[in] binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but function object must not modify the objects passed to it. The comparator must meet the C++ named requirement Compare. The default value is BinaryFunction().

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful rearrangement; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::nth_element(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, size_t nth, size_t size, BinaryFunction compare_function = BinaryFunction(), hipStream_t stream = 0, bool debug_synchronous = false)#

Rearrange elements smaller than the n-th before and bigger than n-th after the n-th element.

The element at index n is set to the element that would be at the n-th position if the input was sorted. Additionally the other elements are rearranged such that for all values of i in [keys_output, keys_output + n) and all values of j in [keys_output + n, keys_output + size): comp(*i, *j) is false. Smaller elements than the n-th will be arranged before, and bigger ones after the n-th element.

Overview

  • The contents of the inputs are not altered by the function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Accepts custom compare_functions for nth_element across the device.

  • Streams in graph capture mode are not supported

Example

In this example a device-level nth_element is performed where input keys are represented by an array of unsigned integers.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
size_t nth;                 // e.g., 4
unsigned int * keys_input;  // e.g., [ 6, 3, 5, 4, 1, 8, 2, 7 ]
unsigned int * keys_output; // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::nth_element(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, nth, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform nth_element
rocprim::nth_element(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, nth, input_size
);
// possible keys_output:   [ 1, 3, 4, 2, 5, 8, 7, 6 ]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be nth_element_config.

  • KeysInputIterator – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • CompareFunction – [inferred] Type of binary function that accepts two arguments of the type KeysIterator and returns a value convertible to bool. Default type is rocprim::less<>.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the nth_element rearrangement.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] iterator to the input range.

  • keys_output[out] iterator to the output range. No overlap at all is allowed between keys_input and keys_output. keys_output should be able to be written and read from for size elements.

  • nth[in] The index of the nth_element in the input range.

  • size[in] number of element in the input range.

  • compare_function[in] binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but function object must not modify the objects passed to it. The comparator must meet the C++ named requirement Compare. The default value is BinaryFunction().

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful rearrangement; otherwise a HIP runtime error of type hipError_t.