Sort#

Configuring the kernel#

merge_sort#

template<unsigned int MergeOddevenBlockSize = 512, unsigned int SortBlockSize = MergeOddevenBlockSize, unsigned int SortItemsPerThread = 1, unsigned int MergeMergepathPartitionBlockSize = 128, unsigned int MergeMergepathBlockSize = 128, unsigned int MergeMergepathItemsPerThread = 4, unsigned int MinInputSizeMergepath = (1 << 17) + 70000>
struct merge_sort_config : public rocprim::detail::merge_sort_config_params#

Configuration of device-level merge primitives.

Template Parameters:
  • SortBlockSize – - block size in the block-sort step

  • SortItemsPerThread – - ItemsPerThread in the block-sort step

  • MergeOddevenBlockSize – - block size in the block merge step using oddeven impl (used when input_size < MinInputSizeMergepath)

  • MergeMergepathPartitionBlockSize – - block size of the partition kernel in the block merge step using mergepath impl

  • MergeMergepathBlockSize – - block size in the block merge step using mergepath impl

  • MergeMergepathItemsPerThread – - ItemsPerThread in the block merge step using mergepath impl

  • MinInputSizeMergepath – - breakpoint of input-size to use mergepath impl for block merge step

radix_sort#

template<class SingleSortConfig = default_config, class MergeSortConfig = default_config, class OnesweepConfig = default_config, size_t MergeSortLimit = 1024 * 1024>
struct radix_sort_config#

Configuration of device-level radix sort operation.

One of three algorithms is used: single sort (launches only a single block), merge sort, or Onesweep.

Template Parameters:
  • SortSingleConfig – - Configuration for the single kernel subalgorithm. must be kernel_config or default_config.

  • MergeSortConfig – - Configuration for the merge sort subalgorithm. must be merge_sort_config or default_config. If merge_sort_config, the sorted items per block must be a power of two.

  • OnesweepConfig – - Configuration for the Onesweep subalgorithm. must be radix_sort_onesweep_config or default_config.

  • MergeSortLimit – - The largest number of items for which the merge sort algorithm will be used. Note that below this limit, a different algorithm may be used.

merge_sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel merge sort primitive for device level.

merge_sort function performs a device-wide merge sort of keys. Function sorts input keys based on comparison function.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Accepts custom compare_functions for sorting across the device.

Example

In this example a device-level ascending merge sort is performed on an array of float values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
float * input;          // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7]
float * output;         // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::merge_sort(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::merge_sort(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size
);
// keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]

Template Parameters:
  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • compare_function[in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending merge sort-by-key primitive for device level.

merge_sort function performs a device-wide merge sort of (key, value) pairs. Function sorts input pairs based on comparison function.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Accepts custom compare_functions for sorting across the device.

Example

In this example a device-level ascending merge sort is performed where input keys are represented by an array of unsigned integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
unsigned int * keys_input;  // e.g., [ 6, 3,  5, 4,  1,  8,  2, 7]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
unsigned int * keys_output; // empty array of 8 elements
double * values_output;     // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::merge_sort(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::merge_sort(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size
);
// keys_output:   [ 1,  2, 3, 4,  5,  6, 7,  8]
// values_output: [-1, -2, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • values_input[in] - pointer to the first element in the range to sort.

  • values_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • compare_function[in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

radix_sort_keys#

ascending#

template<class Config = default_config, class Key, class Size>
inline hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort primitive for device level.

radix_sort_keys function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.

Overview

  • The contents of both buffers of keys may be altered by the sorting function.

  • current() of keys is used as the input.

  • The function will update current() of keys to point to the buffer that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type must be an arithmetic type (that is, an integral type or a floating-point type).

  • Buffers of keys must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed on an array of float values.

#include <rocprim/rocprim.hpp>

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;  // e.g., 8
float * input;      // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7]
float * tmp;        // empty array of 8 elements
// Create double-buffer
rocprim::double_buffer<float> keys(input, tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size
);
// keys.current(): [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – - key type. Must be an integral type or a floating-point type.

  • Size – - integral type that represents the problem size.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys[inout] - reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • size[in] - number of element in the input range.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

descending#

template<class Config = default_config, class Key, class Size>
inline hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort primitive for device level.

radix_sort_keys_desc function performs a device-wide radix sort of keys. Function sorts input keys in descending order.

Overview

  • The contents of both buffers of keys may be altered by the sorting function.

  • current() of keys is used as the input.

  • The function will update current() of keys to point to the buffer that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type must be an arithmetic type (that is, an integral type or a floating-point type).

  • Buffers of keys must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed on an array of integer values.

#include <rocprim/rocprim.hpp>

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;  // e.g., 8
int * input;        // e.g., [6, 3, 5, 4, 2, 8, 1, 7]
int * tmp;          // empty array of 8 elements
// Create double-buffer
rocprim::double_buffer<int> keys(input, tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size
);
// keys.current(): [8, 7, 6, 5, 4, 3, 2, 1]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – - key type. Must be an integral type or a floating-point type.

  • Size – - integral type that represents the problem size.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys[inout] - reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • size[in] - number of element in the input range.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

segmented, ascending#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort primitive for device level.

segmented_radix_sort_keys function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in ascending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed on an array of float values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
float * input;          // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7]
float * output;         // empty array of 8 elements
unsigned int segments;  // e.g., 3
int * offsets;          // e.g. [0, 2, 3, 8]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size,
    segments, offsets, offsets + 1
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::segmented_radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size,
    segments, offsets, offsets + 1
);
// keys_output: [0.3, 0.6, 0.65, 0.08, 0.2, 0.4, 0.7, 1]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be segmented_radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

segmented, descending#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort primitive for device level.

segmented_radix_sort_keys_desc function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in descending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed on an array of integer values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
int * input;            // e.g., [6, 3, 5, 4, 2, 8, 1, 7]
int * output;           // empty array of 8 elements
unsigned int segments;  // e.g., 3
int * offsets;          // e.g. [0, 2, 3, 8]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size,
    segments, offsets, offsets + 1
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::segmented_radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size,
    segments, offsets, offsets + 1
);
// keys_output: [6, 3, 5, 8, 7, 4, 2, 1]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be segmented_radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

radix_sort_pairs#

ascending#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort-by-key primitive for device level.

radix_sort_pairs_desc function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input, keys_output, values_input and values_output must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
unsigned int * keys_input;  // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
unsigned int * keys_output; // empty array of 8 elements
double * values_output;     // empty array of 8 elements

// Keys are in range [0; 8], so we can limit compared bit to bits on indexes
// 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit
// is set to 5.

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, 0, 5
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, 0, 5
);
// keys_output:   [ 1,  1, 3, 4,  5,  6, 7,  8]
// values_output: [-1, -2, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – - integral type that represents the problem size.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • values_input[in] - pointer to the first element in the range to sort.

  • values_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

descending#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort-by-key primitive for device level.

radix_sort_pairs_desc function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input, keys_output, values_input and values_output must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;       // e.g., 8
int * keys_input;        // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;   // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
int * keys_output;       // empty array of 8 elements
double * values_output;  // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size
);
// keys_output:   [ 8, 7,  6,  5, 4, 3,  1,  1]
// values_output: [-8, 7, -5, -4, 3, 2, -1, -2]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – - integral type that represents the problem size.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • values_input[in] - pointer to the first element in the range to sort.

  • values_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

segmented, ascending#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort-by-key primitive for device level.

segmented_radix_sort_pairs_desc function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input, keys_output, values_input and values_output must have at least size elements.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
unsigned int * keys_input;  // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
unsigned int * keys_output; // empty array of 8 elements
double * values_output;     // empty array of 8 elements
unsigned int segments;      // e.g., 3
int * offsets;              // e.g. [0, 2, 3, 8]

// Keys are in range [0; 8], so we can limit compared bit to bits on indexes
// 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit
// is set to 5.

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output, input_size,
    segments, offsets, offsets + 1,
    0, 5
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::segmented_radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output, input_size,
    segments, offsets, offsets + 1,
    0, 5
);
// keys_output:   [3,  6,  5,  1,  1, 4, 7,  8]
// values_output: [2, -5, -4, -1, -2, 3, 7, -8]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be segmented_radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • values_input[in] - pointer to the first element in the range to sort.

  • values_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

segmented, ascending#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort-by-key primitive for device level.

segmented_radix_sort_pairs_desc function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input, keys_output, values_input and values_output must have at least size elements.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;       // e.g., 8
int * keys_input;        // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;   // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
int * keys_output;       // empty array of 8 elements
double * values_output;  // empty array of 8 elements
unsigned int segments;   // e.g., 3
int * offsets;           // e.g. [0, 2, 3, 8]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size,
    segments, offsets, offsets + 1
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::segmented_radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size,
    segments, offsets, offsets + 1
);
// keys_output:   [ 6, 3,  5,  8, 7, 4,  1,  1]
// values_output: [-5, 2, -4, -8, 7, 3, -1, -2]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be segmented_radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • values_input[in] - pointer to the first element in the range to sort.

  • values_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.