Sort#

Configuring the kernel#

merge_sort#

template<unsigned int MergeOddevenBlockSize = 512, unsigned int SortBlockSize = MergeOddevenBlockSize, unsigned int SortItemsPerThread = 1, unsigned int MergeMergepathPartitionBlockSize = 128, unsigned int MergeMergepathBlockSize = 128, unsigned int MergeMergepathItemsPerThread = 4, unsigned int MinInputSizeMergepath = (1 << 17) + 70000>
struct merge_sort_config : public rocprim::detail::merge_sort_config_params#

Configuration of device-level merge primitives.

Template Parameters:
  • SortBlockSize – - block size in the block-sort step

  • SortItemsPerThread – - ItemsPerThread in the block-sort step

  • MergeOddevenBlockSize – - block size in the block merge step using oddeven impl (used when input_size < MinInputSizeMergepath)

  • MergeMergepathPartitionBlockSize – - block size of the partition kernel in the block merge step using mergepath impl

  • MergeMergepathBlockSize – - block size in the block merge step using mergepath impl

  • MergeMergepathItemsPerThread – - ItemsPerThread in the block merge step using mergepath impl

  • MinInputSizeMergepath – - breakpoint of input-size to use mergepath impl for block merge step

radix_sort#

template<class SingleSortConfig = default_config, class MergeSortConfig = default_config, class OnesweepConfig = default_config, size_t MergeSortLimit = 1024 * 1024>
struct radix_sort_config#

Configuration of device-level radix sort operation.

One of three algorithms is used: single sort (launches only a single block), merge sort, or Onesweep.

Template Parameters:
  • SortSingleConfig – - Configuration for the single kernel subalgorithm. must be kernel_config or default_config.

  • MergeSortConfig – - Configuration for the merge sort subalgorithm. must be merge_sort_config or default_config. If merge_sort_config, the sorted items per block must be a power of two.

  • OnesweepConfig – - Configuration for the Onesweep subalgorithm. must be radix_sort_onesweep_config or default_config.

  • MergeSortLimit – - The largest number of items for which the merge sort algorithm will be used. Note that below this limit, a different algorithm may be used.

merge_sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel merge sort primitive for device level.

merge_sort function performs a device-wide merge sort of keys. Function sorts input keys based on comparison function.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Accepts custom compare_functions for sorting across the device.

Example

In this example a device-level ascending merge sort is performed on an array of float values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
float * input;          // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7]
float * output;         // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::merge_sort(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::merge_sort(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size
);
// keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]

Template Parameters:
  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • compare_function[in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending merge sort-by-key primitive for device level.

merge_sort function performs a device-wide merge sort of (key, value) pairs. Function sorts input pairs based on comparison function.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Accepts custom compare_functions for sorting across the device.

Example

In this example a device-level ascending merge sort is performed where input keys are represented by an array of unsigned integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
unsigned int * keys_input;  // e.g., [ 6, 3,  5, 4,  1,  8,  2, 7]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
unsigned int * keys_output; // empty array of 8 elements
double * values_output;     // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::merge_sort(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::merge_sort(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size
);
// keys_output:   [ 1,  2, 3, 4,  5,  6, 7,  8]
// values_output: [-1, -2, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • values_input[in] - pointer to the first element in the range to sort.

  • values_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • compare_function[in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

radix_sort_keys#

Ascending Sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort primitive for device level.

radix_sort_keys function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed on an array of float values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
float * input;          // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7]
float * output;         // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size
);
// keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel ascending radix sort primitive for device level.

radix_sort_keys function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • begin_bit and end_bit can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns rocprim::tuple<int16_t&, uint8_t&>, begin_bit==6 and end_bit==12, then the 2 MSBs of the uint8_t value and the 4 LSBs of the int16_t value are considered for sorting. The range specified by begin_bit and end_bit must be valid with regards to the sizes of the return tuple’s elements.

Example

In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
custom_type * input;    // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}]
custom_type * output;   // empty array of 8 elements

constexpr unsigned int begin_bit = 0;
constexpr unsigned int end_bit = 96;
size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit
);
// keys_output: [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – Random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – Random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – Integral type that represents the problem size.

  • Key – The value type of the input and output iterators.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • begin_bit[in] index of the first (least significant) bit used in key comparison.

  • end_bit[in] past-the-end index (most significant) bit used in key comparison.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel ascending radix sort primitive for device level.

radix_sort_keys function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

Example

In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
custom_type * input;    // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}]
custom_type * output;   // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, custom_type_decomposer{}
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, custom_type_decomposer{}
);
// keys_output: [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – Random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – Random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – Integral type that represents the problem size.

  • Key – The value type of the input and output iterators.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Size>
hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort primitive for device level.

radix_sort_keys function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.

Overview

  • The contents of both buffers of keys may be altered by the sorting function.

  • current() of keys is used as the input.

  • The function will update current() of keys to point to the buffer that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type must be an arithmetic type (that is, an integral type or a floating-point type).

  • Buffers of keys must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed on an array of float values.

#include <rocprim/rocprim.hpp>

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;  // e.g., 8
float * input;      // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7]
float * tmp;        // empty array of 8 elements
// Create double-buffer
rocprim::double_buffer<float> keys(input, tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size
);
// keys.current(): [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Size – integral type that represents the problem size.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Size, class Decomposer>
auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel ascending radix sort primitive for device level.

radix_sort_keys function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.

Overview

  • The contents of both buffers of keys may be altered by the sorting function.

  • current() of keys is used as the input.

  • The function will update current() of keys to point to the buffer that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • begin_bit and end_bit can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns rocprim::tuple<int16_t&, uint8_t&>, begin_bit==6 and end_bit==12, then the 2 MSBs of the uint8_t value and the 4 LSBs of the int16_t value are considered for sorting. The range specified by begin_bit and end_bit must be valid with regards to the sizes of the return tuple’s elements.

Example

In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and tmp (declare pointers, allocate device memory etc.)
constexpr unsigned int begin_bit = 0;
constexpr unsigned int end_bit = 96;
size_t input_size;   // e.g., 8
custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}]
custom_type * tmp;   // empty array of 8 elements
// Create double-buffer
rocprim::double_buffer<custom_type> keys(input, tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size, custom_type_decomposer{}, begin_bit, end_bit
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size, custom_type_decomposer{}, begin_bit, end_bit
);
// keys.current(): [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Size – integral type that represents the problem size.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • begin_bit[in] index of the first (least significant) bit used in key comparison.

  • end_bit[in] past-the-end index (most significant) bit used in key comparison.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Size, class Decomposer>
auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel ascending radix sort primitive for device level.

radix_sort_keys function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.

Overview

  • The contents of both buffers of keys may be altered by the sorting function.

  • current() of keys is used as the input.

  • The function will update current() of keys to point to the buffer that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

Example

In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;   // e.g., 8
custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}]
custom_type * tmp;   // empty array of 8 elements
// Create double-buffer
rocprim::double_buffer<custom_type> keys(input, tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size, custom_type_decomposer{}
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size, custom_type_decomposer{}
);
// keys.current(): [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Size – integral type that represents the problem size.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

Descending Sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort primitive for device level.

radix_sort_keys_desc function performs a device-wide radix sort of keys. Function sorts input keys in descending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed on an array of integer values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;    // e.g., 8
int * input;          // e.g., [6, 3, 5, 4, 2, 8, 1, 7]
int * output;         // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size
);
// keys_output: [8, 7, 6, 5, 4, 3, 2, 1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel descending radix sort primitive for device level.

radix_sort_keys_desc function performs a device-wide radix sort of keys. Function sorts input keys in descending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • begin_bit and end_bit can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns rocprim::tuple<int16_t&, uint8_t&>, begin_bit==6 and end_bit==12, then the 2 MSBs of the uint8_t value and the 4 LSBs of the int16_t value are considered for sorting. The range specified by begin_bit and end_bit must be valid with regards to the sizes of the return tuple’s elements.

Example

In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and output (declare pointers, allocate device memory etc.)
constexpr unsigned int begin_bit = 0;
constexpr unsigned int end_bit = 96;
size_t input_size;      // e.g., 8
custom_type * input;    // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}]
custom_type * output;   // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit
);
// keys_output: [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

  • Key – The value type of the input and output iterators.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • begin_bit[in] index of the first (least significant) bit used in key comparison.

  • end_bit[in] past-the-end index (most significant) bit used in key comparison.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel descending radix sort primitive for device level.

radix_sort_keys_desc function performs a device-wide radix sort of keys. Function sorts input keys in descending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

Example

In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
custom_type * input;    // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}]
custom_type * output;   // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, custom_type_decomposer{}
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size, custom_type_decomposer{}
);
// keys_output: [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

  • Key – The value type of the input and output iterators.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Size>
hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort primitive for device level.

radix_sort_keys_desc function performs a device-wide radix sort of keys. Function sorts input keys in descending order.

Overview

  • The contents of both buffers of keys may be altered by the sorting function.

  • current() of keys is used as the input.

  • The function will update current() of keys to point to the buffer that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type must be an arithmetic type (that is, an integral type or a floating-point type).

  • Buffers of keys must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed on an array of integer values.

#include <rocprim/rocprim.hpp>

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;  // e.g., 8
int * input;        // e.g., [6, 3, 5, 4, 2, 8, 1, 7]
int * tmp;          // empty array of 8 elements
// Create double-buffer
rocprim::double_buffer<int> keys(input, tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size
);
// keys.current(): [8, 7, 6, 5, 4, 3, 2, 1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Size – integral type that represents the problem size.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Size, class Decomposer>
auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel descending radix sort primitive for device level.

radix_sort_keys_desc function performs a device-wide radix sort of keys. Function sorts input keys in descending order.

Overview

  • The contents of both buffers of keys may be altered by the sorting function.

  • current() of keys is used as the input.

  • The function will update current() of keys to point to the buffer that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • begin_bit and end_bit can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns rocprim::tuple<int16_t&, uint8_t&>, begin_bit==6 and end_bit==12, then the 2 MSBs of the uint8_t value and the 4 LSBs of the int16_t value are considered for sorting. The range specified by begin_bit and end_bit must be valid with regards to the sizes of the return tuple’s elements.

Example

In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and tmp (declare pointers, allocate device memory etc.)
constexpr unsigned int begin_bit = 0;
constexpr unsigned int end_bit = 96;
size_t input_size;   // e.g., 8
custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}]
custom_type * tmp;   // empty array of 8 elements
// Create double-buffer
rocprim::double_buffer<custom_type> keys(input, tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size, custom_type_decomposer{}, begin_bit, end_bit
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size, custom_type_decomposer{}
);
// keys.current(): [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Size – integral type that represents the problem size.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Defaults to 0.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Size, class Decomposer>
auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel descending radix sort primitive for device level.

radix_sort_keys_desc function performs a device-wide radix sort of keys. Function sorts input keys in descending order.

Overview

  • The contents of both buffers of keys may be altered by the sorting function.

  • current() of keys is used as the input.

  • The function will update current() of keys to point to the buffer that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

Example

In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;   // e.g., 8
custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}]
custom_type * tmp;   // empty array of 8 elements
// Create double-buffer
rocprim::double_buffer<custom_type> keys(input, tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size, custom_type_decomposer{}
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, input_size, custom_type_decomposer{}
);
// keys.current(): [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Size – integral type that represents the problem size.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

Segmented Ascending Sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort primitive for device level.

segmented_radix_sort_keys function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in ascending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed on an array of float values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
float * input;          // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7]
float * output;         // empty array of 8 elements
unsigned int segments;  // e.g., 3
int * offsets;          // e.g. [0, 2, 3, 8]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size,
    segments, offsets, offsets + 1
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::segmented_radix_sort_keys(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size,
    segments, offsets, offsets + 1
);
// keys_output: [0.3, 0.6, 0.65, 0.08, 0.2, 0.4, 0.7, 1]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be segmented_radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

Segmented Descending Sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort primitive for device level.

segmented_radix_sort_keys_desc function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in descending order.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed on an array of integer values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;      // e.g., 8
int * input;            // e.g., [6, 3, 5, 4, 2, 8, 1, 7]
int * output;           // empty array of 8 elements
unsigned int segments;  // e.g., 3
int * offsets;          // e.g. [0, 2, 3, 8]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size,
    segments, offsets, offsets + 1
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::segmented_radix_sort_keys_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, input_size,
    segments, offsets, offsets + 1
);
// keys_output: [6, 3, 5, 8, 7, 4, 2, 1]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be segmented_radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

radix_sort_pairs#

Ascending Sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort-by-key primitive for device level.

radix_sort_pairs function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input, keys_output, values_input and values_output must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
unsigned int * keys_input;  // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
unsigned int * keys_output; // empty array of 8 elements
double * values_output;     // empty array of 8 elements

// Keys are in range [0; 8], so we can limit compared bit to bits on indexes
// 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit
// is set to 5.

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, 0, 5
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, 0, 5
);
// keys_output:   [ 1,  1, 3, 4,  5,  6, 7,  8]
// values_output: [-1, -2, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • values_input[in] pointer to the first element in the range to sort.

  • values_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel ascending radix sort-by-key primitive for device level.

radix_sort_pairs function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • begin_bit and end_bit can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns rocprim::tuple<int16_t&, uint8_t&>, begin_bit==6 and end_bit==12, then the 2 MSBs of the uint8_t value and the 4 LSBs of the int16_t value are considered for sorting. The range specified by begin_bit and end_bit must be valid with regards to the sizes of the return tuple’s elements.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
custom_type * keys_input;   // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
custom_type * keys_output;  // empty array of 8 elements
double * values_output;     // empty array of 8 elements

// The integer field of the keys is in range 0-11, which can be represented on 4 bits,
// while for the double member we must specify full bit range [0; 63]. Therefore begin_bit
// is set to 0 and end_bit is set to 68.
constexpr unsigned int begin_bit = 0;
constexpr unsigned int end_bit = 68;

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, custom_type_decomposer{}, begin_bit, end_bit
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, custom_type_decomposer{}, begin_bit, end_bit
);
// keys_output:   [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}]
// values_output: [-1, 2, 3, -5, -4, 7, -8, -2]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

  • Key – The value type of the input and output iterators.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • values_input[in] pointer to the first element in the range to sort.

  • values_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • begin_bit[in] index of the first (least significant) bit used in key comparison.

  • end_bit[in] past-the-end index (most significant) bit used in key comparison.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel ascending radix sort-by-key primitive for device level.

radix_sort_pairs function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
custom_type * keys_input;   // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
custom_type * keys_output;  // empty array of 8 elements
double * values_output;     // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, custom_type_decomposer{}
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, custom_type_decomposer{}
);
// keys_output:   [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}]
// values_output: [-1, 2, 3, -5, -4, 7, -8, -2]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

  • Key – The value type of the input and output iterators.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • values_input[in] pointer to the first element in the range to sort.

  • values_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Value, class Size>
hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort-by-key primitive for device level.

radix_sort_pairs function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of both buffers of keys and values may be altered by the sorting function.

  • current() of keys and values are used as the input.

  • The function will update current() of keys and values to point to buffers that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type must be an arithmetic type (that is, an integral type or a floating-point type).

  • Buffers of keys must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
unsigned int * keys_input;  // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
unsigned int * keys_tmp;    // empty array of 8 elements
double*  values_tmp;        // empty array of 8 elements
// Create double-buffers
rocprim::double_buffer<unsigned int> keys(keys_input, keys_tmp);
rocprim::double_buffer<double> values(values_input, values_tmp);

// Keys are in range [0; 8], so we can limit compared bit to bits on indexes
// 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit
// is set to 5.

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size,
    0, 5
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size,
    0, 5
);
// keys.current():   [ 1,  1, 3, 4,  5,  6, 7,  8]
// values.current(): [-1, -2, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Value – value type.

  • Size – integral type that represents the problem size.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • values[inout] reference to the double-buffer of values, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel ascending radix sort-by-key primitive for device level.

radix_sort_pairs function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of both buffers of keys and values may be altered by the sorting function.

  • current() of keys and values are used as the input.

  • The function will update current() of keys and values to point to buffers that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • begin_bit and end_bit can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns rocprim::tuple<int16_t&, uint8_t&>, begin_bit==6 and end_bit==12, then the 2 MSBs of the uint8_t value and the 4 LSBs of the int16_t value are considered for sorting. The range specified by begin_bit and end_bit must be valid with regards to the sizes of the return tuple’s elements.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
custom_type * keys_input;   // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
custom_type * keys_tmp;     // empty array of 8 elements
double*  values_tmp;        // empty array of 8 elements
// Create double-buffers
rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp);
rocprim::double_buffer<double> values(values_input, values_tmp);

// The integer field of the keys is in range 0-11, which can be represented on 4 bits,
// while for the double member we must specify full bit range [0; 63]. Therefore begin_bit
// is set to 0 and end_bit is set to 68.
constexpr unsigned int begin_bit = 0;
constexpr unsigned int end_bit = 68;

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit
);
// keys.current():   [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}]
// values.current(): [-1, 2, 3, -5, -4, 7, -8, -2]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Value – value type.

  • Size – integral type that represents the problem size.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • values[inout] reference to the double-buffer of values, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • begin_bit[in] index of the first (least significant) bit used in key comparison.

  • end_bit[in] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel ascending radix sort-by-key primitive for device level.

radix_sort_pairs function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of both buffers of keys and values may be altered by the sorting function.

  • current() of keys and values are used as the input.

  • The function will update current() of keys and values to point to buffers that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
custom_type * keys_input;   // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
custom_type * keys_tmp;     // empty array of 8 elements
double*  values_tmp;        // empty array of 8 elements
// Create double-buffers
rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp);
rocprim::double_buffer<double> values(values_input, values_tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size, custom_type_decomposer{}
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size, custom_type_decomposer{}
);
// keys.current():   [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}]
// values.current(): [-1, 2, 3, -5, -4, 7, -8, -2]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Value – value type.

  • Size – integral type that represents the problem size.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • values[inout] reference to the double-buffer of values, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

Descending Sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort-by-key primitive for device level.

radix_sort_pairs_desc function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input, keys_output, values_input and values_output must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;       // e.g., 8
int * keys_input;        // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;   // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
int * keys_output;       // empty array of 8 elements
double * values_output;  // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size
);
// keys_output:   [ 8, 7,  6,  5, 4, 3,  1,  1]
// values_output: [-8, 7, -5, -4, 3, 2, -1, -2]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • values_input[in] pointer to the first element in the range to sort.

  • values_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel descending radix sort-by-key primitive for device level.

radix_sort_pairs_desc function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • begin_bit and end_bit can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns rocprim::tuple<int16_t&, uint8_t&>, begin_bit==6 and end_bit==12, then the 2 MSBs of the uint8_t value and the 4 LSBs of the int16_t value are considered for sorting. The range specified by begin_bit and end_bit must be valid with regards to the sizes of the return tuple’s elements.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
custom_type * keys_input;   // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
custom_type * keys_output;  // empty array of 8 elements
double * values_output;     // empty array of 8 elements

// The integer field of the keys is in range 0-11, which can be represented on 4 bits,
// while for the double member we must specify full bit range [0; 63]. Therefore begin_bit
// is set to 0 and end_bit is set to 68.
constexpr unsigned int begin_bit = 0;
constexpr unsigned int end_bit = 68;

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, custom_type_decomposer{}, begin_bit, end_bit
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, custom_type_decomposer{}, begin_bit, end_bit
);
// keys_output:   [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}]
// values_output: [-2, -1, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

  • Key – The value type of the input and output iterators.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • values_input[in] pointer to the first element in the range to sort.

  • values_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • begin_bit[in] index of the first (least significant) bit used in key comparison.

  • end_bit[in] past-the-end index (most significant) bit used in key comparison.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel descending radix sort-by-key primitive for device level.

radix_sort_pairs_desc function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
custom_type * keys_input;   // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
custom_type * keys_output;  // empty array of 8 elements
double * values_output;     // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, custom_type_decomposer{}
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size, custom_type_decomposer{}
);
// keys_output:   [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}]
// values_output: [-2, -1, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • Size – integral type that represents the problem size.

  • Key – The value type of the input and output iterators.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input[in] pointer to the first element in the range to sort.

  • keys_output[out] pointer to the first element in the output range.

  • values_input[in] pointer to the first element in the range to sort.

  • values_output[out] pointer to the first element in the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Value, class Size>
hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort-by-key primitive for device level.

radix_sort_pairs_desc function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of both buffers of keys and values may be altered by the sorting function.

  • current() of keys and values are used as the input.

  • The function will update current() of keys and values to point to buffers that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type must be an arithmetic type (that is, an integral type or a floating-point type).

  • Buffers of keys must have at least size elements.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;       // e.g., 8
int * keys_input;        // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;   // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
int * keys_tmp;          // empty array of 8 elements
double * values_tmp;     // empty array of 8 elements
// Create double-buffers
rocprim::double_buffer<int> keys(keys_input, keys_tmp);
rocprim::double_buffer<double> values(values_input, values_tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size
);
// keys.current():   [ 8, 7,  6,  5, 4, 3,  1,  1]
// values.current(): [-8, 7, -5, -4, 3, 2, -1, -2]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Value – value type.

  • Size – integral type that represents the problem size.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • values[inout] reference to the double-buffer of values, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel descending radix sort-by-key primitive for device level.

radix_sort_pairs_desc function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of both buffers of keys and values may be altered by the sorting function.

  • current() of keys and values are used as the input.

  • The function will update current() of keys and values to point to buffers that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

  • begin_bit and end_bit can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns rocprim::tuple<int16_t&, uint8_t&>, begin_bit==6 and end_bit==12, then the 2 MSBs of the uint8_t value and the 4 LSBs of the int16_t value are considered for sorting. The range specified by begin_bit and end_bit must be valid with regards to the sizes of the return tuple’s elements.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;        // e.g., 8
custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}]
double * values_input;    // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
custom_type * keys_tmp;   // empty array of 8 elements
double * values_tmp;      // empty array of 8 elements
// Create double-buffers
rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp);
rocprim::double_buffer<double> values(values_input, values_tmp);

// The integer field of the keys is in range 0-11, which can be represented on 4 bits,
// while for the double member we must specify full bit range [0; 63]. Therefore begin_bit
// is set to 0 and end_bit is set to 68.
constexpr unsigned int begin_bit = 0;
constexpr unsigned int end_bit = 68;

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit
);
// keys.current():   [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}]
// values.current(): [-2, -1, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Value – value type.

  • Size – integral type that represents the problem size.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • values[inout] reference to the double-buffer of values, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • begin_bit[in] [optional] index of the first (least significant) bit used in key comparison. Defaults to 0.

  • end_bit[in] [optional] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#

Parallel descending radix sort-by-key primitive for device level.

radix_sort_pairs_desc function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of both buffers of keys and values may be altered by the sorting function.

  • current() of keys and values are used as the input.

  • The function will update current() of keys and values to point to buffers that contains the output range.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • The function requires small temporary_storage as it does not need a temporary buffer of size elements.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) can be any trivially copyable type.

  • decomposer must be a functor that implements operator()(Key&) const. This operator must return a rocprim::tuple that contains one or more reference to value(s) of arithmetic types. These references must point to member variables of Key, however not every member variable has to be exposed this way.

  • Ranges specified by keys_input and keys_output must have at least size elements.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

struct custom_type
{
    int i;
    double d;
};

struct custom_type_decomposer
{
    rocprim::tuple<int&, double&> operator()(custom_type& key) const
    {
        return rocprim::tuple<int&, double&>(key.i, key.d);
    }
};

// Prepare input and tmp (declare pointers, allocate device memory etc.)
size_t input_size;        // e.g., 8
custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}]
double * values_input;    // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
custom_type * keys_tmp;   // empty array of 8 elements
double * values_tmp;      // empty array of 8 elements
// Create double-buffers
rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp);
rocprim::double_buffer<double> values(values_input, values_tmp);

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size, custom_type_decomposer{}
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys, values, input_size, custom_type_decomposer{}
);
// keys.current():   [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}]
// values.current(): [-2, -1, 2, 3, -4, -5, 7, -8]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be radix_sort_config or a class derived from it.

  • Key – key type. Must be an integral type or a floating-point type.

  • Value – value type.

  • Size – integral type that represents the problem size.

  • Decomposer – The type of the decomposer functor.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys[inout] reference to the double-buffer of keys, its current() contains the input range and will be updated to point to the output range.

  • values[inout] reference to the double-buffer of values, its current() contains the input range and will be updated to point to the output range.

  • size[in] number of element in the input range.

  • decomposer[in] decomposer functor that produces a tuple of references from the input key type.

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

Segmented Ascending Sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel ascending radix sort-by-key primitive for device level.

segmented_radix_sort_pairs_desc function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in ascending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input, keys_output, values_input and values_output must have at least size elements.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
unsigned int * keys_input;  // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;      // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
unsigned int * keys_output; // empty array of 8 elements
double * values_output;     // empty array of 8 elements
unsigned int segments;      // e.g., 3
int * offsets;              // e.g. [0, 2, 3, 8]

// Keys are in range [0; 8], so we can limit compared bit to bits on indexes
// 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit
// is set to 5.

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output, input_size,
    segments, offsets, offsets + 1,
    0, 5
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::segmented_radix_sort_pairs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output, input_size,
    segments, offsets, offsets + 1,
    0, 5
);
// keys_output:   [3,  6,  5,  1,  1, 4, 7,  8]
// values_output: [2, -5, -4, -1, -2, 3, 7, -8]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be segmented_radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • values_input[in] - pointer to the first element in the range to sort.

  • values_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

Segmented Descending Sort#

template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel descending radix sort-by-key primitive for device level.

segmented_radix_sort_pairs_desc function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in descending order of keys.

Overview

  • The contents of the inputs are not altered by the sorting function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Key type (a value_type of KeysInputIterator and KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).

  • Ranges specified by keys_input, keys_output, values_input and values_output must have at least size elements.

  • Ranges specified by begin_offsets and end_offsets must have at least segments elements. They may use the same sequence offsets of at least segments + 1 elements: offsets for begin_offsets and offsets + 1 for end_offsets.

  • If Key is an integer type and the range of keys is known in advance, the performance can be improved by setting begin_bit and end_bit, for example if all keys are in range [100, 10000], begin_bit = 0 and end_bit = 14 will cover the whole range.

Example

In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of doubles.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;       // e.g., 8
int * keys_input;        // e.g., [ 6, 3,  5, 4,  1,  8,  1, 7]
double * values_input;   // e.g., [-5, 2, -4, 3, -1, -8, -2, 7]
int * keys_output;       // empty array of 8 elements
double * values_output;  // empty array of 8 elements
unsigned int segments;   // e.g., 3
int * offsets;           // e.g. [0, 2, 3, 8]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::segmented_radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size,
    segments, offsets, offsets + 1
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform sort
rocprim::segmented_radix_sort_pairs_desc(
    temporary_storage_ptr, temporary_storage_size_bytes,
    keys_input, keys_output, values_input, values_output,
    input_size,
    segments, offsets, offsets + 1
);
// keys_output:   [ 6, 3,  5,  8, 7, 4,  1,  1]
// values_output: [-5, 2, -4, -8, 7, 3, -1, -2]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be segmented_radix_sort_config or a class derived from it.

  • KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • keys_input[in] - pointer to the first element in the range to sort.

  • keys_output[out] - pointer to the first element in the output range.

  • values_input[in] - pointer to the first element in the range to sort.

  • values_output[out] - pointer to the first element in the output range.

  • size[in] - number of element in the input range.

  • segments[in] - number of segments in the input range.

  • begin_offsets[in] - iterator to the first element in the range of beginning offsets.

  • end_offsets[in] - iterator to the first element in the range of ending offsets.

  • begin_bit[in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range [0; 8 * sizeof(Key)). Default value: 0. Non-default value not supported for floating-point key-types.

  • end_bit[in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range (begin_bit; 8 * sizeof(Key)]. Default value: * sizeof(Key). Non-default value not supported for floating-point key-types.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.