Sort#
Configuring the kernel#
merge_sort#
-
template<unsigned int MergeOddevenBlockSize = 512, unsigned int SortBlockSize = MergeOddevenBlockSize, unsigned int SortItemsPerThread = 1, unsigned int MergeMergepathPartitionBlockSize = 128, unsigned int MergeMergepathBlockSize = 128, unsigned int MergeMergepathItemsPerThread = 4, unsigned int MinInputSizeMergepath = (1 << 17) + 70000>
struct merge_sort_config : public rocprim::detail::merge_sort_config_params# Configuration of device-level merge primitives.
- Template Parameters:
SortBlockSize – - block size in the block-sort step
SortItemsPerThread – - ItemsPerThread in the block-sort step
MergeOddevenBlockSize – - block size in the block merge step using oddeven impl (used when input_size < MinInputSizeMergepath)
MergeMergepathPartitionBlockSize – - block size of the partition kernel in the block merge step using mergepath impl
MergeMergepathBlockSize – - block size in the block merge step using mergepath impl
MergeMergepathItemsPerThread – - ItemsPerThread in the block merge step using mergepath impl
MinInputSizeMergepath – - breakpoint of input-size to use mergepath impl for block merge step
radix_sort#
-
template<class SingleSortConfig = default_config, class MergeSortConfig = default_config, class OnesweepConfig = default_config, size_t MergeSortLimit = 1024 * 1024>
struct radix_sort_config# Configuration of device-level radix sort operation.
One of three algorithms is used: single sort (launches only a single block), merge sort, or Onesweep.
- Template Parameters:
SortSingleConfig – - Configuration for the single kernel subalgorithm. must be
kernel_config
ordefault_config
.MergeSortConfig – - Configuration for the merge sort subalgorithm. must be
merge_sort_config
ordefault_config
. Ifmerge_sort_config
, the sorted items per block must be a power of two.OnesweepConfig – - Configuration for the Onesweep subalgorithm. must be
radix_sort_onesweep_config
ordefault_config
.MergeSortLimit – - The largest number of items for which the merge sort algorithm will be used. Note that below this limit, a different algorithm may be used.
merge_sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel merge sort primitive for device level.
merge_sort
function performs a device-wide merge sort of keys. Function sorts input keys based on comparison function.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Accepts custom compare_functions for sorting across the device.
- Example
In this example a device-level ascending merge sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]
- Template Parameters:
KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending merge sort-by-key primitive for device level.
merge_sort
function performs a device-wide merge sort of (key, value) pairs. Function sorts input pairs based on comparison function.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Accepts custom compare_functions for sorting across the device.
- Example
In this example a device-level ascending merge sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 2, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 1, 2, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8]
- Template Parameters:
KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
radix_sort_keys#
Ascending Sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort primitive for device level.
radix_sort_keys
function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel ascending radix sort primitive for device level.
radix_sort_keys
function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.begin_bit
andend_bit
can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returnsrocprim::tuple<int16_t&, uint8_t&>
,begin_bit==6
andend_bit==12
, then the 2 MSBs of theuint8_t
value and the 4 LSBs of theint16_t
value are considered for sorting. The range specified bybegin_bit
andend_bit
must be valid with regards to the sizes of the return tuple’s elements.
- Example
In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer.
#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * output; // empty array of 8 elements constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 96; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys_output: [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – Random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – Random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – Integral type that represents the problem size.
Key – The value type of the input and output iterators.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
begin_bit – [in] index of the first (least significant) bit used in key comparison.
end_bit – [in] past-the-end index (most significant) bit used in key comparison.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel ascending radix sort primitive for device level.
radix_sort_keys
function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.
- Example
In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer.
#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{} ); // keys_output: [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – Random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – Random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – Integral type that represents the problem size.
Key – The value type of the input and output iterators.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Size>
hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort primitive for device level.
radix_sort_keys
function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
The contents of both buffers of
keys
may be altered by the sorting function.current()
ofkeys
is used as the input.The function will update
current()
ofkeys
to point to the buffer that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type must be an arithmetic type (that is, an integral type or a floating-point type).Buffers of
keys
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<float> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // keys.current(): [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Size – integral type that represents the problem size.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Size, class Decomposer>
auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel ascending radix sort primitive for device level.
radix_sort_keys
function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
The contents of both buffers of
keys
may be altered by the sorting function.current()
ofkeys
is used as the input.The function will update
current()
ofkeys
to point to the buffer that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.begin_bit
andend_bit
can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returnsrocprim::tuple<int16_t&, uint8_t&>
,begin_bit==6
andend_bit==12
, then the 2 MSBs of theuint8_t
value and the 4 LSBs of theint16_t
value are considered for sorting. The range specified bybegin_bit
andend_bit
must be valid with regards to the sizes of the return tuple’s elements.
- Example
In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer.
#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 96; size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<custom_type> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys.current(): [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Size – integral type that represents the problem size.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
begin_bit – [in] index of the first (least significant) bit used in key comparison.
end_bit – [in] past-the-end index (most significant) bit used in key comparison.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Size, class Decomposer>
auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel ascending radix sort primitive for device level.
radix_sort_keys
function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
The contents of both buffers of
keys
may be altered by the sorting function.current()
ofkeys
is used as the input.The function will update
current()
ofkeys
to point to the buffer that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.
- Example
In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer.
#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<custom_type> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // keys.current(): [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Size – integral type that represents the problem size.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
Descending Sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort primitive for device level.
radix_sort_keys_desc
function performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed on an array of integer values.
#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [8, 7, 6, 5, 4, 3, 2, 1]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel descending radix sort primitive for device level.
radix_sort_keys_desc
function performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.begin_bit
andend_bit
can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returnsrocprim::tuple<int16_t&, uint8_t&>
,begin_bit==6
andend_bit==12
, then the 2 MSBs of theuint8_t
value and the 4 LSBs of theint16_t
value are considered for sorting. The range specified bybegin_bit
andend_bit
must be valid with regards to the sizes of the return tuple’s elements.
- Example
In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer.
#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 96; size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys_output: [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
Key – The value type of the input and output iterators.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
begin_bit – [in] index of the first (least significant) bit used in key comparison.
end_bit – [in] past-the-end index (most significant) bit used in key comparison.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel descending radix sort primitive for device level.
radix_sort_keys_desc
function performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.
- Example
In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer.
#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{} ); // keys_output: [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
Key – The value type of the input and output iterators.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Size>
hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort primitive for device level.
radix_sort_keys_desc
function performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
The contents of both buffers of
keys
may be altered by the sorting function.current()
ofkeys
is used as the input.The function will update
current()
ofkeys
to point to the buffer that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type must be an arithmetic type (that is, an integral type or a floating-point type).Buffers of
keys
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed on an array of integer values.
#include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<int> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // keys.current(): [8, 7, 6, 5, 4, 3, 2, 1]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Size – integral type that represents the problem size.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Size, class Decomposer>
auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel descending radix sort primitive for device level.
radix_sort_keys_desc
function performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
The contents of both buffers of
keys
may be altered by the sorting function.current()
ofkeys
is used as the input.The function will update
current()
ofkeys
to point to the buffer that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.begin_bit
andend_bit
can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returnsrocprim::tuple<int16_t&, uint8_t&>
,begin_bit==6
andend_bit==12
, then the 2 MSBs of theuint8_t
value and the 4 LSBs of theint16_t
value are considered for sorting. The range specified bybegin_bit
andend_bit
must be valid with regards to the sizes of the return tuple’s elements.
- Example
In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer.
#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 96; size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<custom_type> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // keys.current(): [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Size – integral type that represents the problem size.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Defaults to
0
.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Size, class Decomposer>
auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel descending radix sort primitive for device level.
radix_sort_keys_desc
function performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
The contents of both buffers of
keys
may be altered by the sorting function.current()
ofkeys
is used as the input.The function will update
current()
ofkeys
to point to the buffer that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.
- Example
In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer.
#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<custom_type> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // keys.current(): [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Size – integral type that represents the problem size.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
Segmented Ascending Sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort primitive for device level.
segmented_radix_sort_keys
function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in ascending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [0.3, 0.6, 0.65, 0.08, 0.2, 0.4, 0.7, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
segmented_radix_sort_config
or a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
Segmented Descending Sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort primitive for device level.
segmented_radix_sort_keys_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in descending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed on an array of integer values.
#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [6, 3, 5, 8, 7, 4, 2, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
segmented_radix_sort_config
or a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
radix_sort_pairs#
Ascending Sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort-by-key primitive for device level.
radix_sort_pairs
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, 0, 5 ); // keys_output: [ 1, 1, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
values_input – [in] pointer to the first element in the range to sort.
values_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel ascending radix sort-by-key primitive for device level.
radix_sort_pairs
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.begin_bit
andend_bit
can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returnsrocprim::tuple<int16_t&, uint8_t&>
,begin_bit==6
andend_bit==12
, then the 2 MSBs of theuint8_t
value and the 4 LSBs of theint16_t
value are considered for sorting. The range specified bybegin_bit
andend_bit
must be valid with regards to the sizes of the return tuple’s elements.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of
double
s.#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements // The integer field of the keys is in range 0-11, which can be represented on 4 bits, // while for the double member we must specify full bit range [0; 63]. Therefore begin_bit // is set to 0 and end_bit is set to 68. constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 68; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys_output: [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}] // values_output: [-1, 2, 3, -5, -4, 7, -8, -2]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
Key – The value type of the input and output iterators.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
values_input – [in] pointer to the first element in the range to sort.
values_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
begin_bit – [in] index of the first (least significant) bit used in key comparison.
end_bit – [in] past-the-end index (most significant) bit used in key comparison.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel ascending radix sort-by-key primitive for device level.
radix_sort_pairs
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of
double
s.#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{} ); // keys_output: [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}] // values_output: [-1, 2, 3, -5, -4, 7, -8, -2]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
Key – The value type of the input and output iterators.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
values_input – [in] pointer to the first element in the range to sort.
values_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Value, class Size>
hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort-by-key primitive for device level.
radix_sort_pairs
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of both buffers of
keys
andvalues
may be altered by the sorting function.current()
ofkeys
andvalues
are used as the input.The function will update
current()
ofkeys
andvalues
to point to buffers that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type must be an arithmetic type (that is, an integral type or a floating-point type).Buffers of
keys
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_tmp; // empty array of 8 elements double* values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<unsigned int> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, 0, 5 ); // keys.current(): [ 1, 1, 3, 4, 5, 6, 7, 8] // values.current(): [-1, -2, 2, 3, -4, -5, 7, -8]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Value – value type.
Size – integral type that represents the problem size.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.values – [inout] reference to the double-buffer of values, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel ascending radix sort-by-key primitive for device level.
radix_sort_pairs
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of both buffers of
keys
andvalues
may be altered by the sorting function.current()
ofkeys
andvalues
are used as the input.The function will update
current()
ofkeys
andvalues
to point to buffers that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.begin_bit
andend_bit
can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returnsrocprim::tuple<int16_t&, uint8_t&>
,begin_bit==6
andend_bit==12
, then the 2 MSBs of theuint8_t
value and the 4 LSBs of theint16_t
value are considered for sorting. The range specified bybegin_bit
andend_bit
must be valid with regards to the sizes of the return tuple’s elements.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of
double
s.#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_tmp; // empty array of 8 elements double* values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); // The integer field of the keys is in range 0-11, which can be represented on 4 bits, // while for the double member we must specify full bit range [0; 63]. Therefore begin_bit // is set to 0 and end_bit is set to 68. constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 68; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys.current(): [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}] // values.current(): [-1, 2, 3, -5, -4, 7, -8, -2]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Value – value type.
Size – integral type that represents the problem size.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.values – [inout] reference to the double-buffer of values, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
begin_bit – [in] index of the first (least significant) bit used in key comparison.
end_bit – [in] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel ascending radix sort-by-key primitive for device level.
radix_sort_pairs
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of both buffers of
keys
andvalues
may be altered by the sorting function.current()
ofkeys
andvalues
are used as the input.The function will update
current()
ofkeys
andvalues
to point to buffers that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of
double
s.#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_tmp; // empty array of 8 elements double* values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{} ); // keys.current(): [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}] // values.current(): [-1, 2, 3, -5, -4, 7, -8, -2]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Value – value type.
Size – integral type that represents the problem size.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.values – [inout] reference to the double-buffer of values, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
Descending Sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort-by-key primitive for device level.
radix_sort_pairs_desc
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 8, 7, 6, 5, 4, 3, 1, 1] // values_output: [-8, 7, -5, -4, 3, 2, -1, -2]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
values_input – [in] pointer to the first element in the range to sort.
values_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel descending radix sort-by-key primitive for device level.
radix_sort_pairs_desc
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.begin_bit
andend_bit
can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returnsrocprim::tuple<int16_t&, uint8_t&>
,begin_bit==6
andend_bit==12
, then the 2 MSBs of theuint8_t
value and the 4 LSBs of theint16_t
value are considered for sorting. The range specified bybegin_bit
andend_bit
must be valid with regards to the sizes of the return tuple’s elements.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of
double
s.#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements // The integer field of the keys is in range 0-11, which can be represented on 4 bits, // while for the double member we must specify full bit range [0; 63]. Therefore begin_bit // is set to 0 and end_bit is set to 68. constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 68; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys_output: [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}] // values_output: [-2, -1, 2, 3, -4, -5, 7, -8]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
Key – The value type of the input and output iterators.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
values_input – [in] pointer to the first element in the range to sort.
values_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
begin_bit – [in] index of the first (least significant) bit used in key comparison.
end_bit – [in] past-the-end index (most significant) bit used in key comparison.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel descending radix sort-by-key primitive for device level.
radix_sort_pairs_desc
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of
double
s.#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{} ); // keys_output: [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}] // values_output: [-2, -1, 2, 3, -4, -5, 7, -8]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – integral type that represents the problem size.
Key – The value type of the input and output iterators.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input – [in] pointer to the first element in the range to sort.
keys_output – [out] pointer to the first element in the output range.
values_input – [in] pointer to the first element in the range to sort.
values_output – [out] pointer to the first element in the output range.
size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Value, class Size>
hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort-by-key primitive for device level.
radix_sort_pairs_desc
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of both buffers of
keys
andvalues
may be altered by the sorting function.current()
ofkeys
andvalues
are used as the input.The function will update
current()
ofkeys
andvalues
to point to buffers that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type must be an arithmetic type (that is, an integral type or a floating-point type).Buffers of
keys
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_tmp; // empty array of 8 elements double * values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<int> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size ); // keys.current(): [ 8, 7, 6, 5, 4, 3, 1, 1] // values.current(): [-8, 7, -5, -4, 3, 2, -1, -2]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Value – value type.
Size – integral type that represents the problem size.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.values – [inout] reference to the double-buffer of values, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel descending radix sort-by-key primitive for device level.
radix_sort_pairs_desc
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of both buffers of
keys
andvalues
may be altered by the sorting function.current()
ofkeys
andvalues
are used as the input.The function will update
current()
ofkeys
andvalues
to point to buffers that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.begin_bit
andend_bit
can be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returnsrocprim::tuple<int16_t&, uint8_t&>
,begin_bit==6
andend_bit==12
, then the 2 MSBs of theuint8_t
value and the 4 LSBs of theint16_t
value are considered for sorting. The range specified bybegin_bit
andend_bit
must be valid with regards to the sizes of the return tuple’s elements.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of
double
s.#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_tmp; // empty array of 8 elements double * values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); // The integer field of the keys is in range 0-11, which can be represented on 4 bits, // while for the double member we must specify full bit range [0; 63]. Therefore begin_bit // is set to 0 and end_bit is set to 68. constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 68; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys.current(): [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}] // values.current(): [-2, -1, 2, 3, -4, -5, 7, -8]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Value – value type.
Size – integral type that represents the problem size.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.values – [inout] reference to the double-buffer of values, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Defaults to
0
.end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t># Parallel descending radix sort-by-key primitive for device level.
radix_sort_pairs_desc
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of both buffers of
keys
andvalues
may be altered by the sorting function.current()
ofkeys
andvalues
are used as the input.The function will update
current()
ofkeys
andvalues
to point to buffers that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) can be any trivially copyable type.decomposer
must be a functor that implementsoperator()(Key&) const
. This operator must return arocprim::tuple
that contains one or more reference to value(s) of arithmetic types. These references must point to member variables ofKey
, however not every member variable has to be exposed this way.Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of
double
s.#include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_tmp; // empty array of 8 elements double * values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{} ); // keys.current(): [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}] // values.current(): [-2, -1, 2, 3, -4, -5, 7, -8]
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be
radix_sort_config
or a class derived from it.Key – key type. Must be an integral type or a floating-point type.
Value – value type.
Size – integral type that represents the problem size.
Decomposer – The type of the decomposer functor.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys – [inout] reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.values – [inout] reference to the double-buffer of values, its
current()
contains the input range and will be updated to point to the output range.size – [in] number of element in the input range.
decomposer – [in] decomposer functor that produces a tuple of references from the input key type.
stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
Segmented Ascending Sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort-by-key primitive for device level.
segmented_radix_sort_pairs_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // keys_output: [3, 6, 5, 1, 1, 4, 7, 8] // values_output: [2, -5, -4, -1, -2, 3, 7, -8]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
segmented_radix_sort_config
or a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
Segmented Descending Sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort-by-key primitive for device level.
segmented_radix_sort_pairs_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // keys_output: [ 6, 3, 5, 8, 7, 4, 1, 1] // values_output: [-5, 2, -4, -8, 7, 3, -1, -2]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
segmented_radix_sort_config
or a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.