Sort#
Configuring the kernel#
merge_sort#
- 
template<unsigned int MergeOddevenBlockSize = 512, unsigned int SortBlockSize = MergeOddevenBlockSize, unsigned int SortItemsPerThread = 1, unsigned int MergeMergepathPartitionBlockSize = 128, unsigned int MergeMergepathBlockSize = 128, unsigned int MergeMergepathItemsPerThread = 4, unsigned int MinInputSizeMergepath = (1 << 17) + 70000>
 struct merge_sort_config : public rocprim::detail::merge_sort_config_params#
- Configuration of device-level merge primitives. - Template Parameters:
- SortBlockSize – - block size in the block-sort step 
- SortItemsPerThread – - ItemsPerThread in the block-sort step 
- MergeOddevenBlockSize – - block size in the block merge step using oddeven impl (used when input_size < MinInputSizeMergepath) 
- MergeMergepathPartitionBlockSize – - block size of the partition kernel in the block merge step using mergepath impl 
- MergeMergepathBlockSize – - block size in the block merge step using mergepath impl 
- MergeMergepathItemsPerThread – - ItemsPerThread in the block merge step using mergepath impl 
- MinInputSizeMergepath – - breakpoint of input-size to use mergepath impl for block merge step 
 
 
radix_sort#
- 
template<class SingleSortConfig = default_config, class MergeSortConfig = default_config, class OnesweepConfig = default_config, size_t MergeSortLimit = 1024 * 1024>
 struct radix_sort_config#
- Configuration of device-level radix sort operation. - One of three algorithms is used: single sort (launches only a single block), merge sort, or Onesweep. - Template Parameters:
- SortSingleConfig – - Configuration for the single kernel subalgorithm. must be - kernel_configor- default_config.
- MergeSortConfig – - Configuration for the merge sort subalgorithm. must be - merge_sort_configor- default_config. If- merge_sort_config, the sorted items per block must be a power of two.
- OnesweepConfig – - Configuration for the Onesweep subalgorithm. must be - radix_sort_onesweep_configor- default_config.
- MergeSortLimit – - The largest number of items for which the merge sort algorithm will be used. Note that below this limit, a different algorithm may be used. 
 
 
merge_sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
 inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel merge sort primitive for device level. - merge_sortfunction performs a device-wide merge sort of keys. Function sorts input keys based on comparison function.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storagein a null pointer.
- Accepts custom compare_functions for sorting across the device. 
 
- Stability
- merge_sortis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending merge sort is performed on an array of - floatvalues.- #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1] 
 - Template Parameters:
- Config – - [optional] Configuration of the primitive, must be - default_configor- merge_sort_config.
- KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
 
- Parameters:
- temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] - reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] - pointer to the first element in the range to sort. 
- keys_output – [out] - pointer to the first element in the output range. 
- size – [in] - number of element in the input range. 
- compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it. The default value is- BinaryFunction().
- stream – [in] - [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
 inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel ascending merge sort-by-key primitive for device level. - merge_sortfunction performs a device-wide merge sort of (key, value) pairs. Function sorts input pairs based on comparison function.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storagein a null pointer.
- Accepts custom compare_functions for sorting across the device. 
 
- Stability
- merge_sortis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending merge sort is performed where input keys are represented by an array of unsigned integers and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 2, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 1, 2, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8] 
 - Template Parameters:
- Config – - [optional] Configuration of the primitive, must be - default_configor- merge_sort_config.
- KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
 
- Parameters:
- temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] - reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] - pointer to the first element in the range to sort. 
- keys_output – [out] - pointer to the first element in the output range. 
- values_input – [in] - pointer to the first element in the range to sort. 
- values_output – [out] - pointer to the first element in the output range. 
- size – [in] - number of element in the input range. 
- compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: - bool f(const T &a, const T &b);. The signature does not need to have- const &, but function object must not modify the objects passed to it. The default value is- BinaryFunction().
- stream – [in] - [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
radix_sort_keys#
Ascending Sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
 hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel ascending radix sort primitive for device level. - radix_sort_keysfunction performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- radix_sort_keysis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed on an array of - floatvalues.- #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
 auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel ascending radix sort primitive for device level. - radix_sort_keysfunction performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- begin_bitand- end_bitcan be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns- rocprim::tuple<int16_t&, uint8_t&>,- begin_bit==6and- end_bit==12, then the 2 MSBs of the- uint8_tvalue and the 4 LSBs of the- int16_tvalue are considered for sorting. The range specified by- begin_bitand- end_bitmust be valid with regards to the sizes of the return tuple’s elements.
 
- Stability
- radix_sort_keysis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer. - #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * output; // empty array of 8 elements constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 96; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys_output: [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – Random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – Random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – Integral type that represents the problem size. 
- Key – The value type of the input and output iterators. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- begin_bit – [in] index of the first (least significant) bit used in key comparison. 
- end_bit – [in] past-the-end index (most significant) bit used in key comparison. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
 auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel ascending radix sort primitive for device level. - radix_sort_keysfunction performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
 
- Stability
- radix_sort_keysis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer. - #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{} ); // keys_output: [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – Random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – Random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – Integral type that represents the problem size. 
- Key – The value type of the input and output iterators. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Size>
 hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel ascending radix sort primitive for device level. - radix_sort_keysfunction performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
- The contents of both buffers of - keysmay be altered by the sorting function.
- current()of- keysis used as the input.
- The function will update - current()of- keysto point to the buffer that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype must be an arithmetic type (that is, an integral type or a floating-point type).
- Buffers of - keysmust have at least- sizeelements.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- radix_sort_keysis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed on an array of - floatvalues.- #include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<float> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // keys.current(): [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Size – integral type that represents the problem size. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Size, class Decomposer>
 auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel ascending radix sort primitive for device level. - radix_sort_keysfunction performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
- The contents of both buffers of - keysmay be altered by the sorting function.
- current()of- keysis used as the input.
- The function will update - current()of- keysto point to the buffer that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- begin_bitand- end_bitcan be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns- rocprim::tuple<int16_t&, uint8_t&>,- begin_bit==6and- end_bit==12, then the 2 MSBs of the- uint8_tvalue and the 4 LSBs of the- int16_tvalue are considered for sorting. The range specified by- begin_bitand- end_bitmust be valid with regards to the sizes of the return tuple’s elements.
 
- Stability
- radix_sort_keysis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer. - #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 96; size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<custom_type> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys.current(): [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Size – integral type that represents the problem size. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- begin_bit – [in] index of the first (least significant) bit used in key comparison. 
- end_bit – [in] past-the-end index (most significant) bit used in key comparison. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Size, class Decomposer>
 auto rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel ascending radix sort primitive for device level. - radix_sort_keysfunction performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
- The contents of both buffers of - keysmay be altered by the sorting function.
- current()of- keysis used as the input.
- The function will update - current()of- keysto point to the buffer that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
 
- Stability
- radix_sort_keysis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed on an array of values of a custom type, using a custom decomposer. - #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<custom_type> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // keys.current(): [{-3, 0.3}, {-1, 0.7}, {0, 0.2}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {11, 0.08}, {11, 1.0}] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Size – integral type that represents the problem size. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
Descending Sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
 hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel descending radix sort primitive for device level. - radix_sort_keys_descfunction performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- radix_sort_keys_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed on an array of integer values. - #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [8, 7, 6, 5, 4, 3, 2, 1] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
 auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel descending radix sort primitive for device level. - radix_sort_keys_descfunction performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- begin_bitand- end_bitcan be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns- rocprim::tuple<int16_t&, uint8_t&>,- begin_bit==6and- end_bit==12, then the 2 MSBs of the- uint8_tvalue and the 4 LSBs of the- int16_tvalue are considered for sorting. The range specified by- begin_bitand- end_bitmust be valid with regards to the sizes of the return tuple’s elements.
 
- Stability
- radix_sort_keys_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer. - #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 96; size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys_output: [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
- Key – The value type of the input and output iterators. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- begin_bit – [in] index of the first (least significant) bit used in key comparison. 
- end_bit – [in] past-the-end index (most significant) bit used in key comparison. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
 auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel descending radix sort primitive for device level. - radix_sort_keys_descfunction performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
 
- Stability
- radix_sort_keys_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer. - #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, custom_type_decomposer{} ); // keys_output: [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
- Key – The value type of the input and output iterators. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Size>
 hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel descending radix sort primitive for device level. - radix_sort_keys_descfunction performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
- The contents of both buffers of - keysmay be altered by the sorting function.
- current()of- keysis used as the input.
- The function will update - current()of- keysto point to the buffer that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype must be an arithmetic type (that is, an integral type or a floating-point type).
- Buffers of - keysmust have at least- sizeelements.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- radix_sort_keys_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed on an array of integer values. - #include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<int> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // keys.current(): [8, 7, 6, 5, 4, 3, 2, 1] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Size – integral type that represents the problem size. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Size, class Decomposer>
 auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel descending radix sort primitive for device level. - radix_sort_keys_descfunction performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
- The contents of both buffers of - keysmay be altered by the sorting function.
- current()of- keysis used as the input.
- The function will update - current()of- keysto point to the buffer that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- begin_bitand- end_bitcan be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns- rocprim::tuple<int16_t&, uint8_t&>,- begin_bit==6and- end_bit==12, then the 2 MSBs of the- uint8_tvalue and the 4 LSBs of the- int16_tvalue are considered for sorting. The range specified by- begin_bitand- end_bitmust be valid with regards to the sizes of the return tuple’s elements.
 
- Stability
- radix_sort_keys_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer. - #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 96; size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<custom_type> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // keys.current(): [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Size – integral type that represents the problem size. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Defaults to - 0.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Size, class Decomposer>
 auto rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel descending radix sort primitive for device level. - radix_sort_keys_descfunction performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
- The contents of both buffers of - keysmay be altered by the sorting function.
- current()of- keysis used as the input.
- The function will update - current()of- keysto point to the buffer that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
 
- Stability
- radix_sort_keys_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed on an array of values of a custom type, using a custom decomposer. - #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * input; // e.g., [{2, 0.6}, {-3, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1}, {-1, 0.7}] custom_type * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<custom_type> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size, custom_type_decomposer{} ); // keys.current(): [{11, 1.0}, {11, 0.08}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.2}, {-1, 0.7}, {-3, 0.3},] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Size – integral type that represents the problem size. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
Segmented Ascending Sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
 inline hipError_t rocprim::segmented_radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel ascending radix sort primitive for device level. - segmented_radix_sort_keysfunction performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in ascending order.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storagein a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- Ranges specified by - begin_offsetsand- end_offsetsmust have at least- segmentselements. They may use the same sequence- offsetsof at least- segments + 1elements:- offsetsfor- begin_offsetsand- offsets + 1for- end_offsets.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- segmented_radix_sort_keysis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed on an array of - floatvalues.- #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [0.3, 0.6, 0.65, 0.08, 0.2, 0.4, 0.7, 1] 
 - Template Parameters:
- Config – - [optional] Configuration of the primitive, must be - default_configor- segmented_radix_sort_config.
- KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
 
- Parameters:
- temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] - reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] - pointer to the first element in the range to sort. 
- keys_output – [out] - pointer to the first element in the output range. 
- size – [in] - number of element in the input range. 
- segments – [in] - number of segments in the input range. 
- begin_offsets – [in] - iterator to the first element in the range of beginning offsets. 
- end_offsets – [in] - iterator to the first element in the range of ending offsets. 
- begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] - [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
Segmented Descending Sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
 inline hipError_t rocprim::segmented_radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel descending radix sort primitive for device level. - segmented_radix_sort_keys_descfunction performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in descending order.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storagein a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- Ranges specified by - begin_offsetsand- end_offsetsmust have at least- segmentselements. They may use the same sequence- offsetsof at least- segments + 1elements:- offsetsfor- begin_offsetsand- offsets + 1for- end_offsets.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- segmented_radix_sort_keys_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed on an array of integer values. - #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [6, 3, 5, 8, 7, 4, 2, 1] 
 - Template Parameters:
- Config – - [optional] Configuration of the primitive, must be - default_configor- segmented_radix_sort_config.
- KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
 
- Parameters:
- temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] - reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] - pointer to the first element in the range to sort. 
- keys_output – [out] - pointer to the first element in the output range. 
- size – [in] - number of element in the input range. 
- segments – [in] - number of segments in the input range. 
- begin_offsets – [in] - iterator to the first element in the range of beginning offsets. 
- end_offsets – [in] - iterator to the first element in the range of ending offsets. 
- begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] - [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
radix_sort_pairs#
Ascending Sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
 hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel ascending radix sort-by-key primitive for device level. - radix_sort_pairsfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).
- Ranges specified by - keys_input,- keys_output,- values_inputand- values_outputmust have at least- sizeelements.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- radix_sort_pairsis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, 0, 5 ); // keys_output: [ 1, 1, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- values_input – [in] pointer to the first element in the range to sort. 
- values_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
 auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel ascending radix sort-by-key primitive for device level. - radix_sort_pairsfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- begin_bitand- end_bitcan be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns- rocprim::tuple<int16_t&, uint8_t&>,- begin_bit==6and- end_bit==12, then the 2 MSBs of the- uint8_tvalue and the 4 LSBs of the- int16_tvalue are considered for sorting. The range specified by- begin_bitand- end_bitmust be valid with regards to the sizes of the return tuple’s elements.
 
- Stability
- radix_sort_pairsis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements // The integer field of the keys is in range 0-11, which can be represented on 4 bits, // while for the double member we must specify full bit range [0; 63]. Therefore begin_bit // is set to 0 and end_bit is set to 68. constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 68; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys_output: [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}] // values_output: [-1, 2, 3, -5, -4, 7, -8, -2] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
- Key – The value type of the input and output iterators. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- values_input – [in] pointer to the first element in the range to sort. 
- values_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- begin_bit – [in] index of the first (least significant) bit used in key comparison. 
- end_bit – [in] past-the-end index (most significant) bit used in key comparison. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
 auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel ascending radix sort-by-key primitive for device level. - radix_sort_pairsfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
 
- Stability
- radix_sort_pairsis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{} ); // keys_output: [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}] // values_output: [-1, 2, 3, -5, -4, 7, -8, -2] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
- Key – The value type of the input and output iterators. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- values_input – [in] pointer to the first element in the range to sort. 
- values_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Value, class Size>
 hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel ascending radix sort-by-key primitive for device level. - radix_sort_pairsfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
- The contents of both buffers of - keysand- valuesmay be altered by the sorting function.
- current()of- keysand- valuesare used as the input.
- The function will update - current()of- keysand- valuesto point to buffers that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype must be an arithmetic type (that is, an integral type or a floating-point type).
- Buffers of - keysmust have at least- sizeelements.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- radix_sort_pairsis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_tmp; // empty array of 8 elements double* values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<unsigned int> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, 0, 5 ); // keys.current(): [ 1, 1, 3, 4, 5, 6, 7, 8] // values.current(): [-1, -2, 2, 3, -4, -5, 7, -8] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Value – value type. 
- Size – integral type that represents the problem size. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- values – [inout] reference to the double-buffer of values, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
 auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel ascending radix sort-by-key primitive for device level. - radix_sort_pairsfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
- The contents of both buffers of - keysand- valuesmay be altered by the sorting function.
- current()of- keysand- valuesare used as the input.
- The function will update - current()of- keysand- valuesto point to buffers that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- begin_bitand- end_bitcan be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns- rocprim::tuple<int16_t&, uint8_t&>,- begin_bit==6and- end_bit==12, then the 2 MSBs of the- uint8_tvalue and the 4 LSBs of the- int16_tvalue are considered for sorting. The range specified by- begin_bitand- end_bitmust be valid with regards to the sizes of the return tuple’s elements.
 
- Stability
- radix_sort_pairsis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_tmp; // empty array of 8 elements double* values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); // The integer field of the keys is in range 0-11, which can be represented on 4 bits, // while for the double member we must specify full bit range [0; 63]. Therefore begin_bit // is set to 0 and end_bit is set to 68. constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 68; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys.current(): [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}] // values.current(): [-1, 2, 3, -5, -4, 7, -8, -2] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Value – value type. 
- Size – integral type that represents the problem size. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- values – [inout] reference to the double-buffer of values, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- begin_bit – [in] index of the first (least significant) bit used in key comparison. 
- end_bit – [in] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
 auto rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel ascending radix sort-by-key primitive for device level. - radix_sort_pairsfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
- The contents of both buffers of - keysand- valuesmay be altered by the sorting function.
- current()of- keysand- valuesare used as the input.
- The function will update - current()of- keysand- valuesto point to buffers that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
 
- Stability
- radix_sort_pairsis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_tmp; // empty array of 8 elements double* values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{} ); // keys.current(): [{0, 0.2}, {0, 0.3}, {0, 0.4}, {2, 0.6}, {2, 0.65}, {5, 0.7}, {11, 0.08}, {11, 1.0}] // values.current(): [-1, 2, 3, -5, -4, 7, -8, -2] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Value – value type. 
- Size – integral type that represents the problem size. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- values – [inout] reference to the double-buffer of values, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
Descending Sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
 hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel descending radix sort-by-key primitive for device level. - radix_sort_pairs_descfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).
- Ranges specified by - keys_input,- keys_output,- values_inputand- values_outputmust have at least- sizeelements.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- radix_sort_pairs_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 8, 7, 6, 5, 4, 3, 1, 1] // values_output: [-8, 7, -5, -4, 3, 2, -1, -2] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- values_input – [in] pointer to the first element in the range to sort. 
- values_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
 auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel descending radix sort-by-key primitive for device level. - radix_sort_pairs_descfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- begin_bitand- end_bitcan be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns- rocprim::tuple<int16_t&, uint8_t&>,- begin_bit==6and- end_bit==12, then the 2 MSBs of the- uint8_tvalue and the 4 LSBs of the- int16_tvalue are considered for sorting. The range specified by- begin_bitand- end_bitmust be valid with regards to the sizes of the return tuple’s elements.
 
- Stability
- radix_sort_pairs_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements // The integer field of the keys is in range 0-11, which can be represented on 4 bits, // while for the double member we must specify full bit range [0; 63]. Therefore begin_bit // is set to 0 and end_bit is set to 68. constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 68; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys_output: [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}] // values_output: [-2, -1, 2, 3, -4, -5, 7, -8] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
- Key – The value type of the input and output iterators. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- values_input – [in] pointer to the first element in the range to sort. 
- values_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- begin_bit – [in] index of the first (least significant) bit used in key comparison. 
- end_bit – [in] past-the-end index (most significant) bit used in key comparison. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type, class Decomposer>
 auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel descending radix sort-by-key primitive for device level. - radix_sort_pairs_descfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
 
- Stability
- radix_sort_pairs_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, custom_type_decomposer{} ); // keys_output: [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}] // values_output: [-2, -1, 2, 3, -4, -5, 7, -8] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- KeysInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- Size – integral type that represents the problem size. 
- Key – The value type of the input and output iterators. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] pointer to the first element in the range to sort. 
- keys_output – [out] pointer to the first element in the output range. 
- values_input – [in] pointer to the first element in the range to sort. 
- values_output – [out] pointer to the first element in the output range. 
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Value, class Size>
 hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel descending radix sort-by-key primitive for device level. - radix_sort_pairs_descfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
- The contents of both buffers of - keysand- valuesmay be altered by the sorting function.
- current()of- keysand- valuesare used as the input.
- The function will update - current()of- keysand- valuesto point to buffers that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype must be an arithmetic type (that is, an integral type or a floating-point type).
- Buffers of - keysmust have at least- sizeelements.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- radix_sort_pairs_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_tmp; // empty array of 8 elements double * values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<int> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size ); // keys.current(): [ 8, 7, 6, 5, 4, 3, 1, 1] // values.current(): [-8, 7, -5, -4, 3, 2, -1, -2] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Value – value type. 
- Size – integral type that represents the problem size. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- values – [inout] reference to the double-buffer of values, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
 auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, unsigned int begin_bit, unsigned int end_bit, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel descending radix sort-by-key primitive for device level. - radix_sort_pairs_descfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
- The contents of both buffers of - keysand- valuesmay be altered by the sorting function.
- current()of- keysand- valuesare used as the input.
- The function will update - current()of- keysand- valuesto point to buffers that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
- begin_bitand- end_bitcan be provided to control the radix range that is considered in the decomposed tuple. For example, if the decomposer returns- rocprim::tuple<int16_t&, uint8_t&>,- begin_bit==6and- end_bit==12, then the 2 MSBs of the- uint8_tvalue and the 4 LSBs of the- int16_tvalue are considered for sorting. The range specified by- begin_bitand- end_bitmust be valid with regards to the sizes of the return tuple’s elements.
 
- Stability
- radix_sort_pairs_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_tmp; // empty array of 8 elements double * values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); // The integer field of the keys is in range 0-11, which can be represented on 4 bits, // while for the double member we must specify full bit range [0; 63]. Therefore begin_bit // is set to 0 and end_bit is set to 68. constexpr unsigned int begin_bit = 0; constexpr unsigned int end_bit = 68; size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{}, begin_bit, end_bit ); // keys.current(): [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}] // values.current(): [-2, -1, 2, 3, -4, -5, 7, -8] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Value – value type. 
- Size – integral type that represents the problem size. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- values – [inout] reference to the double-buffer of values, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- begin_bit – [in] [optional] index of the first (least significant) bit used in key comparison. Defaults to - 0.
- end_bit – [in] [optional] past-the-end index (most significant) bit used in key comparison. Defaults to the size of the decomposed tuple’s bit range. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
- 
template<class Config = default_config, class Key, class Value, class Size, class Decomposer>
 auto rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, Size size, Decomposer decomposer, hipStream_t stream = 0, bool debug_synchronous = false) -> std::enable_if_t<!std::is_convertible<Decomposer, unsigned int>::value, hipError_t>#
- Parallel descending radix sort-by-key primitive for device level. - radix_sort_pairs_descfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
- The contents of both buffers of - keysand- valuesmay be altered by the sorting function.
- current()of- keysand- valuesare used as the input.
- The function will update - current()of- keysand- valuesto point to buffers that contains the output range.
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storageis a null pointer.
- The function requires small - temporary_storageas it does not need a temporary buffer of- sizeelements.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) can be any trivially copyable type.
- decomposermust be a functor that implements- operator()(Key&) const. This operator must return a- rocprim::tuplethat contains one or more reference to value(s) of arithmetic types. These references must point to member variables of- Key, however not every member variable has to be exposed this way.
- Ranges specified by - keys_inputand- keys_outputmust have at least- sizeelements.
 
- Stability
- radix_sort_pairs_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed where input keys are represented by an array of a custom type and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> struct custom_type { int i; double d; }; struct custom_type_decomposer { rocprim::tuple<int&, double&> operator()(custom_type& key) const { return rocprim::tuple<int&, double&>(key.i, key.d); } }; // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 custom_type * keys_input; // e.g., [{2, 0.6}, {0, 0.3}, {2, 0.65}, {0, 0.4}, {0, 0.2}, {11, 0.08}, {11, 1.0}, {5, 0.7}] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] custom_type * keys_tmp; // empty array of 8 elements double * values_tmp; // empty array of 8 elements // Create double-buffers rocprim::double_buffer<custom_type> keys(keys_input, keys_tmp); rocprim::double_buffer<double> values(values_input, values_tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{} ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, values, input_size, custom_type_decomposer{} ); // keys.current(): [{11, 1.0}, {11, 0.08}, {5, 0.7}, {2, 0.65}, {2, 0.6}, {0, 0.4}, {0, 0.3}, {0, 0.2}] // values.current(): [-2, -1, 2, 3, -4, -5, 7, -8] 
 - Template Parameters:
- Config – [optional] Configuration of the primitive, must be - default_configor- radix_sort_config.
- Key – key type. Must be an integral type or a floating-point type. 
- Value – value type. 
- Size – integral type that represents the problem size. 
- Decomposer – The type of the decomposer functor. 
 
- Parameters:
- temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] reference to a size (in bytes) of - temporary_storage.
- keys – [inout] reference to the double-buffer of keys, its - current()contains the input range and will be updated to point to the output range.
- values – [inout] reference to the double-buffer of values, its - current()contains the input range and will be updated to point to the output range.
- size – [in] number of element in the input range. 
- decomposer – [in] decomposer functor that produces a tuple of references from the input key type. 
- stream – [in] [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
Segmented Ascending Sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
 inline hipError_t rocprim::segmented_radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel ascending radix sort-by-key primitive for device level. - segmented_radix_sort_pairsfunction performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storagein a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).
- Ranges specified by - keys_input,- keys_output,- values_inputand- values_outputmust have at least- sizeelements.
- Ranges specified by - begin_offsetsand- end_offsetsmust have at least- segmentselements. They may use the same sequence- offsetsof at least- segments + 1elements:- offsetsfor- begin_offsetsand- offsets + 1for- end_offsets.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- segmented_radix_sort_pairsis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys.
 
- Example
- In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // keys_output: [3, 6, 5, 1, 1, 4, 7, 8] // values_output: [2, -5, -4, -1, -2, 3, 7, -8] 
 - Template Parameters:
- Config – - [optional] Configuration of the primitive, must be - default_configor- segmented_radix_sort_config.
- KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
 
- Parameters:
- temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] - reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] - pointer to the first element in the range to sort. 
- keys_output – [out] - pointer to the first element in the output range. 
- values_input – [in] - pointer to the first element in the range to sort. 
- values_output – [out] - pointer to the first element in the output range. 
- size – [in] - number of element in the input range. 
- segments – [in] - number of segments in the input range. 
- begin_offsets – [in] - iterator to the first element in the range of beginning offsets. 
- end_offsets – [in] - iterator to the first element in the range of ending offsets. 
- begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] - [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.
 
Segmented Descending Sort#
- 
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
 inline hipError_t rocprim::segmented_radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)#
- Parallel descending radix sort-by-key primitive for device level. - segmented_radix_sort_pairs_descfunction performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
- The contents of the inputs are not altered by the sorting function. 
- Returns the required size of - temporary_storagein- storage_sizeif- temporary_storagein a null pointer.
- Keytype (a- value_typeof- KeysInputIteratorand- KeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).
- Ranges specified by - keys_input,- keys_output,- values_inputand- values_outputmust have at least- sizeelements.
- Ranges specified by - begin_offsetsand- end_offsetsmust have at least- segmentselements. They may use the same sequence- offsetsof at least- segments + 1elements:- offsetsfor- begin_offsetsand- offsets + 1for- end_offsets.
- If - Keyis an integer type and the range of keys is known in advance, the performance can be improved by setting- begin_bitand- end_bit, for example if all keys are in range [100, 10000],- begin_bit = 0and- end_bit = 14will cover the whole range.
 
- Stability
- segmented_radix_sort_pairs_descis stable: it preserves the relative ordering of equivalent keys. That is, given two keys- aand- band a binary boolean operation- opsuch that:- aprecedes- bin the input keys, and
- op(a, b) and op(b, a) are both false, then it is guaranteed that - awill precede- bas well in the output (ordered) keys. (ordered) keys.
 
- Example
- In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of - doubles.- #include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // keys_output: [ 6, 3, 5, 8, 7, 4, 1, 1] // values_output: [-5, 2, -4, -8, 7, 3, -1, -2] 
 - Template Parameters:
- Config – - [optional] Configuration of the primitive, must be - default_configor- segmented_radix_sort_config.
- KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type. 
- ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
- OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type. 
 
- Parameters:
- temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to - storage_sizeand function returns without performing the sort operation.
- storage_size – [inout] - reference to a size (in bytes) of - temporary_storage.
- keys_input – [in] - pointer to the first element in the range to sort. 
- keys_output – [out] - pointer to the first element in the output range. 
- values_input – [in] - pointer to the first element in the range to sort. 
- values_output – [out] - pointer to the first element in the output range. 
- size – [in] - number of element in the input range. 
- segments – [in] - number of segments in the input range. 
- begin_offsets – [in] - iterator to the first element in the range of beginning offsets. 
- end_offsets – [in] - iterator to the first element in the range of ending offsets. 
- begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range - [0; 8 * sizeof(Key)). Default value:- 0. Non-default value not supported for floating-point key-types.
- end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range - (begin_bit; 8 * sizeof(Key)]. Default value:- * sizeof(Key). Non-default value not supported for floating-point key-types.
- stream – [in] - [optional] HIP stream object. Default is - 0(default stream).
- debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is - false.
 
- Returns:
- hipSuccess(- 0) after successful sort; otherwise a HIP runtime error of type- hipError_t.