Sort#
Configuring the kernel#
merge_sort#
-
template<unsigned int MergeOddevenBlockSize = 512, unsigned int SortBlockSize = MergeOddevenBlockSize, unsigned int SortItemsPerThread = 1, unsigned int MergeMergepathPartitionBlockSize = 128, unsigned int MergeMergepathBlockSize = 128, unsigned int MergeMergepathItemsPerThread = 4, unsigned int MinInputSizeMergepath = (1 << 17) + 70000>
struct merge_sort_config : public rocprim::detail::merge_sort_config_params# Configuration of device-level merge primitives.
- Template Parameters:
SortBlockSize – - block size in the block-sort step
SortItemsPerThread – - ItemsPerThread in the block-sort step
MergeOddevenBlockSize – - block size in the block merge step using oddeven impl (used when input_size < MinInputSizeMergepath)
MergeMergepathPartitionBlockSize – - block size of the partition kernel in the block merge step using mergepath impl
MergeMergepathBlockSize – - block size in the block merge step using mergepath impl
MergeMergepathItemsPerThread – - ItemsPerThread in the block merge step using mergepath impl
MinInputSizeMergepath – - breakpoint of input-size to use mergepath impl for block merge step
radix_sort#
-
template<class SingleSortConfig = default_config, class MergeSortConfig = default_config, class OnesweepConfig = default_config, size_t MergeSortLimit = 1024 * 1024>
struct radix_sort_config# Configuration of device-level radix sort operation.
One of three algorithms is used: single sort (launches only a single block), merge sort, or Onesweep.
- Template Parameters:
SortSingleConfig – - Configuration for the single kernel subalgorithm. must be
kernel_configordefault_config.MergeSortConfig – - Configuration for the merge sort subalgorithm. must be
merge_sort_configordefault_config. Ifmerge_sort_config, the sorted items per block must be a power of two.OnesweepConfig – - Configuration for the Onesweep subalgorithm. must be
radix_sort_onesweep_configordefault_config.MergeSortLimit – - The largest number of items for which the merge sort algorithm will be used. Note that below this limit, a different algorithm may be used.
merge_sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel merge sort primitive for device level.
merge_sortfunction performs a device-wide merge sort of keys. Function sorts input keys based on comparison function.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.Accepts custom compare_functions for sorting across the device.
- Example
In this example a device-level ascending merge sort is performed on an array of
floatvalues.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]
- Template Parameters:
KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);. The signature does not need to haveconst &, but function object must not modify the objects passed to it. The default value isBinaryFunction().stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending merge sort-by-key primitive for device level.
merge_sortfunction performs a device-wide merge sort of (key, value) pairs. Function sorts input pairs based on comparison function.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.Accepts custom compare_functions for sorting across the device.
- Example
In this example a device-level ascending merge sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
doubles.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 2, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 1, 2, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8]
- Template Parameters:
KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);. The signature does not need to haveconst &, but function object must not modify the objects passed to it. The default value isBinaryFunction().stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
radix_sort_keys#
ascending#
-
template<class Config = default_config, class Key, class Size>
inline hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort primitive for device level.
radix_sort_keysfunction performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
The contents of both buffers of
keysmay be altered by the sorting function.current()ofkeysis used as the input.The function will update
current()ofkeysto point to the buffer that contains the output range.Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.The function requires small
temporary_storageas it does not need a temporary buffer ofsizeelements.Keytype must be an arithmetic type (that is, an integral type or a floating-point type).Buffers of
keysmust have at leastsizeelements.If
Keyis an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bitandend_bit, for example if all keys are in range [100, 10000],begin_bit = 0andend_bit = 14will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed on an array of
floatvalues.#include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<float> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // keys.current(): [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
radix_sort_configor a class derived from it.Key – - key type. Must be an integral type or a floating-point type.
Size – - integral type that represents the problem size.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys – [inout] - reference to the double-buffer of keys, its
current()contains the input range and will be updated to point to the output range.size – [in] - number of element in the input range.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key)). Default value:0. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]. Default value:* sizeof(Key). Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
descending#
-
template<class Config = default_config, class Key, class Size>
inline hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort primitive for device level.
radix_sort_keys_descfunction performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
The contents of both buffers of
keysmay be altered by the sorting function.current()ofkeysis used as the input.The function will update
current()ofkeysto point to the buffer that contains the output range.Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.The function requires small
temporary_storageas it does not need a temporary buffer ofsizeelements.Keytype must be an arithmetic type (that is, an integral type or a floating-point type).Buffers of
keysmust have at leastsizeelements.If
Keyis an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bitandend_bit, for example if all keys are in range [100, 10000],begin_bit = 0andend_bit = 14will cover the whole range.
- Example
In this example a device-level descending radix sort is performed on an array of integer values.
#include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<int> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // keys.current(): [8, 7, 6, 5, 4, 3, 2, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
radix_sort_configor a class derived from it.Key – - key type. Must be an integral type or a floating-point type.
Size – - integral type that represents the problem size.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys – [inout] - reference to the double-buffer of keys, its
current()contains the input range and will be updated to point to the output range.size – [in] - number of element in the input range.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key)). Default value:0. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]. Default value:* sizeof(Key). Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort primitive for device level.
segmented_radix_sort_keysfunction performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in ascending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.Keytype (avalue_typeofKeysInputIteratorandKeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_inputandkeys_outputmust have at leastsizeelements.Ranges specified by
begin_offsetsandend_offsetsmust have at leastsegmentselements. They may use the same sequenceoffsetsof at leastsegments + 1elements:offsetsforbegin_offsetsandoffsets + 1forend_offsets.If
Keyis an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bitandend_bit, for example if all keys are in range [100, 10000],begin_bit = 0andend_bit = 14will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed on an array of
floatvalues.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [0.3, 0.6, 0.65, 0.08, 0.2, 0.4, 0.7, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
segmented_radix_sort_configor a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key)). Default value:0. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]. Default value:* sizeof(Key). Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
segmented, descending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort primitive for device level.
segmented_radix_sort_keys_descfunction performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in descending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.Keytype (avalue_typeofKeysInputIteratorandKeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_inputandkeys_outputmust have at leastsizeelements.Ranges specified by
begin_offsetsandend_offsetsmust have at leastsegmentselements. They may use the same sequenceoffsetsof at leastsegments + 1elements:offsetsforbegin_offsetsandoffsets + 1forend_offsets.If
Keyis an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bitandend_bit, for example if all keys are in range [100, 10000],begin_bit = 0andend_bit = 14will cover the whole range.
- Example
In this example a device-level descending radix sort is performed on an array of integer values.
#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [6, 3, 5, 8, 7, 4, 2, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
segmented_radix_sort_configor a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key)). Default value:0. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]. Default value:* sizeof(Key). Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
radix_sort_pairs#
ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort-by-key primitive for device level.
radix_sort_pairs_descfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.Keytype (avalue_typeofKeysInputIteratorandKeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input,keys_output,values_inputandvalues_outputmust have at leastsizeelements.If
Keyis an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bitandend_bit, for example if all keys are in range [100, 10000],begin_bit = 0andend_bit = 14will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
doubles.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, 0, 5 ); // keys_output: [ 1, 1, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
radix_sort_configor a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – - integral type that represents the problem size.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key)). Default value:0. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]. Default value:* sizeof(Key). Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
descending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort-by-key primitive for device level.
radix_sort_pairs_descfunction performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.Keytype (avalue_typeofKeysInputIteratorandKeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input,keys_output,values_inputandvalues_outputmust have at leastsizeelements.If
Keyis an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bitandend_bit, for example if all keys are in range [100, 10000],begin_bit = 0andend_bit = 14will cover the whole range.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of
doubles.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 8, 7, 6, 5, 4, 3, 1, 1] // values_output: [-8, 7, -5, -4, 3, 2, -1, -2]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
radix_sort_configor a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – - integral type that represents the problem size.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key)). Default value:0. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]. Default value:* sizeof(Key). Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort-by-key primitive for device level.
segmented_radix_sort_pairs_descfunction performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.Keytype (avalue_typeofKeysInputIteratorandKeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input,keys_output,values_inputandvalues_outputmust have at leastsizeelements.Ranges specified by
begin_offsetsandend_offsetsmust have at leastsegmentselements. They may use the same sequenceoffsetsof at leastsegments + 1elements:offsetsforbegin_offsetsandoffsets + 1forend_offsets.If
Keyis an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bitandend_bit, for example if all keys are in range [100, 10000],begin_bit = 0andend_bit = 14will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
doubles.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // keys_output: [3, 6, 5, 1, 1, 4, 7, 8] // values_output: [2, -5, -4, -1, -2, 3, 7, -8]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
segmented_radix_sort_configor a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key)). Default value:0. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]. Default value:* sizeof(Key). Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort-by-key primitive for device level.
segmented_radix_sort_pairs_descfunction performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storageinstorage_sizeiftemporary_storagein a null pointer.Keytype (avalue_typeofKeysInputIteratorandKeysOutputIterator) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input,keys_output,values_inputandvalues_outputmust have at leastsizeelements.Ranges specified by
begin_offsetsandend_offsetsmust have at leastsegmentselements. They may use the same sequenceoffsetsof at leastsegments + 1elements:offsetsforbegin_offsetsandoffsets + 1forend_offsets.If
Keyis an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bitandend_bit, for example if all keys are in range [100, 10000],begin_bit = 0andend_bit = 14will cover the whole range.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of
doubles.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // keys_output: [ 6, 3, 5, 8, 7, 4, 1, 1] // values_output: [-5, 2, -4, -8, 7, 3, -1, -2]
- Template Parameters:
Config – - [optional] configuration of the primitive. It has to be
segmented_radix_sort_configor a class derived from it.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_sizeand function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key)). Default value:0. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]. Default value:* sizeof(Key). Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false.
- Returns:
hipSuccess(0) after successful sort; otherwise a HIP runtime error of typehipError_t.