Sort#
Configuring the kernel#
merge_sort#
-
template<unsigned int MergeBlockSize, unsigned int SortBlockSize = MergeBlockSize, unsigned int SortItemsPerThread = 1>
using rocprim::merge_sort_config = detail::merge_sort_config_impl<SortBlockSize, SortItemsPerThread, MergeBlockSize># Configuration of device-level merge primitives.
- Template Parameters:
SortBlockSize – - block size in the block-sort step
SortItemsPerThread – - ItemsPerThread in the block-sort step
MergeBlockSize – - block size in the block merge step
radix_sort#
-
template<unsigned int LongRadixBits, unsigned int ShortRadixBits, class ScanConfig, class SortConfig, class SortSingleConfig = kernel_config<256, 10>, class SortMergeConfig = kernel_config<1024, 1>, unsigned int MergeSizeLimitBlocks = 1024U, bool ForceSingleKernelConfig = false>
struct radix_sort_config# Configuration of device-level radix sort operation.
Radix sort is excecuted in a single tile (at size < BlocksPerItem) or few iterations (passes) depending on total number of bits to be sorted (
begin_bit
andend_bit
), each iteration sorts eitherLongRadixBits
orShortRadixBits
bits choosen to cover whole bit range in optimal way.For example, if
LongRadixBits
is 7,ShortRadixBits
is 6,begin_bit
is 0 andend_bit
is 32 there will be 5 iterations: 7 + 7 + 6 + 6 + 6 = 32 bits.- Template Parameters:
LongRadixBits – - number of bits in long iterations.
ShortRadixBits – - number of bits in short iterations, must be equal to or less than
LongRadixBits
.ScanConfig – - configuration of digits scan kernel. Must be
kernel_config
.SortConfig – - configuration of radix sort kernel. Must be
kernel_config
.
merge_sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel merge sort primitive for device level.
merge_sort
function performs a device-wide merge sort of keys. Function sorts input keys based on comparison function.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Accepts custom compare_functions for sorting across the device.
- Example
In this example a device-level ascending merge sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]
- Template Parameters:
KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending merge sort-by-key primitive for device level.
merge_sort
function performs a device-wide merge sort of (key, value) pairs. Function sorts input pairs based on comparison function.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Accepts custom compare_functions for sorting across the device.
- Example
In this example a device-level ascending merge sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 2, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 1, 2, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8]
- Template Parameters:
KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
radix_sort_keys#
ascending#
Warning
doxygenfunction: Unable to resolve function “rocprim::radix_sort_keys” with arguments (void*, size_t&, double_buffer<Key>&, Size, unsigned int, unsigned int, hipStream_t, bool) in doxygen xml output for project “rocPRIM Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-rocprim/checkouts/docs-5.0.2/docs/.doxygen/docBin/xml. Potential matches:
- template<class Config = default_config, class Key> hipError_t radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, unsigned int size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)
- template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type> hipError_t radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)
descending#
Warning
doxygenfunction: Unable to resolve function “rocprim::radix_sort_keys_desc” with arguments (void*, size_t&, double_buffer<Key>&, Size, unsigned int, unsigned int, hipStream_t, bool) in doxygen xml output for project “rocPRIM Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-rocprim/checkouts/docs-5.0.2/docs/.doxygen/docBin/xml. Potential matches:
- template<class Config = default_config, class Key> hipError_t radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, unsigned int size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)
- template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type> hipError_t radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort primitive for device level.
segmented_radix_sort_keys
function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in ascending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [0.3, 0.6, 0.65, 0.08, 0.2, 0.4, 0.7, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
segmented_radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
segmented, descending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort primitive for device level.
segmented_radix_sort_keys_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in descending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed on an array of integer values.
#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [6, 3, 5, 8, 7, 4, 2, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
segmented_radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
radix_sort_pairs#
ascending#
Warning
doxygenfunction: Unable to resolve function “rocprim::radix_sort_pairs” with arguments (void*, size_t&, KeysInputIterator, KeysOutputIterator, ValuesInputIterator, ValuesOutputIterator, Size, unsigned int, unsigned int, hipStream_t, bool) in doxygen xml output for project “rocPRIM Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-rocprim/checkouts/docs-5.0.2/docs/.doxygen/docBin/xml. Potential matches:
- template<class Config = default_config, class Key, class Value> hipError_t radix_sort_pairs(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, unsigned int size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)
- template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type> hipError_t radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)
descending#
Warning
doxygenfunction: Unable to resolve function “rocprim::radix_sort_pairs_desc” with arguments (void*, size_t&, KeysInputIterator, KeysOutputIterator, ValuesInputIterator, ValuesOutputIterator, Size, unsigned int, unsigned int, hipStream_t, bool) in doxygen xml output for project “rocPRIM Documentation” from directory: /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-rocprim/checkouts/docs-5.0.2/docs/.doxygen/docBin/xml. Potential matches:
- template<class Config = default_config, class Key, class Value> hipError_t radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, double_buffer<Value> &values, unsigned int size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)
- template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type> hipError_t radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort-by-key primitive for device level.
segmented_radix_sort_pairs_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // keys_output: [3, 6, 5, 1, 1, 4, 7, 8] // values_output: [2, -5, -4, -1, -2, 3, 7, -8]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
segmented_radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort-by-key primitive for device level.
segmented_radix_sort_pairs_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // keys_output: [ 6, 3, 5, 8, 7, 4, 1, 1] // values_output: [-5, 2, -4, -8, 7, 3, -1, -2]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
segmented_radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.