Sort#
Configuring the kernel#
merge_sort#
-
template<unsigned int MergeImpl1BlockSize = 512, unsigned int SortBlockSize = MergeImpl1BlockSize, unsigned int SortItemsPerThread = 1, unsigned int MergeImplMPPartitionBlockSize = 128, unsigned int MergeImplMPBlockSize = std::min(SortBlockSize, 128u), unsigned int MergeImplMPItemsPerThread = SortBlockSize * SortItemsPerThread / MergeImplMPBlockSize, unsigned int MinInputSizeMergepath = 200000>
using rocprim::merge_sort_config = detail::merge_sort_config_impl<SortBlockSize, SortItemsPerThread, MergeImpl1BlockSize, MergeImplMPPartitionBlockSize, MergeImplMPBlockSize, MergeImplMPItemsPerThread, MinInputSizeMergepath># Configuration of device-level merge primitives.
- Template Parameters:
SortBlockSize – - block size in the block-sort step
SortItemsPerThread – - ItemsPerThread in the block-sort step
MergeImpl1BlockSize – - block size in the block merge step using impl1 (used when input_size < MinInputSizeMergepath)
MergeImplMPPartitionBlockSize – - block size of the partition kernel in the block merge step using mergepath impl
MergeImplMPBlockSize – - block size in the block merge step using mergepath impl
MergeImplMPItemsPerThread – - ItemsPerThread in the block merge step using mergepath impl
MinInputSizeMergepath – - breakpoint of input-size to use mergepath impl for block merge step
radix_sort#
-
template<unsigned int LongRadixBits, unsigned int ShortRadixBits, class ScanConfig, class SortConfig, class SortSingleConfig = kernel_config<256, 10>, class SortMergeConfig = kernel_config<1024, 1>, unsigned int MergeSizeLimitBlocks = 1024U, bool ForceSingleKernelConfig = false>
struct radix_sort_config# Configuration of device-level radix sort operation.
Radix sort is excecuted in a single tile (at size < BlocksPerItem) or few iterations (passes) depending on total number of bits to be sorted (
begin_bit
andend_bit
), each iteration sorts eitherLongRadixBits
orShortRadixBits
bits choosen to cover whole bit range in optimal way.For example, if
LongRadixBits
is 7,ShortRadixBits
is 6,begin_bit
is 0 andend_bit
is 32 there will be 5 iterations: 7 + 7 + 6 + 6 + 6 = 32 bits.- Template Parameters:
LongRadixBits – - number of bits in long iterations.
ShortRadixBits – - number of bits in short iterations, must be equal to or less than
LongRadixBits
.ScanConfig – - configuration of digits scan kernel. Must be
kernel_config
.SortConfig – - configuration of radix sort kernel. Must be
kernel_config
.
merge_sort#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel merge sort primitive for device level.
merge_sort
function performs a device-wide merge sort of keys. Function sorts input keys based on comparison function.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Accepts custom compare_functions for sorting across the device.
- Example
In this example a device-level ascending merge sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size ); // keys_output: [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]
- Template Parameters:
KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator>::value_type>>
inline hipError_t rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending merge sort-by-key primitive for device level.
merge_sort
function performs a device-wide merge sort of (key, value) pairs. Function sorts input pairs based on comparison function.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Accepts custom compare_functions for sorting across the device.
- Example
In this example a device-level ascending merge sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 2, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::merge_sort( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 1, 2, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8]
- Template Parameters:
KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
compare_function – [in] - binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
radix_sort_keys#
ascending#
-
template<class Config = default_config, class Key, class Size>
inline hipError_t rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort primitive for device level.
radix_sort_keys
function performs a device-wide radix sort of keys. Function sorts input keys in ascending order.- Overview
The contents of both buffers of
keys
may be altered by the sorting function.current()
ofkeys
is used as the input.The function will update
current()
ofkeys
to point to the buffer that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type must be an arithmetic type (that is, an integral type or a floating-point type).Buffers of
keys
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<float> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // keys.current(): [0.08, 0.2, 0.3, 0.4, 0.6, 0.65, 0.7, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
radix_sort_config
or a custom class with the same members.Key – - key type. Must be an integral type or a floating-point type.
Size – - integral type that represents the problem size.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys – [inout] - reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.size – [in] - number of element in the input range.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
descending#
-
template<class Config = default_config, class Key, class Size>
inline hipError_t rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer<Key> &keys, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort primitive for device level.
radix_sort_keys_desc
function performs a device-wide radix sort of keys. Function sorts input keys in descending order.- Overview
The contents of both buffers of
keys
may be altered by the sorting function.current()
ofkeys
is used as the input.The function will update
current()
ofkeys
to point to the buffer that contains the output range.Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.The function requires small
temporary_storage
as it does not need a temporary buffer ofsize
elements.Key
type must be an arithmetic type (that is, an integral type or a floating-point type).Buffers of
keys
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed on an array of integer values.
#include <rocprim/rocprim.hpp> // Prepare input and tmp (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * tmp; // empty array of 8 elements // Create double-buffer rocprim::double_buffer<int> keys(input, tmp); size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys, input_size ); // keys.current(): [8, 7, 6, 5, 4, 3, 2, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
radix_sort_config
or a custom class with the same members.Key – - key type. Must be an integral type or a floating-point type.
Size – - integral type that represents the problem size.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys – [inout] - reference to the double-buffer of keys, its
current()
contains the input range and will be updated to point to the output range.size – [in] - number of element in the input range.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort primitive for device level.
segmented_radix_sort_keys
function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in ascending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed on an array of
float
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 float * input; // e.g., [0.6, 0.3, 0.65, 0.4, 0.2, 0.08, 1, 0.7] float * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [0.3, 0.6, 0.65, 0.08, 0.2, 0.4, 0.7, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
segmented_radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
segmented, descending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort primitive for device level.
segmented_radix_sort_keys_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of keys. Function sorts input keys in descending order.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
andkeys_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed on an array of integer values.
#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * input; // e.g., [6, 3, 5, 4, 2, 8, 1, 7] int * output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_keys_desc( temporary_storage_ptr, temporary_storage_size_bytes, input, output, input_size, segments, offsets, offsets + 1 ); // keys_output: [6, 3, 5, 8, 7, 4, 2, 1]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
segmented_radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
radix_sort_pairs#
ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort-by-key primitive for device level.
radix_sort_pairs_desc
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, 0, 5 ); // keys_output: [ 1, 1, 3, 4, 5, 6, 7, 8] // values_output: [-1, -2, 2, 3, -4, -5, 7, -8]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – - integral type that represents the problem size.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
descending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class Size, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort-by-key primitive for device level.
radix_sort_pairs_desc
function performs a device-wide radix sort of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size ); // keys_output: [ 8, 7, 6, 5, 4, 3, 1, 1] // values_output: [-8, 7, -5, -4, 3, 2, -1, -2]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
Size – - integral type that represents the problem size.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel ascending radix sort-by-key primitive for device level.
segmented_radix_sort_pairs_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in ascending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level ascending radix sort is performed where input keys are represented by an array of unsigned integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 unsigned int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] unsigned int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] // Keys are in range [0; 8], so we can limit compared bit to bits on indexes // 0, 1, 2, 3, and 4. In order to do this begin_bit is set to 0 and end_bit // is set to 5. size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1, 0, 5 ); // keys_output: [3, 6, 5, 1, 1, 4, 7, 8] // values_output: [2, -5, -4, -1, -2, 3, 7, -8]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
segmented_radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
segmented, ascending#
-
template<class Config = default_config, class KeysInputIterator, class KeysOutputIterator, class ValuesInputIterator, class ValuesOutputIterator, class OffsetIterator, class Key = typename std::iterator_traits<KeysInputIterator>::value_type>
inline hipError_t rocprim::segmented_radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit = 0, unsigned int end_bit = 8 * sizeof(Key), hipStream_t stream = 0, bool debug_synchronous = false)# Parallel descending radix sort-by-key primitive for device level.
segmented_radix_sort_pairs_desc
function performs a device-wide radix sort across multiple, non-overlapping sequences of (key, value) pairs. Function sorts input pairs in descending order of keys.- Overview
The contents of the inputs are not altered by the sorting function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Key
type (avalue_type
ofKeysInputIterator
andKeysOutputIterator
) must be an arithmetic type (that is, an integral type or a floating-point type).Ranges specified by
keys_input
,keys_output
,values_input
andvalues_output
must have at leastsize
elements.Ranges specified by
begin_offsets
andend_offsets
must have at leastsegments
elements. They may use the same sequenceoffsets
of at leastsegments + 1
elements:offsets
forbegin_offsets
andoffsets + 1
forend_offsets
.If
Key
is an integer type and the range of keys is known in advance, the performance can be improved by settingbegin_bit
andend_bit
, for example if all keys are in range [100, 10000],begin_bit = 0
andend_bit = 14
will cover the whole range.
- Example
In this example a device-level descending radix sort is performed where input keys are represented by an array of integers and input values by an array of
double
s.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size; // e.g., 8 int * keys_input; // e.g., [ 6, 3, 5, 4, 1, 8, 1, 7] double * values_input; // e.g., [-5, 2, -4, 3, -1, -8, -2, 7] int * keys_output; // empty array of 8 elements double * values_output; // empty array of 8 elements unsigned int segments; // e.g., 3 int * offsets; // e.g. [0, 2, 3, 8] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform sort rocprim::segmented_radix_sort_pairs_desc( temporary_storage_ptr, temporary_storage_size_bytes, keys_input, keys_output, values_input, values_output, input_size, segments, offsets, offsets + 1 ); // keys_output: [ 6, 3, 5, 8, 7, 4, 1, 1] // values_output: [-5, 2, -4, -8, 7, 3, -1, -2]
- Template Parameters:
Config – - [optional] configuration of the primitive. It can be
segmented_radix_sort_config
or a custom class with the same members.KeysInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
OffsetIterator – - random-access iterator type of segment offsets. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] - reference to a size (in bytes) of
temporary_storage
.keys_input – [in] - pointer to the first element in the range to sort.
keys_output – [out] - pointer to the first element in the output range.
values_input – [in] - pointer to the first element in the range to sort.
values_output – [out] - pointer to the first element in the output range.
size – [in] - number of element in the input range.
segments – [in] - number of segments in the input range.
begin_offsets – [in] - iterator to the first element in the range of beginning offsets.
end_offsets – [in] - iterator to the first element in the range of ending offsets.
begin_bit – [in] - [optional] index of the first (least significant) bit used in key comparison. Must be in range
[0; 8 * sizeof(Key))
. Default value:0
. Non-default value not supported for floating-point key-types.end_bit – [in] - [optional] past-the-end index (most significant) bit used in key comparison. Must be in range
(begin_bit; 8 * sizeof(Key)]
. Default value:* sizeof(Key)
. Non-default value not supported for floating-point key-types.stream – [in] - [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.