Merge#
Configuring the kernel#
Merge#
-
template<unsigned int BlockSize, unsigned int ItemsPerThread>
struct merge_config : public rocprim::detail::merge_config_params# Configuration of device-level merge operation.
- Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread per tile.
Subclassed by rocprim::detail::default_merge_config< arch, key_type, value_type, enable >
Merge#
-
template<class Config = default_config, class InputIterator1, class InputIterator2, class OutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<InputIterator1>::value_type>>
inline hipError_t rocprim::merge(void *temporary_storage, size_t &storage_size, InputIterator1 input1, InputIterator2 input2, OutputIterator output, const size_t input1_size, const size_t input2_size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel merge primitive for device level.
merge
function performs a device-wide merge. Function merges two ordered sets of input values based on comparison function.- Overview
The contents of the inputs are not altered by the merging function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Accepts custom compare_functions for merging across the device.
- Example
In this example a device-level ascending merge is performed on an array of
int
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size1; // e.g., 4 size_t input_size2; // e.g., 4 int * input1; // e.g., [0, 1, 2, 3] int * input2; // e.g., [0, 1, 2, 3] int * output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge( temporary_storage_ptr, temporary_storage_size_bytes, input1, input2, output, input_size1, input_size2 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform merge rocprim::merge( temporary_storage_ptr, temporary_storage_size_bytes, input1, input2, output, input_size1, input_size2 ); // output: [0, 0, 1, 1, 2, 2, 3, 3]
- Template Parameters:
Config – [optional] Configuration of the primitive, must be
default_config
ormerge_config
.InputIterator1 – random-access iterator type of the first input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
InputIterator2 – random-access iterator type of the second input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.input1 – [in] iterator to the first element in the first range to merge.
input2 – [in] iterator to the first element in the second range to merge.
output – [out] iterator to the first element in the output range.
input1_size – [in] number of element in the first input range.
input2_size – [in] number of element in the second input range.
compare_function – [in] binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
-
template<class Config = default_config, class KeysInputIterator1, class KeysInputIterator2, class KeysOutputIterator, class ValuesInputIterator1, class ValuesInputIterator2, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator1>::value_type>>
inline hipError_t rocprim::merge(void *temporary_storage, size_t &storage_size, KeysInputIterator1 keys_input1, KeysInputIterator2 keys_input2, KeysOutputIterator keys_output, ValuesInputIterator1 values_input1, ValuesInputIterator2 values_input2, ValuesOutputIterator values_output, const size_t input1_size, const size_t input2_size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel merge primitive for device level.
merge
function performs a device-wide merge of (key, value) pairs. Function merges two ordered sets of input keys and corresponding values based on key comparison function.- Overview
The contents of the inputs are not altered by the merging function.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
in a null pointer.Accepts custom compare_functions for merging across the device.
- Example
In this example a device-level ascending merge is performed on an array of
int
values.#include <rocprim/rocprim.hpp> // Prepare input and output (declare pointers, allocate device memory etc.) size_t input_size1; // e.g., 4 size_t input_size2; // e.g., 4 int * keys_input1; // e.g., [0, 1, 2, 3] int * keys_input2; // e.g., [0, 1, 2, 3] int * keys_output; // empty array of 8 elements int * values_input1; // e.g., [10, 11, 12, 13] int * values_input2; // e.g., [20, 21, 22, 23] int * values_output; // empty array of 8 elements size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; // Get required size of the temporary storage rocprim::merge( temporary_storage_ptr, temporary_storage_size_bytes, keys_input1, keys_input2, keys_output, values_input1, values_input2, values_output, // input_size1, input_size2 ); // allocate temporary storage hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); // perform merge rocprim::merge( temporary_storage_ptr, temporary_storage_size_bytes, keys_input1, keys_input2, keys_output, values_input1, values_input2, values_output, // input_size1, input_size2 ); // keys_output: [0, 0, 1, 1, 2, 2, 3, 3] // values_output: [10, 20, 11, 21, 12, 22, 13, 23]
- Template Parameters:
Config – [optional] Configuration of the primitive, must be
default_config
ormerge_config
.KeysInputIterator1 – random-access iterator type of the first keys input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysInputIterator2 – random-access iterator type of the second keys input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
KeysOutputIterator – random-access iterator type of the keys output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
ValuesInputIterator1 – random-access iterator type of the first values input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesInputIterator2 – random-access iterator type of the second values input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
ValuesOutputIterator – random-access iterator type of the values output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to
storage_size
and function returns without performing the sort operation.storage_size – [inout] reference to a size (in bytes) of
temporary_storage
.keys_input1 – [in] iterator to the first key in the first range to merge.
keys_input2 – [in] iterator to the first key in the second range to merge.
keys_output – [out] iterator to the first key in the output range.
values_input1 – [in] iterator to the first value in the first range to merge.
values_input2 – [in] iterator to the first value in the second range to merge.
values_output – [out] iterator to the first value in the output range.
input1_size – [in] number of element in the first input range.
input2_size – [in] number of element in the second input range.
compare_function – [in] binary operation function object that will be used for key comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] [optional] HIP stream object. Default is
0
(default stream).debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is
false
.
- Returns:
hipSuccess
(0
) after successful sort; otherwise a HIP runtime error of typehipError_t
.
Merge inplace#
-
template<class Config = default_config, class Iterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<Iterator>::value_type>>
inline hipError_t rocprim::merge_inplace(void *temporary_storage, size_t &storage_size, Iterator data, size_t left_size, size_t right_size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)# Parallel merge inplace primitive for device level.
The
merge_inplace
function performs a device-wide merge in place. It merges two ordered sets of input values based on a comparison function using significantly less temporary storage compared tomerge
.- Overview
The function can write intermediate values to the data array while the algorithm is running.
Returns the required size of
temporary_storage
instorage_size
iftemporary_storage
is a null pointer.Accepts a custom
compare_function
.
- Example
#include <rocprim/rocprim.hpp> size_t left_size; // e.g. 4 size_t right_size; // e.g. 4 int* data; // e.g. [1, 3, 5, 7, 0, 2, 4, 6] // output: [0, 1, 2, 3, 4, 5, 6, 7] size_t temporary_storage_size_bytes; void * temporary_storage_ptr = nullptr; rocprim::merge_inplace( temporary_storage_ptr, temporary_storage_size_bytes, data, left_size, right_size); hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes); rocprim::merge_inplace( temporary_storage_ptr, temporary_storage_size_bytes, data, left_size, right_size);
Warning
This function prioritizes temporary storage over speed. In most cases using
merge
and a device copy is significantly faster.Warning
This function uses cooperative groups. Please make sure that the device and platform supports cooperative kernel launches.
- Template Parameters:
Config – Configuration of the primitive, must be
default_config
.Iterator – Random access iterator type for the input and output range. Must meet the requirements of
std::random_access_iterator
.BinaryFunction – Binary function type that is used for the comparison.
- Parameters:
temporary_storage – [in] Pointer to a device-accessible temporary storage. When a null pointer is passed the required allocation size in bytes is written to
storage_size
and the function returnshipSuccess
without performing the merge operation.storage_size – [inout] Reference to size in bytes of
temporary_storage
.data – [inout] Iterator to the first value to merge.
left_size – [in] Number of elements in the first input range.
right_size – [in] Number of elements in the second input range.
compare_function – [in] Binary operation function that will be used for comparison. The signature of the function should be equivalent to the following:
bool f(const T &a, const T &b);
. The signature does not need to haveconst &
, but the function object must not modify the objects passed to it. The default value isBinaryFunction()
.stream – [in] The HIP stream object. Default is
0
(hipDefaultStream
).debug_synchronous – [in] If
true
, forces a device synchronization after every kernel launch in order to check for errors. Default value isfalse
.
- Returns:
hipSuccess
(0
) after succesful merge; otherwise a HIP runtime error of typehipError_t
.