Merge#

Configuring the kernel#

Merge#

template<unsigned int BlockSize, unsigned int ItemsPerThread>
struct merge_config : public rocprim::detail::merge_config_params#

Configuration of device-level merge operation.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread per tile.

Subclassed by rocprim::detail::default_merge_config< arch, key_type, value_type, enable >

Merge#

template<class Config = default_config, class InputIterator1, class InputIterator2, class OutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<InputIterator1>::value_type>>
inline hipError_t rocprim::merge(void *temporary_storage, size_t &storage_size, InputIterator1 input1, InputIterator2 input2, OutputIterator output, const size_t input1_size, const size_t input2_size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel merge primitive for device level.

merge function performs a device-wide merge. Function merges two ordered sets of input values based on comparison function.

Overview

  • The contents of the inputs are not altered by the merging function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Accepts custom compare_functions for merging across the device.

Example

In this example a device-level ascending merge is performed on an array of int values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size1;     // e.g., 4
size_t input_size2;     // e.g., 4
int * input1;           // e.g., [0, 1, 2, 3]
int * input2;           // e.g., [0, 1, 2, 3]
int * output;           // empty array of 8 elements

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::merge(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input1, input2, output, input_size1, input_size2
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform merge
rocprim::merge(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input1, input2, output, input_size1, input_size2
);
// output: [0, 0, 1, 1, 2, 2, 3, 3]

Template Parameters:
  • Config – [optional] Configuration of the primitive, must be default_config or merge_config.

  • InputIterator1 – random-access iterator type of the first input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • InputIterator2 – random-access iterator type of the second input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • OutputIterator – random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • input1[in] iterator to the first element in the first range to merge.

  • input2[in] iterator to the first element in the second range to merge.

  • output[out] iterator to the first element in the output range.

  • input1_size[in] number of element in the first input range.

  • input2_size[in] number of element in the second input range.

  • compare_function[in] binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

template<class Config = default_config, class KeysInputIterator1, class KeysInputIterator2, class KeysOutputIterator, class ValuesInputIterator1, class ValuesInputIterator2, class ValuesOutputIterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<KeysInputIterator1>::value_type>>
inline hipError_t rocprim::merge(void *temporary_storage, size_t &storage_size, KeysInputIterator1 keys_input1, KeysInputIterator2 keys_input2, KeysOutputIterator keys_output, ValuesInputIterator1 values_input1, ValuesInputIterator2 values_input2, ValuesOutputIterator values_output, const size_t input1_size, const size_t input2_size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel merge primitive for device level.

merge function performs a device-wide merge of (key, value) pairs. Function merges two ordered sets of input keys and corresponding values based on key comparison function.

Overview

  • The contents of the inputs are not altered by the merging function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Accepts custom compare_functions for merging across the device.

Example

In this example a device-level ascending merge is performed on an array of int values.

    #include <rocprim/rocprim.hpp>
   
    // Prepare input and output (declare pointers, allocate device memory etc.)
    size_t input_size1;     // e.g., 4
    size_t input_size2;     // e.g., 4
    int * keys_input1;      // e.g., [0, 1, 2, 3]
    int * keys_input2;      // e.g., [0, 1, 2, 3]
    int * keys_output;      // empty array of 8 elements
    int * values_input1;    // e.g., [10, 11, 12, 13]
    int * values_input2;    // e.g., [20, 21, 22, 23]
    int * values_output;    // empty array of 8 elements
   
    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::merge(
        temporary_storage_ptr, temporary_storage_size_bytes,
        keys_input1, keys_input2, keys_output,
        values_input1, values_input2, values_output,
//      input_size1, input_size2
    );
   
    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);
   
    // perform merge
    rocprim::merge(
        temporary_storage_ptr, temporary_storage_size_bytes,
        keys_input1, keys_input2, keys_output,
        values_input1, values_input2, values_output,
//      input_size1, input_size2
    );
    // keys_output: [0, 0, 1, 1, 2, 2, 3, 3]
    // values_output: [10, 20, 11, 21, 12, 22, 13, 23]

Template Parameters:
  • Config – [optional] Configuration of the primitive, must be default_config or merge_config.

  • KeysInputIterator1 – random-access iterator type of the first keys input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysInputIterator2 – random-access iterator type of the second keys input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • KeysOutputIterator – random-access iterator type of the keys output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator1 – random-access iterator type of the first values input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesInputIterator2 – random-access iterator type of the second values input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • ValuesOutputIterator – random-access iterator type of the values output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the sort operation.

  • storage_size[inout] reference to a size (in bytes) of temporary_storage.

  • keys_input1[in] iterator to the first key in the first range to merge.

  • keys_input2[in] iterator to the first key in the second range to merge.

  • keys_output[out] iterator to the first key in the output range.

  • values_input1[in] iterator to the first value in the first range to merge.

  • values_input2[in] iterator to the first value in the second range to merge.

  • values_output[out] iterator to the first value in the output range.

  • input1_size[in] number of element in the first input range.

  • input2_size[in] number of element in the second input range.

  • compare_function[in] binary operation function object that will be used for key comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful sort; otherwise a HIP runtime error of type hipError_t.

Merge inplace#

template<class Config = default_config, class Iterator, class BinaryFunction = ::rocprim::less<typename std::iterator_traits<Iterator>::value_type>>
inline hipError_t rocprim::merge_inplace(void *temporary_storage, size_t &storage_size, Iterator data, size_t left_size, size_t right_size, BinaryFunction compare_function = BinaryFunction(), const hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel merge inplace primitive for device level.

The merge_inplace function performs a device-wide merge in place. It merges two ordered sets of input values based on a comparison function using significantly less temporary storage compared to merge.

Overview

  • The function can write intermediate values to the data array while the algorithm is running.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Accepts a custom compare_function.

Example
#include <rocprim/rocprim.hpp>

size_t left_size;  // e.g. 4
size_t right_size; // e.g. 4
int*   data;       // e.g. [1, 3, 5, 7, 0, 2, 4, 6]
// output: [0, 1, 2, 3, 4, 5, 6, 7]

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;

rocprim::merge_inplace(
    temporary_storage_ptr,
    temporary_storage_size_bytes,
    data,
    left_size,
    right_size);

hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

rocprim::merge_inplace(
    temporary_storage_ptr,
    temporary_storage_size_bytes,
    data,
    left_size,
    right_size);

Warning

This function prioritizes temporary storage over speed. In most cases using merge and a device copy is significantly faster.

Warning

This function uses cooperative groups. Please make sure that the device and platform supports cooperative kernel launches.

Template Parameters:
  • Config – Configuration of the primitive, must be default_config.

  • Iterator – Random access iterator type for the input and output range. Must meet the requirements of std::random_access_iterator.

  • BinaryFunction – Binary function type that is used for the comparison.

Parameters:
  • temporary_storage[in] Pointer to a device-accessible temporary storage. When a null pointer is passed the required allocation size in bytes is written to storage_size and the function returns hipSuccess without performing the merge operation.

  • storage_size[inout] Reference to size in bytes of temporary_storage.

  • data[inout] Iterator to the first value to merge.

  • left_size[in] Number of elements in the first input range.

  • right_size[in] Number of elements in the second input range.

  • compare_function[in] Binary operation function that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The default value is BinaryFunction().

  • stream[in] The HIP stream object. Default is 0 (hipDefaultStream).

  • debug_synchronous[in] If true, forces a device synchronization after every kernel launch in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after succesful merge; otherwise a HIP runtime error of type hipError_t.