Select

Configuring the kernel

template<unsigned int BlockSize, unsigned int ItemsPerThread, ::rocprim::block_load_method ValueBlockLoadMethod, ::rocprim::block_load_method FlagBlockLoadMethod, ::rocprim::block_scan_algorithm BlockScanMethod>
struct select_config

Configuration of device-level select operation.

Template Parameters:
  • BlockSize - number of threads in a block.

  • ItemsPerThread - number of items processed by each thread.

  • ValueBlockLoadMethod - method for loading input values.

  • FlagBlockLoadMethod - method for loading flag values.

  • BlockScanMethod - algorithm for block scan.
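
As a rough illustration of how BlockSize and ItemsPerThread interact, the sketch below computes how many items one block would cover and how many blocks a given input would need. It assumes the common tiling scheme in which each block handles BlockSize * ItemsPerThread items. The helper names items_per_block and blocks_needed are hypothetical and not part of rocPRIM, and the actual kernel launch logic may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a rocPRIM function): items covered by one block,
// assuming every one of block_size threads processes items_per_thread items.
inline std::size_t items_per_block(unsigned int block_size, unsigned int items_per_thread)
{
    return static_cast<std::size_t>(block_size) * items_per_thread;
}

// Hypothetical helper: number of blocks needed to cover `size` items,
// computed with a ceiling division over the per-block tile size.
inline std::size_t blocks_needed(std::size_t size,
                                 unsigned int block_size,
                                 unsigned int items_per_thread)
{
    assert(block_size > 0 && items_per_thread > 0);
    const std::size_t tile = items_per_block(block_size, items_per_thread);
    return (size + tile - 1) / tile;
}
```

Under this assumption, a configuration with BlockSize = 256 and ItemsPerThread = 8 covers 2048 items per block, so an input of 2049 items would need two blocks.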

select

template<class Config = default_config, class InputIterator, class FlagIterator, class OutputIterator, class SelectedCountOutputIterator>
inline hipError_t rocprim::select(void *temporary_storage, size_t &storage_size, InputIterator input, FlagIterator flags, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, const hipStream_t stream = 0, const bool debug_synchronous = false)

Parallel select primitive for device level using a range of flags.

Performs a device-wide selection based on input flags. A value from input is selected and copied into the output range when the corresponding item in the flags range evaluates to true after implicit conversion to bool.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Ranges specified by input and flags must have at least size elements.

  • Range specified by output must have enough elements to hold all positively flagged values.

  • Range specified by selected_count_output must have at least 1 element.

  • Values of flag range should be implicitly convertible to bool type.

Example

In this example a device-level select operation is performed on an array of integer values, with an array of chars used as flags.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;     // e.g., 8
int * input;           // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
char * flags;          // e.g., [0, 1, 1, 0, 0, 1, 0, 1]
int * output;          // empty array of 8 elements
size_t * output_count; // empty array of 1 element

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::select(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, flags,
    output, output_count,
    input_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform selection
rocprim::select(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, flags,
    output, output_count,
    input_size
);
// output: [2, 3, 6, 8]
// output_count: 4
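
The expected result above can be reproduced on the host with a sequential loop. The following is a minimal reference sketch in plain C++, not the rocPRIM implementation. select_by_flags_reference is a hypothetical helper, and it assumes that selected values keep their relative input order, as the example output suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of rocPRIM): copies input[i]
// to the output whenever flags[i] converts to true. The size of the returned
// vector plays the role of the value written to selected_count_output.
template<class T, class Flag>
std::vector<T> select_by_flags_reference(const std::vector<T>&    input,
                                         const std::vector<Flag>& flags)
{
    // Both ranges must provide at least input.size() elements.
    assert(flags.size() >= input.size());

    std::vector<T> output;
    output.reserve(input.size());
    for(std::size_t i = 0; i < input.size(); ++i)
    {
        if(static_cast<bool>(flags[i]))
        {
            output.push_back(input[i]);
        }
    }
    return output;
}
```

Such a reference is mainly useful for checking device results in tests: copy output and output_count back to the host and compare them with the returned vector and its size.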

Template Parameters:
  • Config - [optional] configuration of the primitive. It can be select_config or a custom class with the same members.

  • InputIterator - random-access iterator type of the input range. It can be a simple pointer type.

  • FlagIterator - random-access iterator type of the flag range. It can be a simple pointer type.

  • OutputIterator - random-access iterator type of the output range. It can be a simple pointer type.

  • SelectedCountOutputIterator - random-access iterator type of the selected_count_output value. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the select operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range to select values from.

  • flags[in] - iterator to the selection flag corresponding to the first element from input range.

  • output[out] - iterator to the first element in the output range.

  • selected_count_output[out] - iterator to the total number of selected values (length of output).

  • size[in] - number of elements in the input range.

  • stream[in] - [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.

template<class Config = default_config, class InputIterator, class OutputIterator, class SelectedCountOutputIterator, class UnaryPredicate>
inline hipError_t rocprim::select(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, UnaryPredicate predicate, const hipStream_t stream = 0, const bool debug_synchronous = false)

Parallel select primitive for device level using a selection operator.

Performs a device-wide selection using a selection operator. A value x from input is selected and copied into the output range when predicate(x) returns true.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Range specified by input must have at least size elements.

  • Range specified by output must have enough elements to hold all selected values.

  • Range specified by selected_count_output must have at least 1 element.

Example

In this example a device-level select operation is performed on an array of integer values; only even values are selected.

#include <rocprim/rocprim.hpp>

auto predicate =
    [] __device__ (int a) -> bool
    {
        return (a%2) == 0;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;     // e.g., 8
int * input;           // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int * output;          // empty array of 8 elements
size_t * output_count; // empty array of 1 element

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::select(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, output_count,
    input_size, predicate // size precedes predicate in the signature
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform selection
rocprim::select(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, output_count,
    input_size, predicate // size precedes predicate in the signature
);
// output: [2, 4, 6, 8]
// output_count: 4
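
On the host, the same result can be obtained with std::copy_if, which is convenient for validating device output. This is a minimal reference sketch in plain C++, not the rocPRIM implementation. select_if_reference and select_if_reference_demo are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical host-side reference (not part of rocPRIM): keeps every value x
// for which predicate(x) returns true, preserving the input order. The size
// of the returned vector corresponds to the selected count.
template<class T, class UnaryPredicate>
std::vector<T> select_if_reference(const std::vector<T>& input, UnaryPredicate predicate)
{
    std::vector<T> output;
    std::copy_if(input.begin(), input.end(), std::back_inserter(output), predicate);
    return output;
}

// Reproduces the example data: even values of [1, 2, 3, 4, 5, 6, 7, 8].
inline void select_if_reference_demo()
{
    const std::vector<int> input{1, 2, 3, 4, 5, 6, 7, 8};
    const std::vector<int> output
        = select_if_reference(input, [](int a) { return (a % 2) == 0; });
    assert(output.size() == 4);
    assert(output.front() == 2 && output.back() == 8);
}
```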

Template Parameters:
  • Config - [optional] configuration of the primitive. It can be select_config or a custom class with the same members.

  • InputIterator - random-access iterator type of the input range. It can be a simple pointer type.

  • OutputIterator - random-access iterator type of the output range. It can be a simple pointer type.

  • SelectedCountOutputIterator - random-access iterator type of the selected_count_output value. It can be a simple pointer type.

  • UnaryPredicate - type of a unary selection predicate.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the select operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range to select values from.

  • output[out] - iterator to the first element in the output range.

  • selected_count_output[out] - iterator to the total number of selected values (length of output).

  • size[in] - number of elements in the input range.

  • predicate[in] - unary function object that will be used for selecting values. The signature of the function should be equivalent to the following: bool f(const T &a). The signature does not need to have const &, but the function object must not modify the object passed to it.

  • stream[in] - [optional] HIP stream object. The default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
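
The predicate requirement described above can also be met with a named function object instead of a lambda. The sketch below shows two hypothetical functors, is_even and is_positive, written as host-only C++ so that it compiles without HIP. When such a functor is passed to rocprim::select, its operator() would additionally need a __device__ (or __host__ __device__) annotation, as the lambda in the example has.

```cpp
#include <cassert>

// Hypothetical predicate taking its argument by const reference, matching
// the suggested signature bool f(const T &a). The call operator is const and
// does not modify the value it inspects.
struct is_even
{
    bool operator()(const int& a) const
    {
        return (a % 2) == 0;
    }
};

// Hypothetical predicate taking its argument by value. This is also allowed,
// because a copy cannot modify the original object.
struct is_positive
{
    bool operator()(int a) const
    {
        return a > 0;
    }
};

// Small host-side check of both functors.
inline void predicate_demo()
{
    assert(is_even{}(4) && !is_even{}(3));
    assert(is_positive{}(1) && !is_positive{}(-2));
}
```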