Select
Configuring the kernel
template<unsigned int BlockSize, unsigned int ItemsPerThread, ::rocprim::block_load_method KeyBlockLoadMethod, ::rocprim::block_load_method ValueBlockLoadMethod, ::rocprim::block_load_method FlagBlockLoadMethod, ::rocprim::block_scan_algorithm BlockScanMethod, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct select_config
Configuration of device-level select operation.
- Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
KeyBlockLoadMethod – method for loading input keys.
ValueBlockLoadMethod – method for loading input values.
FlagBlockLoadMethod – method for loading flag values.
BlockScanMethod – algorithm for block scan.
SizeLimit – limit on the number of items for a single select kernel launch.
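As a rough illustration of how these parameters interact, the sketch below estimates launch geometry on the host. It assumes that one block covers BlockSize * ItemsPerThread items and that inputs larger than SizeLimit are split across several launches; the text above only states what each parameter means, so both assumptions and the helper names `items_per_block`, `ceil_div`, `launch_plan` and `plan` are hypothetical and not part of rocPRIM.

```cpp
#include <cstddef>

// Hypothetical helper (not a rocPRIM function): items covered by one block,
// assuming every thread handles exactly `items_per_thread` items.
inline std::size_t items_per_block(unsigned int block_size, unsigned int items_per_thread)
{
    return static_cast<std::size_t>(block_size) * items_per_thread;
}

// Ceiling division for positive divisors.
inline std::size_t ceil_div(std::size_t a, std::size_t b)
{
    return (a + b - 1) / b;
}

// Hypothetical summary of a launch plan for `size` input items.
struct launch_plan
{
    std::size_t launches;               // kernel launches needed under the size limit
    std::size_t blocks_in_first_launch; // blocks needed by the first launch
};

// Assumes size_limit > 0, block_size > 0 and items_per_thread > 0.
inline launch_plan plan(std::size_t size,
                        unsigned int block_size,
                        unsigned int items_per_thread,
                        std::size_t size_limit)
{
    // The first launch handles at most `size_limit` items.
    const std::size_t first = size < size_limit ? size : size_limit;
    launch_plan result;
    result.launches               = ceil_div(size, size_limit);
    result.blocks_in_first_launch = ceil_div(first, items_per_block(block_size, items_per_thread));
    return result;
}
```

Under these assumptions, raising ItemsPerThread lowers the number of blocks per launch, while lowering SizeLimit raises the number of launches.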
select
template<class Config = default_config, class InputIterator, class FlagIterator, class OutputIterator, class SelectedCountOutputIterator>
inline hipError_t rocprim::select(void *temporary_storage, size_t &storage_size, InputIterator input, FlagIterator flags, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel select primitive for device level using range of flags.
Performs a device-wide selection based on input flags. If a value from input should be selected and copied into the output range, the corresponding item from the flags range must be set to a value that is implicitly convertible to true (bool type).
- Overview
Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.
Ranges specified by input and flags must have at least size elements.
Range specified by output must have at least as many elements as there are positively flagged values, so that all of them can be copied into it.
Range specified by selected_count_output must have at least 1 element.
Values of the flags range should be implicitly convertible to bool type.
- Example
In this example a device-level select operation is performed on an array of integer values, with an array of chars used as flags.

    #include <rocprim/rocprim.hpp>

    // Prepare input and output (declare pointers, allocate device memory etc.)
    size_t input_size;     // e.g., 8
    int * input;           // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
    char * flags;          // e.g., [0, 1, 1, 0, 0, 1, 0, 1]
    int * output;          // empty array of 8 elements
    size_t * output_count; // empty array of 1 element

    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::select(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, flags,
        output, output_count,
        input_size
    );

    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

    // perform selection
    rocprim::select(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, flags,
        output, output_count,
        input_size
    );
    // output: [2, 3, 6, 8]
    // output_count: 4
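The example calls rocprim::select twice: once with a null temporary_storage to query the required size, and once with allocated storage to do the work. The host-only sketch below models that calling protocol without a GPU. `fake_primitive` and `run_two_phase` are hypothetical names, the 256-byte requirement is arbitrary, and a std::vector stands in for the hipMalloc allocation; none of this is rocPRIM code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rocPRIM-style call. With a null
// `temporary_storage` it only reports the required size; otherwise it
// performs the (simulated) work. Returns 0 on success, similar in spirit
// to a hipSuccess status.
inline int fake_primitive(void* temporary_storage, std::size_t& storage_size, int& work_done)
{
    const std::size_t required_bytes = 256; // arbitrary value for this sketch
    if(temporary_storage == nullptr)
    {
        storage_size = required_bytes; // size query only, no work performed
        return 0;
    }
    if(storage_size < required_bytes)
    {
        return 1; // storage too small
    }
    ++work_done; // simulated operation
    return 0;
}

// Generic driver for the two-phase pattern: query, allocate, run.
// `call` must be callable as int(void*, std::size_t&).
template<class Call>
int run_two_phase(Call call)
{
    std::size_t bytes = 0;
    const int query_status = call(nullptr, bytes); // phase 1: size query
    if(query_status != 0)
    {
        return query_status;
    }
    // Host memory standing in for device memory obtained with hipMalloc.
    std::vector<unsigned char> storage(bytes > 0 ? bytes : 1);
    return call(storage.data(), bytes); // phase 2: actual operation
}
```

In real HIP code the callable would forward to rocprim::select with the remaining arguments bound, and the return value of each call (a hipError_t) would be checked in the same two places.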
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be select_config or a class derived from it.
InputIterator – random-access iterator type of the input range. It can be a simple pointer type.
FlagIterator – random-access iterator type of the flag range. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. It can be a simple pointer type.
SelectedCountOutputIterator – random-access iterator type of the selected_count_output value. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the select operation.
storage_size – [inout] reference to the size (in bytes) of temporary_storage.
input – [in] iterator to the first element in the range to select values from.
flags – [in] iterator to the selection flag corresponding to the first element of the input range.
output – [out] iterator to the first element in the output range.
selected_count_output – [out] iterator to the total number of selected values (length of output).
size – [in] number of elements in the input range.
stream – [in] [optional] HIP stream object. The default is 0 (default stream).
debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
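To make the expected result of the flag-based overload concrete, here is a sequential host-side reference for checking device output in tests. It is a minimal sketch of the documented semantics only (copy every input value whose flag converts to true, keeping input order as the documented example output suggests). It says nothing about how rocPRIM implements the operation, and `select_flagged_reference` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Sequential reference for flag-based selection (hypothetical helper).
// Copies input[i] to the result whenever flags[i] converts to true.
template<class T, class Flag>
std::vector<T> select_flagged_reference(const std::vector<T>& input,
                                        const std::vector<Flag>& flags)
{
    std::vector<T> output;
    // Both ranges are expected to hold at least `size` elements; stop at the
    // shorter one so the sketch never reads out of bounds.
    for(std::size_t i = 0; i < input.size() && i < flags.size(); ++i)
    {
        if(static_cast<bool>(flags[i]))
        {
            output.push_back(input[i]);
        }
    }
    return output; // output.size() plays the role of selected_count_output
}
```

With the input [1, 2, 3, 4, 5, 6, 7, 8] and flags [0, 1, 1, 0, 0, 1, 0, 1] from the example, the reference returns four values, matching the documented output_count.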
template<class Config = default_config, class InputIterator, class OutputIterator, class SelectedCountOutputIterator, class UnaryPredicate>
inline hipError_t rocprim::select(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, UnaryPredicate predicate, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel select primitive for device level using selection operator.
Performs a device-wide selection using a selection operator. If a value x from input should be selected and copied into the output range, then predicate(x) has to return true.
- Overview
Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.
Range specified by input must have at least size elements.
Range specified by output must have at least as many elements as there are selected values, so that all of them can be copied into it.
Range specified by selected_count_output must have at least 1 element.
- Example
In this example a device-level select operation is performed on an array of integer values; only even values are selected.

    #include <rocprim/rocprim.hpp>

    auto predicate =
        [] __device__ (int a) -> bool
        {
            return (a%2) == 0;
        };

    // Prepare input and output (declare pointers, allocate device memory etc.)
    size_t input_size;     // e.g., 8
    int * input;           // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
    int * output;          // empty array of 8 elements
    size_t * output_count; // empty array of 1 element

    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::select(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input,
        output, output_count,
        input_size,
        predicate
    );

    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

    // perform selection
    rocprim::select(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input,
        output, output_count,
        input_size,
        predicate
    );
    // output: [2, 4, 6, 8]
    // output_count: 4
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be select_config or a class derived from it.
InputIterator – random-access iterator type of the input range. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. It can be a simple pointer type.
SelectedCountOutputIterator – random-access iterator type of the selected_count_output value. It can be a simple pointer type.
UnaryPredicate – type of a unary selection predicate.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the select operation.
storage_size – [inout] reference to the size (in bytes) of temporary_storage.
input – [in] iterator to the first element in the range to select values from.
output – [out] iterator to the first element in the output range.
selected_count_output – [out] iterator to the total number of selected values (length of output).
size – [in] number of elements in the input range.
predicate – [in] unary function object that will be used for selecting values. The signature of the function should be equivalent to the following: bool f(const T &a);. The signature does not need to have const &, but the function object must not modify the object passed to it.
stream – [in] [optional] HIP stream object. The default is 0 (default stream).
debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
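The predicate-based overload behaves like a filtered copy, so on the host its expected result can be modeled with std::copy_if. The sketch below is a reference for validating results, not rocPRIM's implementation. `select_if_reference` and `is_even` are hypothetical names, and `is_even` follows the documented predicate shape bool f(const T &a) without modifying its argument.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Sequential reference for predicate-based selection (hypothetical helper).
// Keeps every value x for which predicate(x) returns true, in input order.
template<class T, class UnaryPredicate>
std::vector<T> select_if_reference(const std::vector<T>& input, UnaryPredicate predicate)
{
    std::vector<T> output;
    std::copy_if(input.begin(), input.end(), std::back_inserter(output), predicate);
    return output; // output.size() plays the role of selected_count_output
}

// Example predicate matching the documented signature shape.
struct is_even
{
    bool operator()(const int& a) const
    {
        return (a % 2) == 0;
    }
};
```

Calling `select_if_reference` with the values 1 through 8 and `is_even` keeps the four even values, which corresponds to the output_count shown in the example.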