Select
Configuring the kernel
template<unsigned int BlockSize, unsigned int ItemsPerThread, ::rocprim::block_load_method KeyBlockLoadMethod, ::rocprim::block_load_method ValueBlockLoadMethod, ::rocprim::block_load_method FlagBlockLoadMethod, ::rocprim::block_scan_algorithm BlockScanMethod, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct select_config
Configuration of device-level select operation.
- Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
KeyBlockLoadMethod – method for loading input keys.
ValueBlockLoadMethod – method for loading input values.
FlagBlockLoadMethod – method for loading flag values.
BlockScanMethod – algorithm for block scan.
SizeLimit – limit on the number of items for a single select kernel launch.
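The effect of BlockSize and ItemsPerThread on the launch shape can be sketched with a little arithmetic. The sketch below assumes the common tiling scheme in which each block handles BlockSize * ItemsPerThread items. That scheme is an assumption here, not something this page states. The names launch_shape and estimate_launch_shape are hypothetical and are not part of rocPRIM, and SizeLimit is not modelled.

```cpp
#include <cstddef>

// Hypothetical helper type, not part of rocPRIM: a rough model of how
// BlockSize and ItemsPerThread could translate into a launch shape.
struct launch_shape
{
    std::size_t items_per_block; // BlockSize * ItemsPerThread
    std::size_t blocks;          // blocks needed to cover every item
};

// Assumption: one block processes block_size * items_per_thread items.
inline launch_shape estimate_launch_shape(std::size_t  size,
                                          unsigned int block_size,
                                          unsigned int items_per_thread)
{
    const std::size_t items_per_block
        = static_cast<std::size_t>(block_size) * items_per_thread;
    // Round up so that a partially filled last block is still counted.
    const std::size_t blocks = (size + items_per_block - 1) / items_per_block;
    return launch_shape{items_per_block, blocks};
}
```

Under this model, 10000 items with BlockSize = 256 and ItemsPerThread = 4 give 1024 items per block and 10 blocks. Larger tiles mean fewer blocks, at the cost of more per-thread resources.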
select
template<class Config = default_config, class InputIterator, class FlagIterator, class OutputIterator, class SelectedCountOutputIterator>
inline hipError_t rocprim::select(void *temporary_storage, size_t &storage_size, InputIterator input, FlagIterator flags, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel select primitive for device level using range of flags.
Performs a device-wide selection based on input flags. For a value from input to be selected and copied into the output range, the corresponding item in the flags range must be set to a value that is implicitly convertible to true (bool type).
- Overview
Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.
Ranges specified by input and flags must have at least size elements.
Range specified by output must have at least as many elements as there are positively flagged values, so that all of them can be copied into it.
Range specified by selected_count_output must have at least 1 element.
Values of the flags range must be implicitly convertible to bool type.
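The selection rule above can be illustrated with a sequential host-side reference. This is only a sketch of the semantics, not how rocPRIM implements the operation on the device. The name select_by_flags_reference is hypothetical, and the sketch assumes that selected values keep their input order, as the documented example output suggests.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not a rocPRIM function.
// Preconditions mirror the overview: flags has at least input.size()
// elements and output can hold every positively flagged value.
template<class T, class Flag>
std::size_t select_by_flags_reference(const std::vector<T>&    input,
                                      const std::vector<Flag>& flags,
                                      std::vector<T>&          output)
{
    std::size_t selected = 0;
    for(std::size_t i = 0; i < input.size(); ++i)
    {
        // An item is selected when its flag converts to true.
        if(static_cast<bool>(flags[i]))
        {
            output[selected++] = input[i];
        }
    }
    // Corresponds to the value written to selected_count_output.
    return selected;
}
```

A reference like this can be used in tests to check device results for small inputs.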
- Example
In this example a device-level select operation is performed on an array of integer values with an array of chars used as flags.

    #include <rocprim/rocprim.hpp>

    // Prepare input and output (declare pointers, allocate device memory etc.)
    size_t input_size;      // e.g., 8
    int * input;            // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
    char * flags;           // e.g., [0, 1, 1, 0, 0, 1, 0, 1]
    int * output;           // empty array of 8 elements
    size_t * output_count;  // empty array of 1 element

    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::select(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, flags,
        output, output_count,
        input_size
    );

    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

    // perform selection
    rocprim::select(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input, flags,
        output, output_count,
        input_size
    );
    // output: [2, 3, 6, 8]
    // output_count: 4
- Template Parameters:
Config – [optional] configuration of the primitive. It can be select_config or a custom class with the same members.
InputIterator – random-access iterator type of the input range. It can be a simple pointer type.
FlagIterator – random-access iterator type of the flag range. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. It can be a simple pointer type.
SelectedCountOutputIterator – random-access iterator type of the selected_count_output value. It can be a simple pointer type.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the select operation.
storage_size – [inout] reference to the size (in bytes) of temporary_storage.
input – [in] iterator to the first element in the range to select values from.
flags – [in] iterator to the selection flag corresponding to the first element of the input range.
output – [out] iterator to the first element in the output range.
selected_count_output – [out] iterator to the total number of selected values (length of output).
size – [in] number of elements in the input range.
stream – [in] [optional] HIP stream object. The default is 0 (default stream).
debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
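The temporary_storage and storage_size parameters imply a two-call protocol: a size query with a null pointer, followed by the real call with allocated storage. The sketch below models only that calling convention on the host. mock_primitive and run_with_temporary_storage are hypothetical names, the storage requirement is made up, and no selection is performed. Real code would call rocprim::select and allocate device memory instead.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in that follows the documented calling convention.
// It is NOT rocprim::select; 0 means success, non-zero means failure.
inline int mock_primitive(void* temporary_storage, std::size_t& storage_size, std::size_t size)
{
    // Made-up requirement, for illustration only.
    const std::size_t required = size * sizeof(unsigned int);
    if(temporary_storage == nullptr)
    {
        // Size query: report the requirement and do no work.
        storage_size = required;
        return 0;
    }
    // Work call: the caller must supply at least the reported size.
    return storage_size >= required ? 0 : 1;
}

// Query, allocate, then run.
inline int run_with_temporary_storage(std::size_t size, std::size_t& bytes_used)
{
    std::size_t storage_size = 0;
    const int query_status = mock_primitive(nullptr, storage_size, size);
    if(query_status != 0)
    {
        return query_status;
    }
    // Host allocation stands in for device memory; at least one byte is
    // allocated so that the pointer passed to the work call is non-null.
    std::vector<unsigned char> storage(storage_size > 0 ? storage_size : 1);
    bytes_used = storage_size;
    return mock_primitive(storage.data(), storage_size, size);
}
```

Checking the returned status after both calls keeps a failed size query from turning into an allocation of an uninitialized size.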
template<class Config = default_config, class InputIterator, class OutputIterator, class SelectedCountOutputIterator, class UnaryPredicate>
inline hipError_t rocprim::select(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, UnaryPredicate predicate, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel select primitive for device level using selection operator.
Performs a device-wide selection using a selection operator. For a value x from input to be selected and copied into the output range, predicate(x) must return true.
- Overview
Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.
Range specified by input must have at least size elements.
Range specified by output must have at least as many elements as there are selected values, so that all of them can be copied into it.
Range specified by selected_count_output must have at least 1 element.
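As with the flag-based overload, the rule above can be written as a sequential host-side reference. This is a sketch of the semantics only, with a hypothetical name, select_if_reference. It assumes that selected values keep their input order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side reference, not a rocPRIM function.
// Precondition: output can hold every value accepted by the predicate.
template<class T, class UnaryPredicate>
std::size_t select_if_reference(const std::vector<T>& input,
                                std::vector<T>&       output,
                                UnaryPredicate        predicate)
{
    std::size_t selected = 0;
    for(const T& x : input)
    {
        // x is copied to the output exactly when predicate(x) is true.
        if(predicate(x))
        {
            output[selected++] = x;
        }
    }
    // Corresponds to the value written to selected_count_output.
    return selected;
}
```

Sizing output to the input length is the simplest way to satisfy the output precondition when the number of selected values is not known in advance.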
- Example
In this example a device-level select operation is performed on an array of integer values; only even values are selected.

    #include <rocprim/rocprim.hpp>

    auto predicate =
        [] __device__ (int a) -> bool
        {
            return (a%2) == 0;
        };

    // Prepare input and output (declare pointers, allocate device memory etc.)
    size_t input_size;      // e.g., 8
    int * input;            // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
    int * output;           // empty array of 8 elements
    size_t * output_count;  // empty array of 1 element

    size_t temporary_storage_size_bytes;
    void * temporary_storage_ptr = nullptr;
    // Get required size of the temporary storage
    rocprim::select(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input,
        output, output_count,
        input_size, predicate
    );

    // allocate temporary storage
    hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

    // perform selection
    rocprim::select(
        temporary_storage_ptr, temporary_storage_size_bytes,
        input,
        output, output_count,
        input_size, predicate
    );
    // output: [2, 4, 6, 8]
    // output_count: 4
- Template Parameters:
Config – [optional] configuration of the primitive. It can be select_config or a custom class with the same members.
InputIterator – random-access iterator type of the input range. It can be a simple pointer type.
OutputIterator – random-access iterator type of the output range. It can be a simple pointer type.
SelectedCountOutputIterator – random-access iterator type of the selected_count_output value. It can be a simple pointer type.
UnaryPredicate – type of a unary selection predicate.
- Parameters:
temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the select operation.
storage_size – [inout] reference to the size (in bytes) of temporary_storage.
input – [in] iterator to the first element in the range to select values from.
output – [out] iterator to the first element in the output range.
selected_count_output – [out] iterator to the total number of selected values (length of output).
size – [in] number of elements in the input range.
predicate – [in] unary function object that will be used for selecting values. The signature of the function should be equivalent to the following: bool f(const T &a);. The signature does not need to have const &, but the function object must not modify the object passed to it.
stream – [in] [optional] HIP stream object. The default is 0 (default stream).
debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. The default value is false.
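The predicate requirement can be met by a function object that takes its argument by const reference or by value. The sketch below shows both forms on the host, using std::copy_if as a stand-in for the device selection. The types in_closed_range and is_odd and the helper host_select are hypothetical. For use in a device kernel, operator() would additionally need a device annotation such as __device__, as the lambda in the documented example has.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical predicate with state; it takes the argument by const
// reference and does not modify it.
struct in_closed_range
{
    int low;
    int high;
    bool operator()(const int& a) const
    {
        return low <= a && a <= high;
    }
};

// Hypothetical predicate that takes the argument by value; this also
// satisfies the requirement, since the original object is left untouched.
struct is_odd
{
    bool operator()(int a) const
    {
        return (a % 2) != 0;
    }
};

// Host-side stand-in for the selection, built on std::copy_if.
template<class UnaryPredicate>
std::vector<int> host_select(const std::vector<int>& input, UnaryPredicate predicate)
{
    std::vector<int> output;
    std::copy_if(input.begin(), input.end(), std::back_inserter(output), predicate);
    return output;
}
```

A stateful function object such as in_closed_range is a convenient way to pass selection bounds without global variables.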