Configuring the Kernels
A kernel config selects the grid and block dimensions and also determines how data is fetched and stored (the algorithms used for load and store) by the operations that use it (such as select).
template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config : public rocprim::detail::kernel_config_params
Configuration of particular kernels launched by a device-level operation.
- Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
Subclassed by rocprim::detail::default_radix_sort_block_sort_config< arch, key_type, value_type, enable >
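To make the role of BlockSize and ItemsPerThread concrete, the following is a minimal sketch of how these two parameters determine the launch geometry. It does not use rocPRIM itself: kernel_config_sketch is a simplified stand-in written for this example, and the member names and the helpers items_per_block and blocks_needed are assumptions for illustration, not part of the library's API.

```cpp
#include <cstddef>
#include <limits>

// Simplified stand-in that mirrors the template parameters of kernel_config.
// This is NOT rocPRIM's definition; member names are assumed for illustration.
template<unsigned int BlockSize,
         unsigned int ItemsPerThread,
         unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config_sketch
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
    static constexpr unsigned int size_limit       = SizeLimit;
};

// Hypothetical helper: number of items one block processes.
template<class Config>
constexpr std::size_t items_per_block()
{
    return static_cast<std::size_t>(Config::block_size) * Config::items_per_thread;
}

// Hypothetical helper: number of blocks needed to cover `size` items
// (ceiling division by the per-block item count).
template<class Config>
constexpr std::size_t blocks_needed(std::size_t size)
{
    return (size + items_per_block<Config>() - 1) / items_per_block<Config>();
}

// Usage: 256 threads per block, 4 items per thread.
using tuned_config = kernel_config_sketch<256, 4>;
static_assert(items_per_block<tuned_config>() == 1024, "256 * 4 items per block");
static_assert(blocks_needed<tuned_config>(1000000) == 977, "ceil(1000000 / 1024)");
```

Under these assumptions, raising ItemsPerThread reduces the number of blocks launched for a given input size, which is one of the trade-offs a tuned configuration adjusts per GPU model.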
Setting the configuration is important for tuning a kernel to a given GPU model. rocPRIM uses a placeholder type to let the macros select the default configuration for the GPU model:
struct default_config
Special type indicating that the given device-level operation will be executed with an optimal configuration that depends on the types of the function's parameters. With dynamic dispatch, algorithms launch using the optimal configuration for the target architecture derived from the stream.
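The placeholder pattern described above can be sketched in plain C++ as follows. This is an illustration of the general technique only, not rocPRIM's internal implementation: every name here (default_config_sketch, config_sketch, default_for_type, resolve_config) is hypothetical, the default values are invented for the example, and the runtime selection by target architecture is left out, so only the dependence on the data type is shown.

```cpp
#include <type_traits>

// Hypothetical empty tag type playing the role of a default-config placeholder.
struct default_config_sketch {};

// Hypothetical concrete configuration.
template<unsigned int BlockSize, unsigned int ItemsPerThread>
struct config_sketch
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
};

// Invented default: fewer items per thread for wider value types.
template<class T>
struct default_for_type
{
    using type = config_sketch<256, (sizeof(T) <= 4 ? 8 : 4)>;
};

// A user-supplied configuration is passed through unchanged...
template<class Config, class T>
struct resolve_config
{
    using type = Config;
};

// ...while the placeholder is replaced by the type-dependent default.
template<class T>
struct resolve_config<default_config_sketch, T>
{
    using type = typename default_for_type<T>::type;
};

// Usage: the placeholder resolves per value type; explicit configs win.
static_assert(std::is_same<resolve_config<default_config_sketch, int>::type,
                           config_sketch<256, 8>>::value, "default for 4-byte type");
static_assert(std::is_same<resolve_config<config_sketch<128, 2>, int>::type,
                           config_sketch<128, 2>>::value, "explicit config kept");
```

The design choice illustrated here is that callers who do not care about tuning pass the placeholder and get a reasonable configuration, while callers who do care can supply their own without changing the call site.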