Configuring the Kernels
A kernel config selects the grid/block dimensions, and also
how the data will be fetched and stored (the algorithms used for
load and store) by the operations that use it.
template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config : public rocprim::detail::kernel_config_params
Configuration of particular kernels launched by device-level operation.
- Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
Subclassed by rocprim::adjacent_difference_config< BlockSize, ItemsPerThread, LoadMethod, StoreMethod, SizeLimit >, rocprim::detail::default_radix_sort_block_sort_config< arch, key_type, value_type, enable >
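To make the role of the two main template parameters concrete, here is a minimal, self-contained sketch. It does not use rocPRIM itself: `example_kernel_config` and `blocks_needed` are hypothetical stand-ins that only mirror the template signature shown above, and the way the grid size is derived here is an assumption for illustration, not a description of rocPRIM's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in mirroring the template signature of kernel_config.
// This is NOT rocPRIM's implementation; it only stores its parameters.
template<unsigned int BlockSize,
         unsigned int ItemsPerThread,
         unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct example_kernel_config
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
    static constexpr unsigned int size_limit       = SizeLimit;
};

// Hypothetical helper: number of blocks needed so that every input item
// is covered when each block handles block_size * items_per_thread items.
template<class Config>
constexpr std::size_t blocks_needed(std::size_t input_size)
{
    return (input_size
            + static_cast<std::size_t>(Config::block_size) * Config::items_per_thread
            - 1)
           / (static_cast<std::size_t>(Config::block_size) * Config::items_per_thread);
}

// Two illustrative configurations with the same block size but a
// different amount of work per thread.
using small_tiles = example_kernel_config<256, 4>;
using large_tiles = example_kernel_config<256, 16>;

// More items per thread means fewer blocks for the same input size.
static_assert(blocks_needed<small_tiles>(1000000) == 977, "1024 items per block");
static_assert(blocks_needed<large_tiles>(1000000) == 245, "4096 items per block");
```

Under these assumptions, raising `ItemsPerThread` trades a smaller grid for more work (and typically more registers) per thread, which is the kind of trade-off a per-GPU configuration is meant to tune.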
Setting the configuration is important to better tune the kernel to a given GPU model.
rocPRIM uses a placeholder type to let the macros select the default configuration for
the GPU model:
Special type used to show that the given device-level operation will be executed with optimal configuration dependent on types of the function’s parameters and the target device architecture specified by ROCPRIM_TARGET_ARCH. Algorithms supporting dynamic dispatch will ignore ROCPRIM_TARGET_ARCH and launch using optimal configuration based on the target architecture derived from the stream.
To provide information about the GPU you're targeting, you have to
define ROCPRIM_TARGET_ARCH. If the target is not supported by
rocPRIM, the templates fall back to the configuration of a default
model. If ROCPRIM_TARGET_ARCH is not defined, it defaults to a value
that is not supported by rocPRIM, and thus the fallback configuration
is used in that case as well.
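The fallback behaviour described above can be sketched with a macro and a template specialization. Everything in this example is hypothetical: `EXAMPLE_TARGET_ARCH`, `example_default_config_for`, the architecture number `2`, and the block sizes are placeholders chosen for illustration, not rocPRIM's actual macro values or tuned configurations.

```cpp
#include <cassert>

// Hypothetical macro playing the role of the target-architecture macro.
// When the user does not define it, it defaults to a value for which no
// specialization exists.
#ifndef EXAMPLE_TARGET_ARCH
    #define EXAMPLE_TARGET_ARCH 0
#endif

// Primary template: used for every architecture value that has no
// dedicated specialization, so it acts as the fallback configuration.
template<unsigned int Arch>
struct example_default_config_for
{
    static constexpr bool         is_fallback      = true;
    static constexpr unsigned int block_size       = 256; // placeholder value
    static constexpr unsigned int items_per_thread = 4;   // placeholder value
};

// Specialization for one "supported" architecture (the number 2 is made up).
template<>
struct example_default_config_for<2>
{
    static constexpr bool         is_fallback      = false;
    static constexpr unsigned int block_size       = 512; // placeholder value
    static constexpr unsigned int items_per_thread = 8;   // placeholder value
};

// The configuration picked at compile time from the macro.
using selected_config = example_default_config_for<EXAMPLE_TARGET_ARCH>;
```

Compiling this sketch with `-DEXAMPLE_TARGET_ARCH=2` would select the specialized configuration; leaving the macro undefined, or setting it to any other value, selects the fallback.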