Configuring the Kernels
A kernel config selects the grid/block dimensions and also determines
how data is fetched and stored (the algorithms used for load and
store) by the operations that use it (such as select).
template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config : public rocprim::detail::kernel_config_params
Configuration of particular kernels launched by device-level operation.
Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
Subclassed by rocprim::detail::default_radix_sort_block_sort_config< arch, key_type, value_type, enable >
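As a rough illustration of how the two documented parameters relate to the launch grid, the sketch below uses a hypothetical stand-in type, `kernel_config_sketch`, that only mirrors the template signature shown above. It is not rocPRIM's definition, and the helper `blocks_for` and the alias `config_256x4` are assumptions made for this example. It assumes each block processes `BlockSize * ItemsPerThread` items, so the block count is a ceiling division of the input size.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in mirroring the template signature documented above.
// This is NOT rocPRIM's definition; the real type lives in the rocPRIM headers.
template<unsigned int BlockSize,
         unsigned int ItemsPerThread,
         unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config_sketch
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
    static constexpr unsigned int size_limit       = SizeLimit;
};

// Hypothetical helper: number of blocks needed to cover `size` items,
// assuming one block handles block_size * items_per_thread items.
template<class Config>
constexpr std::size_t blocks_for(std::size_t size)
{
    return (size
            + static_cast<std::size_t>(Config::block_size) * Config::items_per_thread
            - 1)
           / (static_cast<std::size_t>(Config::block_size) * Config::items_per_thread);
}

// Example configuration: 256 threads per block, 4 items per thread.
using config_256x4 = kernel_config_sketch<256, 4>;

// 256 * 4 = 1024 items fit into exactly one block.
static_assert(blocks_for<config_256x4>(1024) == 1, "one full block");
```

With this sketch, one more item than a full block (1025) needs a second block, which is why larger `ItemsPerThread` values reduce the grid size for a fixed input.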
Setting the configuration lets you tune a kernel for a given GPU model.
rocPRIM provides a placeholder type that lets the macros select the
default configuration for the GPU model:
struct default_config
Special type used to show that the given device-level operation will be executed with the optimal configuration, which depends on the types of the function’s parameters and on the target device architecture specified by ROCPRIM_TARGET_ARCH. Algorithms that support dynamic dispatch ignore ROCPRIM_TARGET_ARCH and launch with the optimal configuration for the target architecture derived from the stream.
Warning
To provide information about the GPU you’re targeting, you have to
set ROCPRIM_TARGET_ARCH.
If the target is not supported by rocPRIM, the templates will
use the configuration for the model 900.
If ROCPRIM_TARGET_ARCH is not defined, it defaults to 0,
which is not supported by rocPRIM, so the configuration
for the model 900 is used.
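The fallback behaviour described in this warning can be pictured with a small compile-time sketch. Everything here except the `ROCPRIM_TARGET_ARCH` macro name is hypothetical: `default_config_sketch`, `config_sketch`, `tuned_config`, and `resolve_config` are illustration-only names, the block sizes are made-up values, and 906 is used only as an example of an architecture with its own entry. This is not how rocPRIM implements its selection. It only shows one way a placeholder type can be replaced by a per-architecture configuration, with every unlisted architecture (including the default 0) falling back to the 900 entry.

```cpp
#include <cassert>

// Mirrors the documented default: an undefined target resolves to 0.
#ifndef ROCPRIM_TARGET_ARCH
    #define ROCPRIM_TARGET_ARCH 0
#endif

// Hypothetical placeholder tag, standing in for the default configuration type.
struct default_config_sketch {};

// Hypothetical configuration carrier (not rocPRIM's kernel_config).
template<unsigned int BlockSize, unsigned int ItemsPerThread>
struct config_sketch
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
};

// Primary template: any architecture without its own entry, including 0,
// uses the values chosen for model 900 (numbers are made up).
template<unsigned int Arch>
struct tuned_config : config_sketch<256, 4> {};

// Example of an architecture with a dedicated entry (numbers are made up).
template<>
struct tuned_config<906> : config_sketch<512, 2> {};

// A user-supplied configuration is passed through unchanged...
template<class Config, unsigned int Arch>
struct resolve_config
{
    using type = Config;
};

// ...while the placeholder is replaced by the per-architecture choice.
template<unsigned int Arch>
struct resolve_config<default_config_sketch, Arch>
{
    using type = tuned_config<Arch>;
};

// Configuration selected for the architecture this translation unit targets.
using selected_config
    = resolve_config<default_config_sketch, ROCPRIM_TARGET_ARCH>::type;
```

In this sketch, passing an explicit configuration bypasses the lookup entirely, which matches the idea that the placeholder only matters when the caller does not tune the kernel by hand.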