Configuring the Kernels
A kernel config selects the grid/block dimensions, as well as how the data is fetched and stored (the algorithms used for load and store) by the operations that use it (such as select).
template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config : public rocprim::detail::kernel_config_params
Configuration of particular kernels launched by a device-level operation.
Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
Subclassed by rocprim::detail::default_radix_sort_block_sort_config< arch, key_type, value_type, enable >
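To make the two documented parameters concrete, the following is a minimal standalone sketch of how BlockSize and ItemsPerThread could translate into a block count for a given input size. It does not include rocPRIM: demo_kernel_config, its member names, and blocks_needed are hypothetical stand-ins that only mirror the template signature shown above, and the assumption that each block covers BlockSize * ItemsPerThread items is an illustration rather than a statement about any particular rocPRIM kernel.

```cpp
#include <cstddef>
#include <limits>

// Stand-in for illustration only: it mirrors the documented template
// signature, but it is NOT rocPRIM's header. Member names are assumptions.
template<unsigned int BlockSize,
         unsigned int ItemsPerThread,
         unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct demo_kernel_config
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
    static constexpr unsigned int size_limit       = SizeLimit;

    // Items covered by one block, assuming every thread handles
    // items_per_thread items.
    static constexpr std::size_t items_per_block
        = static_cast<std::size_t>(BlockSize) * ItemsPerThread;
};

// Number of blocks needed to cover input_size items (rounded up).
template<class Config>
constexpr std::size_t blocks_needed(std::size_t input_size)
{
    return (input_size + Config::items_per_block - 1) / Config::items_per_block;
}

// Example configuration: 256 threads per block, 4 items per thread.
using demo_config = demo_kernel_config<256, 4>;
```

With this configuration, one block covers 1024 items, so blocks_needed<demo_config>(1025) evaluates to 2. Raising ItemsPerThread reduces the block count for the same input, which is the kind of trade-off a tuned configuration settles per GPU model.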
Setting the configuration is important to better tune the kernel to a given GPU model. rocPRIM uses a placeholder type to let the macros select the default configuration for the GPU model:
struct default_config
Special type used to show that the given device-level operation will be executed with the optimal configuration, which depends on the types of the function’s parameters and on the target device architecture specified by ROCPRIM_TARGET_ARCH. Algorithms supporting dynamic dispatch will ignore ROCPRIM_TARGET_ARCH and launch using the optimal configuration based on the target architecture derived from the stream.
Warning
To provide information about the GPU you’re targeting, you have to set ROCPRIM_TARGET_ARCH. If the target is not supported by rocPRIM, the templates will use the configuration for the model 900.
If ROCPRIM_TARGET_ARCH is not defined, it defaults to 0, which is not supported by rocPRIM, and thus the configurations will be for the model 900.
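The fallback rule in this warning can be sketched as a small compile-time selection. This is a standalone illustration, not rocPRIM code: DEMO_TARGET_ARCH is a hypothetical macro standing in for ROCPRIM_TARGET_ARCH, and the list of supported architectures inside demo_is_supported is an assumption made up for the example. Only the default of 0 and the fallback to model 900 come from the text above.

```cpp
// Hypothetical macro standing in for the documented ROCPRIM_TARGET_ARCH.
// As described above, it defaults to 0 when the user does not define it.
#ifndef DEMO_TARGET_ARCH
    #define DEMO_TARGET_ARCH 0
#endif

// Illustrative set of architectures that have a tuned configuration.
// Assumption: the real set of supported targets is defined by rocPRIM.
constexpr bool demo_is_supported(unsigned int arch)
{
    return arch == 803 || arch == 900;
}

// Unsupported targets (including the default 0) fall back to model 900.
constexpr unsigned int demo_resolve_arch(unsigned int arch)
{
    return demo_is_supported(arch) ? arch : 900u;
}

// Architecture whose configuration would be used for this build.
constexpr unsigned int demo_selected_arch = demo_resolve_arch(DEMO_TARGET_ARCH);
```

Compiling this sketch without defining DEMO_TARGET_ARCH resolves to 900, which matches the behavior the warning describes for an undefined target.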