Configuring the Kernels
A kernel config selects the grid and block dimensions, and also
how the data is fetched and stored (the algorithms used for
load and store) by the operations that use it.
template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config
Configuration of particular kernels launched by a device-level operation.
- Template Parameters:
  - BlockSize – number of threads in a block.
  - ItemsPerThread – number of items processed by each thread.
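To make the role of the two documented parameters concrete, here is a minimal, library-independent sketch. It does not use rocPRIM itself: the namespace `sketch`, the type `kernel_config_sketch`, and the helpers `items_per_block` and `grid_size` are hypothetical names introduced only for illustration. It assumes the common pattern in which one block processes BlockSize × ItemsPerThread items, so the number of blocks follows from the input size.

```cpp
#include <cstddef>
#include <limits>

namespace sketch {

// Hypothetical stand-in for a kernel configuration type (not the rocPRIM
// definition): the tuning values are template parameters that are
// re-exposed as compile-time constants.
template <unsigned int BlockSize,
          unsigned int ItemsPerThread,
          unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config_sketch
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
    static constexpr unsigned int size_limit       = SizeLimit;
};

// Number of items a single block handles under the given configuration.
template <class Config>
constexpr std::size_t items_per_block()
{
    return static_cast<std::size_t>(Config::block_size) * Config::items_per_thread;
}

// Number of blocks needed to cover `size` items, rounded up.
template <class Config>
constexpr std::size_t grid_size(std::size_t size)
{
    return (size + items_per_block<Config>() - 1) / items_per_block<Config>();
}

} // namespace sketch
```

With `sketch::kernel_config_sketch<256, 4>`, each block covers 1024 items, so an input of 1,000,000 items needs 977 blocks. Changing either template argument changes the launch geometry without touching the kernel code, which is the point of keeping the tuning values in a separate config type.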
Setting the configuration lets you tune a kernel to a given GPU model.
rocPRIM uses a placeholder type to let the macros select the default configuration for
the GPU model:
struct default_config
Special type indicating that the given device-level operation is executed with the optimal configuration, which depends on the types of the function’s parameters and on the target device architecture specified by ROCPRIM_TARGET_ARCH.
To provide information about the GPU you’re targeting, you have to
set ROCPRIM_TARGET_ARCH. If the target is not supported by
rocPRIM, the templates use the configuration of a fallback model.
If ROCPRIM_TARGET_ARCH is not defined, it defaults to a value
that is not supported by rocPRIM, so the configurations
are again those of the fallback model.
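The placeholder-and-fallback behaviour described above can be pictured with a small, library-independent sketch. Everything in it is an assumption made for illustration: the names `config`, `default_config_tag`, `tuned_default`, `resolve_config`, the macro `SKETCH_TARGET_ARCH`, the architecture identifiers, and the tuning numbers are hypothetical and are not taken from rocPRIM.

```cpp
namespace sketch {

// Hypothetical concrete configuration (block size, items per thread).
template <unsigned int BlockSize, unsigned int ItemsPerThread>
struct config
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
};

// Placeholder tag meaning "select the tuned configuration for the target".
struct default_config_tag {};

// Hypothetical identifier of the one architecture that has its own tuning.
constexpr unsigned int arch_tuned = 2;

// Primary template: the fallback used for every architecture that has no
// tuned entry, including an undefined (zero) target.
template <unsigned int Arch>
struct tuned_default { using type = config<256, 4>; };

// Tuned entry for the supported architecture (illustrative values).
template <>
struct tuned_default<arch_tuned> { using type = config<512, 2>; };

// An explicit configuration passes through unchanged ...
template <class Config, unsigned int Arch>
struct resolve_config { using type = Config; };

// ... while the placeholder is replaced by the per-architecture default.
template <unsigned int Arch>
struct resolve_config<default_config_tag, Arch>
{
    using type = typename tuned_default<Arch>::type;
};

// Hypothetical target macro: when the build does not define it, it
// defaults to 0, which has no tuned entry and so resolves to the fallback.
#ifndef SKETCH_TARGET_ARCH
#define SKETCH_TARGET_ARCH 0
#endif

using active_config = resolve_config<default_config_tag, SKETCH_TARGET_ARCH>::type;

} // namespace sketch
```

In this sketch, `resolve_config<default_config_tag, 0>` and any other unlisted architecture both resolve to the fallback `config<256, 4>`, while a user-supplied configuration is never overridden. Whether the real library dispatches through template specialization in this way is not stated in this section.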