Configuring the Kernels

A kernel config selects the grid and block dimensions of a kernel. It also determines how data is fetched and stored (the load/store algorithms) by the operations that use it, such as select.

template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config

Configuration of a particular kernel launched by a device-level operation.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread.
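
To make the two parameters concrete, the sketch below shows how they combine into the amount of work per block and the number of blocks needed for an input. It uses a stand-in struct that copies the declaration above rather than including rocPRIM, so the static member names (block_size, items_per_thread, size_limit) and the helpers items_per_block and number_of_blocks are assumptions made for illustration, not the library's confirmed API.

```cpp
#include <limits>

// Stand-in mirroring the declaration above. This is NOT the rocPRIM header;
// the static member names are assumptions made for this sketch.
template<unsigned int BlockSize,
         unsigned int ItemsPerThread,
         unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
    static constexpr unsigned int size_limit       = SizeLimit;
};

// Hypothetical helper: items handled by one block
// (threads per block times items per thread).
template<class Config>
constexpr unsigned int items_per_block()
{
    return Config::block_size * Config::items_per_thread;
}

// Hypothetical helper: blocks needed to cover `size` items (ceiling division,
// written so that it cannot overflow for large sizes).
template<class Config>
constexpr unsigned int number_of_blocks(unsigned int size)
{
    return size / items_per_block<Config>()
           + (size % items_per_block<Config>() != 0 ? 1u : 0u);
}

// 256 threads per block, 4 items per thread.
using my_config = kernel_config<256, 4>;

static_assert(items_per_block<my_config>() == 1024, "256 threads x 4 items");
static_assert(number_of_blocks<my_config>(10000) == 10, "ceil(10000 / 1024)");
```

With this configuration, 10000 input items are split across 10 blocks of 1024 items each, the last block being only partly filled.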

Setting the configuration lets you tune a kernel for a given GPU model. rocPRIM provides a placeholder type that lets the macros select the default configuration for the target GPU model:

struct default_config

Special type indicating that the given device-level operation will be executed with the optimal configuration, which depends on the types of the function’s parameters and on the target device architecture specified by ROCPRIM_TARGET_ARCH.

Warning

To provide information about the GPU you’re targeting, you have to set ROCPRIM_TARGET_ARCH.

If the target is not supported by rocPRIM, the templates will use the configuration for the model 900.

If ROCPRIM_TARGET_ARCH is not defined, it defaults to 0, which is not supported by rocPRIM, so the configurations for the model 900 are used.
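
The fallback rule described above can be sketched with template specialization. This is a minimal illustration of the idea, not rocPRIM's actual implementation: arch_config and select_config are hypothetical names, the kernel_config here is a simplified stand-in, and the block sizes and the 1030 table entry are made up for the example.

```cpp
// Mirrors the documented default: an undefined ROCPRIM_TARGET_ARCH becomes 0.
#ifndef ROCPRIM_TARGET_ARCH
    #define ROCPRIM_TARGET_ARCH 0
#endif

// Simplified stand-ins for the types described above (not the rocPRIM headers).
template<unsigned int BlockSize, unsigned int ItemsPerThread>
struct kernel_config
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
};

struct default_config {};

// Hypothetical per-architecture table. The numbers are invented.
template<unsigned int Arch>
struct arch_config;

template<>
struct arch_config<900>
{
    using type = kernel_config<256, 4>;
};

template<>
struct arch_config<1030>
{
    using type = kernel_config<128, 8>;
};

// Any architecture without its own entry (including 0) reuses the 900 entry.
template<unsigned int Arch>
struct arch_config : arch_config<900> {};

// Hypothetical selector: a user-provided config is used as-is,
// while default_config is resolved through the architecture table.
template<class Config, unsigned int Arch>
struct select_config
{
    using type = Config;
};

template<unsigned int Arch>
struct select_config<default_config, Arch>
{
    using type = typename arch_config<Arch>::type;
};

// Configuration picked for the architecture this translation unit targets.
using selected = select_config<default_config, ROCPRIM_TARGET_ARCH>::type;

// Unset (0) and unknown targets both resolve to the 900 configuration.
static_assert(select_config<default_config, 0>::type::block_size == 256, "0 -> 900");
static_assert(select_config<default_config, 1234>::type::block_size == 256, "unknown -> 900");
static_assert(select_config<default_config, 1030>::type::block_size == 128, "own entry");
```

Passing an explicit kernel_config instead of default_config bypasses the table entirely, which is why setting ROCPRIM_TARGET_ARCH only matters for operations left on the default.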