Configuring the Kernels

A kernel config selects the grid and block dimensions of a kernel, as well as the algorithms used to load and store data, for the device-level operations that use it (such as select).

template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct kernel_config : public rocprim::detail::kernel_config_params

Configuration of a particular kernel launched by a device-level operation.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread.

Subclassed by rocprim::detail::default_radix_sort_block_sort_config< arch, key_type, value_type, enable >
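To make the roles of BlockSize and ItemsPerThread concrete, here is a minimal sketch of how the two values could determine the number of blocks in a launch. It uses a hypothetical stand-in type, example_kernel_config, and a hypothetical helper, example_blocks_for; neither is part of rocPRIM, and the member names are assumptions chosen for illustration. The source does not describe SizeLimit, so the sketch only stores it.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical stand-in mirroring the three template parameters documented
// above. This is NOT rocprim::kernel_config; member names are assumptions.
template<unsigned int BlockSize,
         unsigned int ItemsPerThread,
         unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct example_kernel_config
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
    static constexpr unsigned int size_limit       = SizeLimit; // stored only
};

// Hypothetical helper: number of blocks needed so that every input item is
// covered when each block handles block_size * items_per_thread items.
template<class Config>
constexpr std::size_t example_blocks_for(std::size_t input_size)
{
    return (input_size
            + static_cast<std::size_t>(Config::block_size) * Config::items_per_thread
            - 1)
           / (static_cast<std::size_t>(Config::block_size) * Config::items_per_thread);
}

// 256 threads per block, 8 items per thread: 2048 items per block.
using example_config_256x8 = example_kernel_config<256, 8>;

static_assert(example_blocks_for<example_config_256x8>(2048) == 1,
              "exactly one full block");
static_assert(example_blocks_for<example_config_256x8>(2049) == 2,
              "one extra item needs a second block");
```

Under these assumptions, raising ItemsPerThread lowers the block count for a given input size, which is the kind of trade-off a tuned config is meant to capture.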

Setting the configuration is important for tuning a kernel to a given GPU model. rocPRIM provides a placeholder type that lets the library select the default configuration for the GPU model:

struct default_config

Special type used to indicate that the given device-level operation will be executed with the optimal configuration, which depends on the types of the function’s parameters and on the target device architecture specified by ROCPRIM_TARGET_ARCH. Algorithms that support dynamic dispatch ignore ROCPRIM_TARGET_ARCH and launch with the optimal configuration for the target architecture derived from the stream.
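The placeholder idea can be illustrated with a small, library-independent sketch: a tag type is resolved to a concrete configuration at compile time, while an explicit configuration passes through unchanged. All names here (sketch_config, sketch_default_config, sketch_resolve_config) are hypothetical, and the chosen default values are made up for illustration; rocPRIM's actual selection logic is not shown in this section.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical concrete configuration (not a rocPRIM type).
template<unsigned int BlockSize, unsigned int ItemsPerThread>
struct sketch_config
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
};

// Hypothetical placeholder tag playing the role of a "default" config.
struct sketch_default_config {};

// Primary template: an explicit configuration is used as given.
template<class Config, class Value>
struct sketch_resolve_config
{
    using type = Config;
};

// Specialization: the placeholder is replaced by a configuration that depends
// on the value type. The numbers are illustrative assumptions only.
template<class Value>
struct sketch_resolve_config<sketch_default_config, Value>
{
    using type = sketch_config<256, (sizeof(Value) <= 4 ? 8 : 4)>;
};

// The placeholder resolves differently for 4-byte and 8-byte values.
static_assert(
    sketch_resolve_config<sketch_default_config, std::uint32_t>::type::items_per_thread == 8,
    "narrow values get more items per thread in this sketch");
static_assert(
    sketch_resolve_config<sketch_default_config, std::uint64_t>::type::items_per_thread == 4,
    "wide values get fewer items per thread in this sketch");
```

The same pattern could be extended with an architecture parameter, which is roughly what selecting a configuration per GPU model implies.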

Warning

To tell rocPRIM which GPU you are targeting, set ROCPRIM_TARGET_ARCH.

If the target is not supported by rocPRIM, the templates will use the configuration for the model 900.

If ROCPRIM_TARGET_ARCH is not defined, it defaults to 0, which is not supported by rocPRIM, so the configurations for the model 900 are used.
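The fallback rules above can be sketched as follows. The macro EXAMPLE_TARGET_ARCH stands in for ROCPRIM_TARGET_ARCH so that the snippet does not interfere with the real macro, and the list of supported values in example_is_supported is a hypothetical placeholder; the actual list depends on the rocPRIM version.

```cpp
// Sketch only: EXAMPLE_TARGET_ARCH is a stand-in for ROCPRIM_TARGET_ARCH.
#ifndef EXAMPLE_TARGET_ARCH
    #define EXAMPLE_TARGET_ARCH 0 // an undefined macro defaults to 0
#endif

// Hypothetical list of supported targets, for illustration only.
constexpr bool example_is_supported(unsigned int arch)
{
    return arch == 803 || arch == 900;
}

// Unsupported targets (including the default 0) fall back to 900.
constexpr unsigned int example_effective_arch(unsigned int arch)
{
    return example_is_supported(arch) ? arch : 900u;
}

static_assert(example_effective_arch(0) == 900,
              "default value 0 falls back to 900");
static_assert(example_effective_arch(803) == 803,
              "a supported target is kept");
static_assert(example_effective_arch(1234) == 900,
              "an unknown target falls back to 900");
```

Assuming a typical compiler-driven build, the real macro would be supplied as a preprocessor definition, for example -DROCPRIM_TARGET_ARCH=900 on the compiler command line.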