hipcub::block_raking_layout< T, BLOCK_THREADS, ARCH > Struct Template Reference

BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data.

Classes
struct	TempStorage
	Alias wrapper allowing storage to be unioned.

Public Types
enum	{
	SHARED_ELEMENTS = BLOCK_THREADS,
	MAX_RAKING_THREADS = ::rocprim::detail::get_min_warp_size(BLOCK_THREADS, HIPCUB_DEVICE_WARP_THREADS),
	SEGMENT_LENGTH = (SHARED_ELEMENTS + MAX_RAKING_THREADS - 1) / MAX_RAKING_THREADS,
	RAKING_THREADS = (SHARED_ELEMENTS + SEGMENT_LENGTH - 1) / SEGMENT_LENGTH,
	USE_SEGMENT_PADDING = ((SEGMENT_LENGTH & 1) == 0) && (SEGMENT_LENGTH > 2),
	GRID_ELEMENTS = RAKING_THREADS * (SEGMENT_LENGTH + USE_SEGMENT_PADDING),
	UNGUARDED = (SHARED_ELEMENTS % RAKING_THREADS == 0)
}

Static Public Member Functions
static __device__ T *	PlacementPtr (TempStorage &temp_storage, unsigned int linear_tid)
	Returns the location for the calling thread to place data into the grid.

static __device__ T *	RakingPtr (TempStorage &temp_storage, unsigned int linear_tid)
	Returns the location for the calling thread to begin sequential raking.
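The two functions above return pointers into the shared grid. As a rough illustration of the index arithmetic that would be consistent with SEGMENT_LENGTH and USE_SEGMENT_PADDING, the host-side sketch below computes element offsets instead of pointers. The helper names placement_offset and raking_offset are hypothetical, and the formulas are an assumption (one padding slot skipped per full segment), not a copy of the hipCUB implementation.

```cpp
#include <cassert>

// Hypothetical host-side helpers (not part of hipCUB).
// Assumption: when padding is enabled, one extra slot follows every
// segment of `segment_length` elements in the grid.

// Offset at which thread `linear_tid` would place its element.
constexpr unsigned placement_offset(unsigned linear_tid,
                                    unsigned segment_length,
                                    bool     use_padding)
{
    // Skip one padding slot for every complete segment before this element.
    return use_padding ? linear_tid + linear_tid / segment_length
                       : linear_tid;
}

// Offset at which raking thread `linear_tid` would start its serial pass.
constexpr unsigned raking_offset(unsigned linear_tid,
                                 unsigned segment_length,
                                 bool     use_padding)
{
    // Each segment occupies segment_length slots plus the optional pad.
    return linear_tid * (segment_length + (use_padding ? 1u : 0u));
}

// With SEGMENT_LENGTH = 4 and padding on, raking thread 1 starts exactly
// where placing thread 4 wrote its element.
static_assert(placement_offset(4, 4, true) == raking_offset(1, 4, true),
              "segment starts must line up");
```

Under these assumptions, the first element of segment r is always found at the raking offset of thread r, which is what lets a raking thread walk its segment sequentially.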

Overview: This type facilitates a shared memory usage pattern in which a block of threads places elements into shared memory and then reduces the active parallelism to one "raking" warp of threads, each of which serially aggregates a consecutive sequence of shared items. Padding is inserted to eliminate bank conflicts (for most data types).
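The place-then-rake pattern described in the overview can be simulated on the host. The sketch below is only an illustration: rake_partials is a hypothetical function, ordinary loops stand in for device threads, a std::vector stands in for shared memory, and the padding rule is assumed to be one extra slot per segment whenever the segment length is even and greater than 2.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side simulation (not part of hipCUB).
// Returns one partial sum per raking thread.
std::vector<int> rake_partials(const std::vector<int>& block_items,
                               int max_raking_threads)
{
    const int n = static_cast<int>(block_items.size());
    if (n == 0 || max_raking_threads <= 0) return {};

    // Same rounding rules as the documented enum values.
    const int segment_length = (n + max_raking_threads - 1) / max_raking_threads;
    const int raking_threads = (n + segment_length - 1) / segment_length;
    const int pad    = ((segment_length & 1) == 0 && segment_length > 2) ? 1 : 0;
    const int stride = segment_length + pad;

    // Phase 1: every "thread" places its item into the (padded) grid.
    std::vector<int> grid(raking_threads * stride, 0);
    for (int tid = 0; tid < n; ++tid)
        grid[tid + (pad ? tid / segment_length : 0)] = block_items[tid];

    // Phase 2: each raking thread serially sums its own segment.
    std::vector<int> partials(raking_threads, 0);
    for (int r = 0; r < raking_threads; ++r)
        for (int i = 0; i < segment_length; ++i)
            if (r * segment_length + i < n)  // guard a ragged last segment
                partials[r] += grid[r * stride + i];

    return partials;
}
```

For example, eight items 1..8 with at most two raking threads give two segments of four elements each, so the partial sums are 10 and 26. On a device, the raking warp would then combine these partials with a warp-level primitive.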

Template Parameters

T	The data type to be exchanged.
BLOCK_THREADS	The thread block size in threads.
ARCH	[optional] The target architecture for which to specialize this type. Defaults to HIPCUB_ARCH.

template<typename T , int BLOCK_THREADS, int ARCH = HIPCUB_ARCH>

anonymous enum

Enumerator
SHARED_ELEMENTS	The total number of elements that need to be cooperatively reduced.
MAX_RAKING_THREADS	Maximum number of warp-synchronous raking threads.
SEGMENT_LENGTH	Number of raking elements per warp-synchronous raking thread (rounded up).
RAKING_THREADS	Number of raking threads actually used. A raking thread that would have no valid data is never used (e.g., when BLOCK_THREADS is 62 and SEGMENT_LENGTH is 2, only 31 raking threads are used).
USE_SEGMENT_PADDING	Pad each segment length with one element if segment length is not relatively prime to warp size and can't be optimized as a vector load.
GRID_ELEMENTS	Total number of elements in the raking grid.
UNGUARDED	Whether raking can proceed without bounds checking. True when the number of reduction elements is a multiple of the number of raking threads; otherwise bounds checking is needed during raking.
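To make the enumerators concrete, the following host-side sketch mirrors the documented formulas in a small constexpr struct and checks a few configurations at compile time. RakingLayoutSketch is a hypothetical name, and MAX_RAKING_THREADS is supplied directly as a template argument here; in hipCUB it is derived from BLOCK_THREADS and the device warp size.

```cpp
#include <cassert>

// Hypothetical host-side mirror of the documented constants (not part of hipCUB).
template <int BLOCK_THREADS, int MAX_RAKING>
struct RakingLayoutSketch
{
    static constexpr int  SHARED_ELEMENTS     = BLOCK_THREADS;
    static constexpr int  MAX_RAKING_THREADS  = MAX_RAKING;  // assumed given
    static constexpr int  SEGMENT_LENGTH      =
        (SHARED_ELEMENTS + MAX_RAKING_THREADS - 1) / MAX_RAKING_THREADS;
    static constexpr int  RAKING_THREADS      =
        (SHARED_ELEMENTS + SEGMENT_LENGTH - 1) / SEGMENT_LENGTH;
    static constexpr bool USE_SEGMENT_PADDING =
        ((SEGMENT_LENGTH & 1) == 0) && (SEGMENT_LENGTH > 2);
    static constexpr int  GRID_ELEMENTS       =
        RAKING_THREADS * (SEGMENT_LENGTH + (USE_SEGMENT_PADDING ? 1 : 0));
    static constexpr bool UNGUARDED           =
        (SHARED_ELEMENTS % RAKING_THREADS == 0);
};

// The example from the RAKING_THREADS description: 62 threads, segments of 2.
using L62 = RakingLayoutSketch<62, 32>;
static_assert(L62::SEGMENT_LENGTH == 2 && L62::RAKING_THREADS == 31, "");
static_assert(!L62::USE_SEGMENT_PADDING && L62::GRID_ELEMENTS == 62, "");

// 256 threads, 64 raking threads: even segment length > 2, so padding applies.
using L256 = RakingLayoutSketch<256, 64>;
static_assert(L256::SEGMENT_LENGTH == 4 && L256::USE_SEGMENT_PADDING, "");
static_assert(L256::GRID_ELEMENTS == 320 && L256::UNGUARDED, "");

// 70 threads, 32 raking threads: the last segment is ragged, so raking is guarded.
using L70 = RakingLayoutSketch<70, 32>;
static_assert(L70::SEGMENT_LENGTH == 3 && L70::RAKING_THREADS == 24, "");
static_assert(L70::GRID_ELEMENTS == 72 && !L70::UNGUARDED, "");
```

The 70-thread case shows why UNGUARDED matters: 24 raking threads of 3 elements each cover 72 slots, two more than the block provides, so the final raking thread must check bounds.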

The documentation for this struct was generated from the following file:

hipcub/include/hipcub/backend/rocprim/block/block_raking_layout.hpp