BlockRakingLayout< T, BLOCK_THREADS, ARCH > Struct Template Reference

BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data.

#include <block_raking_layout.hpp>

Classes

struct TempStorage
 Alias wrapper allowing storage to be unioned.

Public Types

enum {
  SHARED_ELEMENTS = BLOCK_THREADS,
  MAX_RAKING_THREADS,
  SEGMENT_LENGTH = (SHARED_ELEMENTS + MAX_RAKING_THREADS - 1) / MAX_RAKING_THREADS,
  RAKING_THREADS = (SHARED_ELEMENTS + SEGMENT_LENGTH - 1) / SEGMENT_LENGTH,
  USE_SEGMENT_PADDING = ((SEGMENT_LENGTH & 1) == 0) && (SEGMENT_LENGTH > 2),
  GRID_ELEMENTS = RAKING_THREADS * (SEGMENT_LENGTH + USE_SEGMENT_PADDING),
  UNGUARDED = (SHARED_ELEMENTS % RAKING_THREADS == 0)
}
 

Static Public Member Functions

static __device__ T * PlacementPtr (TempStorage &temp_storage, unsigned int linear_tid)
 Returns the location for the calling thread to place data into the grid.
 
static __device__ T * RakingPtr (TempStorage &temp_storage, unsigned int linear_tid)
 Returns the location for the calling thread to begin sequential raking.
 

Detailed Description

template<typename T, int BLOCK_THREADS, int ARCH = 1>
struct hipcub::BlockRakingLayout< T, BLOCK_THREADS, ARCH >

BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data.

Overview
This type facilitates a shared memory usage pattern in which a block of threads places elements into shared memory and then reduces the active parallelism to a single "raking" warp of threads, each of which serially aggregates a consecutive sequence of shared items. Padding is inserted to eliminate bank conflicts (for most data types).
Template Parameters
T              The data type to be exchanged.
BLOCK_THREADS  The thread block size in threads.
ARCH           [optional] Target architecture (PTX) version; defaults to 1.

Member Enumeration Documentation

anonymous enum

template<typename T, int BLOCK_THREADS, int ARCH = 1>
anonymous enum
Enumerator
SHARED_ELEMENTS 

The total number of elements that need to be cooperatively reduced.

MAX_RAKING_THREADS 

Maximum number of warp-synchronous raking threads.

SEGMENT_LENGTH 

Number of raking elements per warp-synchronous raking thread (rounded up).

RAKING_THREADS 

Number of raking threads actually used. A raking thread that would have no valid data is never used (e.g., when BLOCK_THREADS is 62 and SEGMENT_LENGTH is 2, only 31 raking threads are used).

USE_SEGMENT_PADDING 

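Whether each segment is padded with one element. Padding is used when the segment length is not relatively prime to the warp size and cannot be optimized as a vector load; per the enum definition, this is when SEGMENT_LENGTH is even and greater than 2.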

GRID_ELEMENTS 

Total number of elements in the raking grid.

UNGUARDED 

Whether raking can proceed without bounds checking: nonzero when the number of reduction elements (SHARED_ELEMENTS) is a multiple of the number of raking threads. Otherwise, bounds checking is needed during raking.


The documentation for this struct was generated from the following file:
  • /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hipcub/checkouts/docs-5.4.3/hipcub/include/hipcub/backend/rocprim/block/block_raking_layout.hpp