Shuffle

template<class T>
__device__ inline T rocprim::warp_shuffle(const T &input, const int src_lane, const int width = device_warp_size())

Shuffle for any data type.

Each thread in the warp obtains input from the src_lane-th thread in the warp. If width is less than device_warp_size(), each subsection of the warp behaves as a separate entity with a starting logical lane id of 0. If src_lane is not in the range [0; width), the returned value is the input passed by the thread whose lane id is src_lane modulo width.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or it is greater than device_warp_size().

Parameters
  • input – input to pass to other threads

  • src_lane – lane id of the thread whose input should be returned

  • width – logical warp width
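The lane selection described above can be sketched on the host. This is a minimal model in plain C++, not rocPRIM's implementation: model_warp_shuffle is a hypothetical helper, the vector stands in for one warp's inputs (one element per lane), and src_lane is assumed to be non-negative.

```cpp
#include <cassert>
#include <vector>

// Host-side model only (hypothetical helper, not part of rocPRIM).
// inputs[i] is the value that lane i passes as `input`.
template <class T>
std::vector<T> model_warp_shuffle(const std::vector<T>& inputs,
                                  int src_lane, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    // Documented precondition: width is a power of 2 and not larger than the warp.
    assert(width > 0 && (width & (width - 1)) == 0);
    assert(width <= warp_size && warp_size % width == 0);
    assert(src_lane >= 0); // assumption made for this sketch

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane) {
        // Each subsection of `width` lanes has its own logical lane 0.
        const int segment_start = (lane / width) * width;
        // Out-of-range src_lane wraps around: src_lane modulo width.
        out[lane] = inputs[segment_start + src_lane % width];
    }
    return out;
}
```

With eight lanes holding 10 through 17, model_warp_shuffle(in, 5, 4) reads logical lane 5 % 4 = 1 of each four-lane subsection, so lanes 0 to 3 receive 11 and lanes 4 to 7 receive 15.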

template<class T>
__device__ inline T rocprim::warp_shuffle_down(const T &input, const unsigned int delta, const int width = device_warp_size())

Shuffle down for any data type.

The i-th thread in the warp obtains input from the (i+delta)-th thread in the warp. If i+delta is not in the range [0; width), the thread's own input is returned.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or it is greater than device_warp_size().

Parameters
  • input – input to pass to other threads

  • delta – offset for calculating source lane id

  • width – logical warp width
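The shuffle-down rule can be modelled on the host in the same spirit. The sketch below is plain C++ rather than device code, and both helpers (model_warp_shuffle_down, model_warp_sum) are hypothetical names. It assumes that a width smaller than the warp splits the lanes into independent subsections, as described for warp_shuffle, so the range check uses the logical lane id within the subsection. The reduction loop is a common usage pattern and is not taken from the text above.

```cpp
#include <cassert>
#include <vector>

// Host-side model only (hypothetical helper, not part of rocPRIM).
template <class T>
std::vector<T> model_warp_shuffle_down(const std::vector<T>& inputs,
                                       unsigned int delta, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    assert(width > 0 && (width & (width - 1)) == 0);
    assert(width <= warp_size && warp_size % width == 0);

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane) {
        const int logical = lane % width; // lane id inside its subsection (assumption)
        // True when logical + delta is still inside [0; width).
        const bool in_range = delta < static_cast<unsigned int>(width - logical);
        out[lane] = in_range ? inputs[lane + static_cast<int>(delta)]
                             : inputs[lane]; // otherwise keep the lane's own input
    }
    return out;
}

// Usage sketch: tree reduction over one full-width warp.
// After log2(width) steps, lane 0 holds the sum of all inputs.
template <class T>
T model_warp_sum(std::vector<T> values)
{
    const int width = static_cast<int>(values.size());
    for (int offset = width / 2; offset > 0; offset /= 2) {
        const std::vector<T> shifted =
            model_warp_shuffle_down(values, static_cast<unsigned int>(offset), width);
        for (int lane = 0; lane < width; ++lane) {
            values[lane] += shifted[lane];
        }
    }
    return values[0];
}
```

Only lane 0 is guaranteed to hold the full sum here, because lanes near the top of the range fall back to their own input at each step.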

template<class T>
__device__ inline T rocprim::warp_shuffle_xor(const T &input, const int lane_mask, const int width = device_warp_size())

Shuffle XOR for any data type.

The i-th thread in the warp obtains input from the (i^lane_mask)-th thread in the warp.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2, or it is greater than device_warp_size().

Parameters
  • input – input to pass to other threads

  • lane_mask – mask used for calculating source lane id

  • width – logical warp width
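The XOR pairing can also be modelled on the host. This is a plain C++ sketch with hypothetical helpers (model_warp_shuffle_xor, model_butterfly_sum), not rocPRIM's implementation. It covers only the default case where width equals the warp size, since the text above does not describe how a smaller width affects the XOR variant, and it assumes lane_mask lies in [0; warp size). The butterfly loop is a common usage pattern, not something stated in the source.

```cpp
#include <cassert>
#include <vector>

// Host-side model only (hypothetical helper, not part of rocPRIM).
// Models the default case: width == warp size == inputs.size().
template <class T>
std::vector<T> model_warp_shuffle_xor(const std::vector<T>& inputs, int lane_mask)
{
    const int warp_size = static_cast<int>(inputs.size());
    assert(warp_size > 0 && (warp_size & (warp_size - 1)) == 0);
    assert(lane_mask >= 0 && lane_mask < warp_size); // assumption for this sketch

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane) {
        // Lane i reads from lane i ^ lane_mask, so lanes exchange in pairs.
        out[lane] = inputs[lane ^ lane_mask];
    }
    return out;
}

// Usage sketch: butterfly all-reduce. Every lane ends with the total.
template <class T>
std::vector<T> model_butterfly_sum(std::vector<T> values)
{
    const int warp_size = static_cast<int>(values.size());
    for (int mask = warp_size / 2; mask > 0; mask /= 2) {
        const std::vector<T> partner = model_warp_shuffle_xor(values, mask);
        for (int lane = 0; lane < warp_size; ++lane) {
            values[lane] += partner[lane];
        }
    }
    return values;
}
```

Unlike the shuffle-down reduction, the butterfly exchange is symmetric, so every lane finishes with the same total rather than only lane 0.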