Shuffle
-
template<class T>
__device__ inline T rocprim::warp_shuffle(const T &input, const int src_lane, const int width = arch::wavefront::min_size())
Shuffle for any data type.
Each thread in the warp obtains input from the src_lane-th thread in the warp. If width is less than arch::wavefront::min_size(), each subsection of the warp behaves as a separate entity with a starting logical lane id of 0. If src_lane is not in the range [0; width), the returned value is the input passed by the thread whose lane id is src_lane modulo width.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than arch::wavefront::min_size().
Parameters:
input – input to pass to other threads
src_lane – lane id (within the warp) of the thread whose input should be returned
width – logical warp width
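The lane-selection rule described above can be sketched on the host. The helper below is hypothetical and is not part of rocPRIM; it only models which physical lane a call such as rocprim::warp_shuffle(value, src_lane, width) would read from, assuming width is a power of 2 and that the modulo is taken by masking with width - 1.

```cpp
// Hypothetical host-side helper (not a rocPRIM function): returns the physical
// lane whose input the calling lane would receive under the documented rules.
// Assumes width is a power of 2.
constexpr int shuffle_source_lane(int lane_id, int src_lane, int width)
{
    const int segment_start = lane_id & ~(width - 1); // first lane of the caller's subsection
    const int logical_src   = src_lane & (width - 1); // src_lane modulo width
    return segment_start + logical_src;
}

// Lane 13 with width 8 belongs to the subsection [8, 16);
// logical lane 2 of that subsection is physical lane 10.
static_assert(shuffle_source_lane(13, 2, 8) == 10, "subsection-relative source lane");

// src_lane 10 is outside [0, 8), so it wraps to 10 modulo 8 == 2.
static_assert(shuffle_source_lane(13, 10, 8) == 10, "out-of-range src_lane wraps");

// With width 32, lanes 0..31 form one subsection, so src_lane 0 is physical lane 0.
static_assert(shuffle_source_lane(13, 0, 32) == 0, "broadcast from lane 0");
```

The last case corresponds to the common broadcast pattern, where every lane passes the same src_lane and therefore reads the same value.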
-
template<class T>
__device__ inline T rocprim::warp_shuffle_down(const T &input, const unsigned int delta, const int width = arch::wavefront::min_size())
Shuffle down for any data type.
The i-th thread in the warp obtains input from the (i+delta)-th thread in the warp. If i+delta is not in the range [0; width), the thread's own input is returned.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than arch::wavefront::min_size().
Parameters:
input – input to pass to other threads
delta – offset for calculating source lane id
width – logical warp width
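A typical use of shuffle down is a tree reduction. The following is a minimal host-side model, not rocPRIM code: it emulates the documented rule on an 8-element array standing in for a warp. The names model_lanes, lane_values, shuffle_down_model, and reduce_sum_model are illustrative assumptions, and the model treats i as the lane id relative to the start of its width-sized subsection.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model size: 8 lanes instead of a real wavefront, to keep the trace short.
constexpr std::size_t model_lanes = 8;
using lane_values = std::array<int, model_lanes>;

// Host-side model of the documented shuffle-down rule.
// Assumes width is a power of 2 and width <= model_lanes.
lane_values shuffle_down_model(const lane_values& input, std::size_t delta, std::size_t width)
{
    lane_values output{};
    for(std::size_t i = 0; i < model_lanes; ++i)
    {
        const std::size_t logical_id = i & (width - 1);        // lane id inside its subsection
        const bool        in_range   = delta < width - logical_id; // logical_id + delta < width
        output[i] = in_range ? input[i + delta] : input[i];    // otherwise keep own input
    }
    return output;
}

// Tree reduction: halve the offset each step and add the shifted values.
int reduce_sum_model(lane_values values)
{
    for(std::size_t delta = model_lanes / 2; delta > 0; delta /= 2)
    {
        const lane_values shifted = shuffle_down_model(values, delta, model_lanes);
        for(std::size_t i = 0; i < model_lanes; ++i)
        {
            values[i] += shifted[i];
        }
    }
    return values[0]; // only lane 0 is guaranteed to hold the full sum
}
```

In device code the inner exchange would be a single call of the form rocprim::warp_shuffle_down(value, delta, width) per step; lanes near the end of a subsection add their own value back in, which is why only lane 0 ends with the complete sum.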
-
template<class T>
__device__ inline T rocprim::warp_shuffle_xor(const T &input, const int lane_mask, const int width = arch::wavefront::min_size())
Shuffle XOR for any data type.
The i-th thread in the warp obtains input from the (i^lane_mask)-th thread in the warp.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than arch::wavefront::min_size().
Parameters:
input – input to pass to other threads
lane_mask – mask used for calculating source lane id
width – logical warp width
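Shuffle XOR is commonly used for butterfly exchanges, where lanes swap values in pairs. The sketch below is a host-side model under stated assumptions, not rocPRIM code: it uses an 8-element array in place of a warp, and it assumes lane_mask is smaller than the lane count so that i ^ lane_mask always names a valid lane. The names butterfly_lanes, butterfly_values, shuffle_xor_model, and all_reduce_sum_model are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model size: 8 lanes (a power of 2) standing in for a warp.
constexpr std::size_t butterfly_lanes = 8;
using butterfly_values = std::array<int, butterfly_lanes>;

// Host-side model of the documented rule: lane i reads from lane i ^ lane_mask.
// Assumes lane_mask < butterfly_lanes, so the source index stays in range.
butterfly_values shuffle_xor_model(const butterfly_values& input, std::size_t lane_mask)
{
    butterfly_values output{};
    for(std::size_t i = 0; i < butterfly_lanes; ++i)
    {
        output[i] = input[i ^ lane_mask];
    }
    return output;
}

// Butterfly all-reduce: after log2(butterfly_lanes) exchange steps,
// every lane holds the sum of all inputs.
butterfly_values all_reduce_sum_model(butterfly_values values)
{
    for(std::size_t mask = butterfly_lanes / 2; mask > 0; mask /= 2)
    {
        const butterfly_values exchanged = shuffle_xor_model(values, mask);
        for(std::size_t i = 0; i < butterfly_lanes; ++i)
        {
            values[i] += exchanged[i];
        }
    }
    return values;
}
```

Unlike a shuffle-down reduction, the XOR pattern is symmetric, so the result is available in every lane without a separate broadcast step.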