Shuffle

template<class T>
__device__ inline T rocprim::warp_shuffle(const T &input, const int src_lane, const int width = device_warp_size())
Shuffle for any data type.
Each thread in the warp obtains input from the src_lane-th thread in the warp. If width is less than device_warp_size(), each subsection of the warp behaves as a separate entity with a starting logical lane id of 0. If src_lane is not in the range [0; width), the returned value is the input passed by the thread whose logical lane id is src_lane modulo width.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than device_warp_size().
Parameters:
input – input to pass to other threads
src_lane – lane id of the thread whose input should be returned
width – logical warp width
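The lane selection described above can be checked on the CPU with a small model. The sketch below is not rocPRIM code: model_warp_shuffle is a hypothetical helper, the vector index stands in for the lane id, the vector length stands in for device_warp_size(), and every lane is assumed to pass the same non-negative src_lane.

```cpp
#include <cassert>
#include <vector>

// Host-side model only; this is NOT rocPRIM code and runs on the CPU.
// inputs[i] is the value that lane i passes as `input`, and inputs.size()
// plays the role of device_warp_size(). All lanes use the same src_lane.
template <class T>
std::vector<T> model_warp_shuffle(const std::vector<T>& inputs, int src_lane, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    // width must be a power of 2 and no larger than the warp size.
    assert(width > 0 && (width & (width - 1)) == 0 && width <= warp_size);
    assert(warp_size % width == 0);
    assert(src_lane >= 0);  // assumption of this sketch

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane) {
        // First physical lane of the logical warp that `lane` belongs to.
        const int segment_start = (lane / width) * width;
        // Out-of-range source lanes wrap around: src_lane modulo width.
        const int src = src_lane % width;
        out[lane] = inputs[segment_start + src];
    }
    return out;
}
```

With eight lanes holding 10 through 17, src_lane = 1 and width = 4, the model gives 11 to lanes 0 through 3 and 15 to lanes 4 through 7. Passing src_lane = 5 gives the same result, because 5 modulo 4 is 1.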

template<class T>
__device__ inline T rocprim::warp_shuffle_down(const T &input, const unsigned int delta, const int width = device_warp_size())
Shuffle down for any data type.
The i-th thread in the warp obtains input from the (i+delta)-th thread in the warp. If i+delta is not in the range [0; width), the thread's own input is returned.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than device_warp_size().
Parameters:
input – input to pass to other threads
delta – offset for calculating source lane id
width – logical warp width
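The shuffle-down rule, including the fallback to the thread's own input, can be modelled on the CPU as follows. This is a sketch, not rocPRIM code: model_warp_shuffle_down and model_warp_reduce_sum are hypothetical helpers, the vector index stands in for the lane id, and i is assumed to be the logical lane id within each width-sized subsection. The second helper illustrates the common halving-offset sum reduction that this kind of shuffle is typically used for.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only; this is NOT rocPRIM code and runs on the CPU.
// inputs[i] is the value that lane i passes as `input`.
template <class T>
std::vector<T> model_warp_shuffle_down(const std::vector<T>& inputs, unsigned int delta, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    assert(width > 0 && (width & (width - 1)) == 0 && width <= warp_size);
    assert(warp_size % width == 0);

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane) {
        // Lane id within the logical warp (assumed meaning of "i").
        const unsigned int logical_lane = static_cast<unsigned int>(lane % width);
        // Equivalent to logical_lane + delta < width, written to avoid overflow.
        const bool in_range = delta < static_cast<unsigned int>(width) - logical_lane;
        out[lane] = in_range ? inputs[lane + delta] : inputs[lane];
    }
    return out;
}

// Sum reduction over one logical warp of values.size() lanes (a power of 2).
// After the loop, lane 0 holds the total; other lanes hold partial sums.
template <class T>
T model_warp_reduce_sum(std::vector<T> values)
{
    const int width = static_cast<int>(values.size());
    for (int offset = width / 2; offset > 0; offset /= 2) {
        const std::vector<T> shifted =
            model_warp_shuffle_down(values, static_cast<unsigned int>(offset), width);
        for (std::size_t lane = 0; lane < values.size(); ++lane) {
            values[lane] += shifted[lane];
        }
    }
    return values[0];
}
```

With eight lanes holding 0 through 7 and delta = 1, lane 7 keeps its own value because lane 8 does not exist. With width = 4 and delta = 2, lanes 2, 3, 6 and 7 keep their own values because their source would fall outside their four-lane subsection.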

template<class T>
__device__ inline T rocprim::warp_shuffle_xor(const T &input, const int lane_mask, const int width = device_warp_size())
Shuffle XOR for any data type.
The i-th thread in the warp obtains input from the (i^lane_mask)-th thread in the warp.
Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than device_warp_size().
Parameters:
input – input to pass to other threads
lane_mask – mask used for calculating source lane id
width – logical warp width
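The XOR pairing can likewise be modelled on the CPU. This is a sketch, not rocPRIM code: model_warp_shuffle_xor and model_warp_allreduce_sum are hypothetical helpers, and the vector index stands in for the lane id. The text above does not say what happens when i^lane_mask falls outside a thread's logical subsection, so the model assumes 0 <= lane_mask < width, in which case the partner always lies in the same subsection.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only; this is NOT rocPRIM code and runs on the CPU.
// inputs[i] is the value that lane i passes as `input`.
template <class T>
std::vector<T> model_warp_shuffle_xor(const std::vector<T>& inputs, int lane_mask, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    assert(width > 0 && (width & (width - 1)) == 0 && width <= warp_size);
    assert(warp_size % width == 0);
    // Assumption of this sketch: the mask only flips bits below width, so the
    // partner lane stays inside the same width-aligned subsection.
    assert(lane_mask >= 0 && lane_mask < width);

    std::vector<T> out(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane) {
        out[lane] = inputs[lane ^ lane_mask];
    }
    return out;
}

// Butterfly sum over values.size() lanes (a power of 2): after the loop,
// every lane holds the total.
template <class T>
std::vector<T> model_warp_allreduce_sum(std::vector<T> values)
{
    const int width = static_cast<int>(values.size());
    for (int mask = width / 2; mask > 0; mask /= 2) {
        const std::vector<T> partner = model_warp_shuffle_xor(values, mask, width);
        for (std::size_t lane = 0; lane < values.size(); ++lane) {
            values[lane] += partner[lane];
        }
    }
    return values;
}
```

With eight lanes holding 0 through 7, lane_mask = 1 swaps neighbouring pairs and lane_mask = 4 swaps the lower and upper halves. Unlike the shuffle-down reduction pattern, the butterfly leaves the full sum in every lane rather than only in lane 0.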