Shuffle
template<class T>
__device__ inline T rocprim::warp_shuffle(const T &input, const int src_lane, const int width = device_warp_size())
Shuffle for any data type.

Each thread in the warp obtains input from the src_lane-th thread in the warp. If width is less than device_warp_size(), each subsection of the warp behaves as a separate entity with a starting logical lane id of 0. If src_lane is not in the range [0; width), the returned value is equal to the input passed by the thread with lane id src_lane modulo width.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than device_warp_size().

Parameters:
- input – input to pass to other threads
- src_lane – lane id (within the warp) of the thread whose input should be returned
- width – logical warp width
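The lane selection described above can be checked on the host with a small model. The sketch below is not rocPRIM code and does not run on a GPU: model_warp_shuffle is a hypothetical helper that treats a std::vector as the per-lane inputs of one warp and returns what each lane would receive under the rules stated above. It assumes a non-negative src_lane and a warp size that is a multiple of width.

```cpp
#include <cassert>
#include <vector>

// Host-side model only: no GPU code and no rocPRIM calls are involved.
// model_warp_shuffle is a hypothetical name used for illustration.
// inputs[i] plays the role of the value passed as `input` by lane i.
template<class T>
std::vector<T> model_warp_shuffle(const std::vector<T>& inputs, int src_lane, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    // Documented precondition: width is a power of 2 and not larger than the warp.
    assert(width > 0 && (width & (width - 1)) == 0);
    assert(width <= warp_size && warp_size % width == 0);
    // Assumption of this sketch: negative source lanes are not modelled.
    assert(src_lane >= 0);

    std::vector<T> outputs(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane)
    {
        // First physical lane of the logical warp this lane belongs to.
        const int segment_start = (lane / width) * width;
        // Out-of-range src_lane values wrap around: src_lane modulo width.
        const int source = segment_start + (src_lane % width);
        outputs[lane] = inputs[source];
    }
    return outputs;
}
```

With eight lanes holding 10..17 and width = 4, both src_lane = 1 and src_lane = 5 make lanes 0 to 3 read 11 and lanes 4 to 7 read 15, because each group of four lanes acts as its own logical warp.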
 
 
template<class T>
__device__ inline T rocprim::warp_shuffle_down(const T &input, const unsigned int delta, const int width = device_warp_size())
Shuffle down for any data type.

The i-th thread in the warp obtains input from the (i+delta)-th thread in the warp. If i+delta is not in the range [0; width), the thread's own input is returned.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than device_warp_size().

Parameters:
- input – input to pass to other threads
- delta – offset for calculating source lane id
- width – logical warp width
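As a rough illustration of the out-of-range rule, the following host-side model computes what each lane would receive from a shuffle down. It is a sketch, not rocPRIM code: model_warp_shuffle_down is a hypothetical helper, and it assumes that i is the logical lane id inside a width-sized subsection, by analogy with the description of warp_shuffle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only: no GPU code and no rocPRIM calls are involved.
// model_warp_shuffle_down is a hypothetical name used for illustration.
template<class T>
std::vector<T> model_warp_shuffle_down(const std::vector<T>& inputs,
                                       unsigned int delta,
                                       int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    // Documented precondition: width is a power of 2 and not larger than the warp.
    assert(width > 0 && (width & (width - 1)) == 0);
    assert(width <= warp_size && warp_size % width == 0);

    std::vector<T> outputs(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane)
    {
        // Assumption: i is the lane id relative to the start of its logical warp.
        const unsigned int logical = static_cast<unsigned int>(lane % width);
        // Number of lanes above this one inside the same logical warp.
        const unsigned int room = static_cast<unsigned int>(width) - 1u - logical;
        // i + delta stays inside [0; width) exactly when delta <= room.
        const bool in_range = delta <= room;
        outputs[lane] = in_range ? inputs[static_cast<std::size_t>(lane) + delta]
                                 : inputs[lane];
    }
    return outputs;
}
```

With eight lanes holding 10..17, delta = 1 and width = 4, lanes 3 and 7 keep their own values (13 and 17) because lane 3 + 1 and lane 7 + 1 fall outside their four-lane subsections.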
 
 
template<class T>
__device__ inline T rocprim::warp_shuffle_xor(const T &input, const int lane_mask, const int width = device_warp_size())
Shuffle XOR for any data type.

The i-th thread in the warp obtains input from the (i^lane_mask)-th thread in the warp.

Note: The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than device_warp_size().

Parameters:
- input – input to pass to other threads
- lane_mask – mask used for calculating source lane id
- width – logical warp width
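XOR shuffles are commonly used for butterfly-style reductions, where every lane ends up holding the combined value. The host-side sketch below models that pattern. It is not rocPRIM code: model_warp_shuffle_xor and model_butterfly_sum are hypothetical helpers, and the model only accepts 0 <= lane_mask < width so that i^lane_mask always stays inside the same width-sized subsection (the text above does not say what happens otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model only: no GPU code and no rocPRIM calls are involved.
// Both function names are hypothetical and used for illustration.
template<class T>
std::vector<T> model_warp_shuffle_xor(const std::vector<T>& inputs, int lane_mask, int width)
{
    const int warp_size = static_cast<int>(inputs.size());
    // Documented precondition: width is a power of 2 and not larger than the warp.
    assert(width > 0 && (width & (width - 1)) == 0);
    assert(width <= warp_size && warp_size % width == 0);
    // Assumption of this sketch: the mask keeps the partner in the same subsection.
    assert(lane_mask >= 0 && lane_mask < width);

    std::vector<T> outputs(inputs.size());
    for (int lane = 0; lane < warp_size; ++lane)
    {
        outputs[lane] = inputs[lane ^ lane_mask]; // lane i reads from lane i^lane_mask
    }
    return outputs;
}

// Butterfly all-reduce: after log2(width) exchange steps, every lane holds
// the sum of all values in its logical warp.
template<class T>
std::vector<T> model_butterfly_sum(std::vector<T> values, int width)
{
    for (int mask = width / 2; mask > 0; mask /= 2)
    {
        // All lanes exchange simultaneously, so read partners from a snapshot.
        const std::vector<T> partner = model_warp_shuffle_xor(values, mask, width);
        for (std::size_t lane = 0; lane < values.size(); ++lane)
        {
            values[lane] += partner[lane];
        }
    }
    return values;
}
```

For eight lanes holding 1..8, a full-width butterfly leaves 36 in every lane, while width = 4 leaves 10 in lanes 0 to 3 and 26 in lanes 4 to 7.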