Intrinsics#
Bitwise#
-
__device__ inline int rocprim::get_bit(int x, int i)#
Returns a single bit at ‘i’ from ‘x’.
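As a rough illustration of these semantics, the host-side sketch below extracts bit `i` with a shift and a mask. The name `get_bit_host` is hypothetical and not part of rocPRIM. The sketch assumes bit 0 is the least significant bit and that `0 <= i < 32`.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for rocprim::get_bit (not the library code).
// Assumption: bit 0 is the least significant bit and 0 <= i < 32.
constexpr int get_bit_host(int x, int i)
{
    return (x >> i) & 1;
}

// Small usage check: 5 is 0b101, so bits 0 and 2 are set and bit 1 is clear.
inline void get_bit_examples()
{
    assert(get_bit_host(5, 0) == 1);
    assert(get_bit_host(5, 1) == 0);
    assert(get_bit_host(5, 2) == 1);
}
```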
-
__device__ inline unsigned int rocprim::bit_count(unsigned int x)#
Bit count.
Returns the number of bits set in x.
-
__device__ inline unsigned int rocprim::bit_count(unsigned long long x)#
Bit count.
Returns the number of bits set in x.
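The two overloads differ only in operand width. A minimal host-side sketch of the same population count follows. The name `bit_count_host` is hypothetical, the sketch uses `std::bitset` rather than any device intrinsic, and it assumes a 32-bit `unsigned int`.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical host-side stand-ins for the two rocprim::bit_count overloads.
// Assumption: unsigned int is 32 bits wide, unsigned long long is 64 bits wide.
inline unsigned int bit_count_host(unsigned int x)
{
    return static_cast<unsigned int>(std::bitset<32>(x).count());
}

inline unsigned int bit_count_host(unsigned long long x)
{
    return static_cast<unsigned int>(std::bitset<64>(x).count());
}

// Usage check: the literal suffix selects the overload.
inline void bit_count_examples()
{
    assert(bit_count_host(0xF0u) == 4);    // 32-bit overload
    assert(bit_count_host(~0ull) == 64);   // 64-bit overload, all bits set
}
```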
Warp size#
-
__host__ __device__ inline constexpr unsigned int rocprim::warp_size()#
[DEPRECATED] Returns the number of threads in a hardware warp.
It is constant for a device. This function is not supported for the gfx1030 architecture and will be removed in a future release. Please use the new host_warp_size() and device_warp_size() functions.
-
__host__ inline unsigned int rocprim::host_warp_size()#
Returns the number of threads in a hardware warp for the actual device. On the host side, this value is available at runtime only.
It is constant for a device.
-
__device__ inline constexpr unsigned int rocprim::device_warp_size()#
Returns the number of threads in a hardware warp for the actual target. On the device side, this value is available at compile time.
It is constant for a device.
Lane and Warp ID#
-
__device__ inline unsigned int warp_id()#
Returns warp id in a block (tile). Use template parameters to optimize 1D or 2D kernels.
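One way to picture the warp id is as the flat thread id divided by the warp size. The host-side sketch below works under that assumption. `warp_id_host` and its explicit `warp_size` parameter are hypothetical; on a device the warp size would come from the target rather than from an argument.

```cpp
#include <cassert>

// Hypothetical host-side model of a warp id within a block.
// Assumption: warp id = flat thread id / warp size (integer division).
constexpr unsigned int warp_id_host(unsigned int flat_thread_id,
                                    unsigned int warp_size)
{
    return flat_thread_id / warp_size;
}

// Usage check with a warp size of 64 and of 32.
inline void warp_id_examples()
{
    assert(warp_id_host(63, 64) == 0);   // last thread of the first warp
    assert(warp_id_host(64, 64) == 1);   // first thread of the second warp
    assert(warp_id_host(100, 32) == 3);  // 100 / 32 == 3
}
```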
Flat ID#
-
__device__ inline unsigned int flat_block_thread_id()#
Returns flat (linear, 1D) thread identifier in a multidimensional block (tile). Use template parameters to optimize 1D or 2D kernels.
-
__device__ inline unsigned int flat_block_id()#
Returns flat (linear, 1D) block identifier in a multidimensional grid.
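Both flat identifiers follow the same linearization: a 3D index is folded into one number, with `x` varying fastest. The host-side sketch below shows that arithmetic. `dim3_host` and `flatten` are hypothetical names used only for illustration; on a device the inputs would be the thread or block index and the block or grid extent.

```cpp
#include <cassert>

// Hypothetical host-side 3D index/extent type (stand-in for dim3).
struct dim3_host
{
    unsigned int x, y, z;
};

// Linearize a 3D index inside a 3D extent, with x varying fastest.
// Assumption: this mirrors the usual HIP/CUDA flattening convention.
constexpr unsigned int flatten(dim3_host idx, dim3_host extent)
{
    return (idx.z * extent.y * extent.x) + (idx.y * extent.x) + idx.x;
}

inline void flatten_examples()
{
    // Thread (3, 2, 1) in an 8 x 4 x 2 block: 1*4*8 + 2*8 + 3 = 51.
    assert(flatten({3, 2, 1}, {8, 4, 2}) == 51);
    // Block (1, 1, 0) in a 4 x 2 x 1 grid: 0 + 1*4 + 1 = 5.
    assert(flatten({1, 1, 0}, {4, 2, 1}) == 5);
}
```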
Flat Size#
-
__device__ inline unsigned int rocprim::flat_block_size()#
Returns flat size of a multidimensional block (tile).
-
__device__ inline unsigned int rocprim::flat_tile_size()#
Returns flat size of a multidimensional tile (block).
Synchronization#
-
__device__ inline void rocprim::syncthreads()#
Synchronizes all threads in a block (tile).
-
__device__ inline void rocprim::wave_barrier()#
All lanes in a wave reach a convergence point simultaneously under SIMT, so no special instruction is needed in the ISA.
Active threads#
-
__device__ inline lane_mask_type rocprim::ballot(int predicate)#
Evaluate predicate for all active work-items in the warp and return an integer whose i-th bit is set if and only if predicate is true for the i-th thread of the warp and the i-th thread is active.
Parameters:
predicate – input to be evaluated for all active lanes
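To make the "predicate is true and the lane is active" rule concrete, here is a host-side emulation over explicit per-lane arrays. Every name in it is hypothetical (`ballot_host`, `lane_mask_host`, `kWarpSize`), and it assumes a 64-lane warp with a 64-bit mask. On a device the active set is implicit in the execution mask rather than passed in.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using lane_mask_host = std::uint64_t;   // hypothetical stand-in for lane_mask_type
constexpr unsigned int kWarpSize = 64;  // assumption: 64-lane warp

// Bit i of the result is set iff lane i is active and its predicate is true.
inline lane_mask_host ballot_host(const std::array<bool, kWarpSize>& predicate,
                                  const std::array<bool, kWarpSize>& active)
{
    lane_mask_host mask = 0;
    for (unsigned int lane = 0; lane < kWarpSize; ++lane)
    {
        if (active[lane] && predicate[lane])
        {
            mask |= lane_mask_host{1} << lane;
        }
    }
    return mask;
}

// Lanes 0, 3 and 5 vote true, but lane 3 is inactive, so only bits 0 and 5 are set.
inline lane_mask_host ballot_example()
{
    std::array<bool, kWarpSize> predicate{};
    std::array<bool, kWarpSize> active{};
    active.fill(true);
    active[3] = false;
    predicate[0] = true;
    predicate[3] = true;
    predicate[5] = true;
    return ballot_host(predicate, active);
}
```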
-
__device__ inline unsigned int rocprim::masked_bit_count(lane_mask_type x, unsigned int add = 0)#
Masked bit count.
For each thread, this function returns the number of active threads which have i-th bit of x set and come before the current thread.
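A host-side sketch may help here: for a given lane, keep only the bits of `x` below that lane and count them. The function `masked_bit_count_host` and its explicit `lane` argument are hypothetical, since on a device the lane is implicit. The sketch also assumes that `add` is simply added to the count, which the description above does not state.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of masked_bit_count for one lane.
// Assumptions: 64-bit lane mask, 0 <= lane < 64, and `add` is an offset
// added to the count (not confirmed by the text above).
inline unsigned int masked_bit_count_host(std::uint64_t x,
                                          unsigned int lane,
                                          unsigned int add = 0)
{
    // Mask with ones in every bit position strictly below `lane`.
    const std::uint64_t below = (std::uint64_t{1} << lane) - 1;
    return static_cast<unsigned int>(std::bitset<64>(x & below).count()) + add;
}

inline void masked_bit_count_examples()
{
    const std::uint64_t x = 0xB;  // 0b1011: bits 0, 1 and 3 are set
    assert(masked_bit_count_host(x, 0) == 0);       // nothing comes before lane 0
    assert(masked_bit_count_host(x, 3) == 2);       // bits 0 and 1
    assert(masked_bit_count_host(x, 4) == 3);       // bits 0, 1 and 3
    assert(masked_bit_count_host(x, 4, 10) == 13);  // same count plus the offset
}
```

Combined with a ballot mask, this kind of count gives each lane its rank among the lanes that voted true, which is a common building block for stream compaction.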