Intrinsics#

Bitwise#

__device__ inline int rocprim::get_bit(int x, int i)#

Returns the bit at position ‘i’ of ‘x’.
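The behaviour described above can be sketched on the host as follows. This is a reference model, not the rocPRIM source: the name get_bit_ref is hypothetical, and it assumes that bit 0 is the least significant bit and that i lies in the range 0 to 31.

```cpp
// Host-side reference model (hypothetical name, not part of rocPRIM).
// Assumes bit 0 is the least significant bit and 0 <= i <= 31.
constexpr int get_bit_ref(int x, int i)
{
    // Shift in the unsigned domain so that negative inputs are well defined.
    return static_cast<int>((static_cast<unsigned int>(x) >> i) & 1u);
}

// 5 is 0b101: bits 0 and 2 are set, bit 1 is clear.
static_assert(get_bit_ref(5, 0) == 1, "bit 0 of 5 is set");
static_assert(get_bit_ref(5, 1) == 0, "bit 1 of 5 is clear");
static_assert(get_bit_ref(5, 2) == 1, "bit 2 of 5 is set");
```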

__device__ inline unsigned int rocprim::bit_count(unsigned int x)#

Bit count.

Returns the number of set bits in x.

__device__ inline unsigned int rocprim::bit_count(unsigned long long x)#

Bit count.

Returns the number of set bits in x.
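As an illustration of what both bit_count overloads compute, here is a portable host-side population count. It is a sketch for checking expectations, not the device implementation; the name bit_count_ref is hypothetical, and a single 64-bit function stands in for both overloads.

```cpp
// Host-side stand-in for a population count (hypothetical name).
// Uses Kernighan's method: each iteration clears the lowest set bit.
inline unsigned int bit_count_ref(unsigned long long x)
{
    unsigned int count = 0;
    while (x != 0) {
        x &= x - 1; // clear the lowest set bit
        ++count;
    }
    return count;
}
```

For example, bit_count_ref(11ull) returns 3, because 11 is 0b1011.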

Warp size#

__host__ __device__ inline constexpr unsigned int rocprim::warp_size()#

[DEPRECATED] Returns the number of threads in a hardware warp.

It is constant for a device. This function is not supported for the gfx1030 architecture and will be removed in a future release. Please use the new host_warp_size() and device_warp_size() functions.

__host__ inline unsigned int rocprim::host_warp_size()#

Returns the number of threads in a hardware warp for the actual device. On the host side, this value is available at run time only.

It is constant for a device.

__device__ inline constexpr unsigned int rocprim::device_warp_size()#

Returns the number of threads in a hardware warp for the actual target. On the device side, this value is available at compile time.

It is constant for a device.
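A typical use of the warp size is to work out how many warps cover a block. The sketch below is plain host C++ and does not call rocPRIM: warps_per_block is a hypothetical helper, and its warp_size argument stands in for a value such as the one host_warp_size() would return. The warp sizes 32 and 64 are illustrative inputs only.

```cpp
// Hypothetical helper: number of warps needed to cover a block.
// `warp_size` stands in for a value queried at run time on the host.
constexpr unsigned int warps_per_block(unsigned int block_size,
                                       unsigned int warp_size)
{
    return (block_size + warp_size - 1) / warp_size; // round up
}

static_assert(warps_per_block(256, 64) == 4, "256 threads, 64-wide warps");
static_assert(warps_per_block(256, 32) == 8, "256 threads, 32-wide warps");
static_assert(warps_per_block(100, 64) == 2, "a partial warp still counts");
```

Because the host value is only known at run time, anything sized from it on the host (for example a temporary buffer with one slot per warp) has to be computed at run time as well.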

Lane and Warp ID#

__device__ inline unsigned int warp_id()#

Returns the warp id in a block (tile). Use template parameters to optimize 1D or 2D kernels.
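A common way to derive a warp id is to divide the flat thread id by the warp size. The host-side sketch below models that relationship under this assumption; it is not taken from the rocPRIM source, and warp_id_ref is a hypothetical name.

```cpp
// Host-side model (hypothetical name): assumes the warp id is the flat
// thread id within the block divided by the warp size.
constexpr unsigned int warp_id_ref(unsigned int flat_thread_id,
                                   unsigned int warp_size)
{
    return flat_thread_id / warp_size;
}

static_assert(warp_id_ref(63, 64) == 0, "last thread of the first warp");
static_assert(warp_id_ref(64, 64) == 1, "first thread of the second warp");
```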

Flat ID#

__device__ inline unsigned int flat_block_thread_id()#

Returns the flat (linear, 1D) thread identifier in a multidimensional block (tile). Use template parameters to optimize 1D or 2D kernels.

__device__ inline unsigned int flat_block_id()#

Returns flat (linear, 1D) block identifier in a multidimensional grid.
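Both flat identifiers in this section can be understood as a linearization of a 3D index with x varying fastest. The sketch below models that on the host, assuming this ordering. The struct index3 and the function flat_id_ref are hypothetical names; on the device the inputs would be the thread index and block dimensions (for the thread id) or the block index and grid dimensions (for the block id).

```cpp
// Hypothetical stand-in for a 3D index or extent.
struct index3
{
    unsigned int x, y, z;
};

// Linearizes a 3D index with x varying fastest, then y, then z.
constexpr unsigned int flat_id_ref(index3 idx, index3 dim)
{
    return (idx.z * dim.y * dim.x) + (idx.y * dim.x) + idx.x;
}

// Thread (3, 2, 1) in an 8 x 4 x 2 block: 1*4*8 + 2*8 + 3 = 51.
static_assert(flat_id_ref(index3{3, 2, 1}, index3{8, 4, 2}) == 51, "");
// The last thread maps to size - 1 = 63.
static_assert(flat_id_ref(index3{7, 3, 1}, index3{8, 4, 2}) == 63, "");
```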

Flat Size#

__device__ inline unsigned int rocprim::flat_block_size()#

Returns flat size of a multidimensional block (tile).

__device__ inline unsigned int rocprim::flat_tile_size()#

Returns flat size of a multidimensional tile (block).
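Since the documentation uses block and tile interchangeably, both functions describe the same quantity: the product of the three block dimensions. A minimal host-side model follows; extent3 and flat_size_ref are hypothetical names used only for illustration.

```cpp
// Hypothetical stand-in for the block (tile) dimensions.
struct extent3
{
    unsigned int x, y, z;
};

// Total number of threads in a multidimensional block (tile).
constexpr unsigned int flat_size_ref(extent3 dim)
{
    return dim.x * dim.y * dim.z;
}

static_assert(flat_size_ref(extent3{8, 4, 2}) == 64, "8 * 4 * 2 threads");
static_assert(flat_size_ref(extent3{256, 1, 1}) == 256, "1D block");
```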

Synchronization#

__device__ inline void rocprim::syncthreads()#

Synchronizes all threads in a block (tile).

__device__ inline void rocprim::wave_barrier()#

All lanes in a wave reach a convergence point simultaneously under SIMT, so no special instruction is needed in the ISA.

Active threads#

__device__ inline lane_mask_type rocprim::ballot(int predicate)#

Evaluate predicate for all active work-items in the warp and return an integer whose i-th bit is set if and only if predicate is true for the i-th thread of the warp and the i-th thread is active.

Parameters:

predicate – input to be evaluated for all active lanes
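The semantics of ballot can be modelled on the host by looping over the lanes of one warp. The sketch below is an emulation for reasoning about results, not device code: ballot_ref and lane_mask_ref are hypothetical names, the mask type is assumed to be 64 bits wide, and the set of active lanes is passed in explicitly as a bit mask.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed 64-bit lane mask; a stand-in for the library's lane_mask_type.
using lane_mask_ref = std::uint64_t;

// Emulates a ballot over one warp: bit i of the result is set if and only
// if lane i is active and its predicate is non-zero.
inline lane_mask_ref ballot_ref(const std::vector<int>& predicate,
                                lane_mask_ref           active)
{
    lane_mask_ref result = 0;
    for (std::size_t lane = 0; lane < predicate.size() && lane < 64; ++lane) {
        const lane_mask_ref bit = lane_mask_ref{1} << lane;
        if ((active & bit) != 0 && predicate[lane] != 0) {
            result |= bit;
        }
    }
    return result;
}
```

With predicates {1, 0, 1, 1} and only lanes 0 to 2 active (mask 0b0111), the result is 0b0101: lane 3 has a true predicate but is inactive, so its bit stays clear.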

__device__ inline unsigned int rocprim::masked_bit_count(lane_mask_type x, unsigned int add = 0)#

Masked bit count.

For each thread, this function returns the number of active threads that come before the current thread and whose corresponding bit in x is set (the i-th bit for the i-th thread).
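The per-thread result can be modelled on the host by counting the set bits of x below the calling lane's position. This is a sketch under stated assumptions, not the device implementation: masked_bit_count_ref is a hypothetical name, the mask is assumed to be 64 bits wide, the lane index is passed explicitly, and the add parameter is assumed to be an offset added to the count (the description above does not spell this out).

```cpp
#include <cstdint>

// Host-side model (hypothetical name). `lane` is the position of the calling
// thread in its warp; `add` is ASSUMED to be an offset added to the count.
inline unsigned int masked_bit_count_ref(std::uint64_t x,
                                         unsigned int  lane,
                                         unsigned int  add = 0)
{
    // Keep only the bits that belong to lanes before `lane`.
    const std::uint64_t lower
        = (lane >= 64) ? x : (x & ((std::uint64_t{1} << lane) - 1));

    unsigned int count = 0;
    for (std::uint64_t bits = lower; bits != 0; bits &= bits - 1) {
        ++count; // one iteration per set bit
    }
    return count + add;
}
```

With x = 0b1011, lanes 0 through 4 see 0, 1, 2, 2 and 3 respectively. Feeding a ballot result into such a count is a common way to give each thread with a true predicate a compact output slot.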