hipcub::ArgMax | |
hipcub::ArgMin | |
►hipcub::BaseDigitExtractor< KeyT > | Base struct for digit extractor. Contains common code to provide special handling for floating-point -0.0 |
hipcub::BFEDigitExtractor< KeyT > | A wrapper type to extract digits. Uses the BFE (bit field extract) intrinsic to extract a digit from a key |
hipcub::ShiftDigitExtractor< KeyT > | A wrapper type to extract digits. Uses a combination of shift and bitwise and to extract digits |
hipcub::BinaryFlip< BinaryOpT > | |
►rocprim::block_adjacent_difference | |
hipcub::BlockAdjacentDifference< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_discontinuity | |
hipcub::BlockDiscontinuity< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_exchange | |
hipcub::BlockExchange< InputT, BLOCK_DIM_X, ITEMS_PER_THREAD, WARP_TIME_SLICING, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_histogram | |
hipcub::BlockHistogram< T, BLOCK_DIM_X, ITEMS_PER_THREAD, BINS, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_load | |
hipcub::BlockLoad< T, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_radix_rank | |
hipcub::BlockRadixRank< BLOCK_DIM_X, RADIX_BITS, IS_DESCENDING, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | BlockRadixRank provides operations for ranking unsigned integer types within a CUDA thread block |
hipcub::BlockRadixRankMatch< BLOCK_DIM_X, RADIX_BITS, IS_DESCENDING, INNER_SCAN_ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_radix_sort | |
hipcub::BlockRadixSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, RADIX_BITS, MEMOIZE_OUTER_SCAN, INNER_SCAN_ALGORITHM, SMEM_CONFIG, BLOCK_DIM_Y, BLOCK_DIM_Z, PTX_ARCH > | |
►rocprim::block_reduce | |
BlockReduce< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_scan | |
hipcub::BlockScan< T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_shuffle | |
hipcub::BlockShuffle< T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
►rocprim::block_store | |
hipcub::BlockStore< T, BLOCK_DIM_X, ITEMS_PER_THREAD, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, ARCH > | |
hipcub::CachingDeviceAllocator::BlockDescriptor | |
hipcub::BlockMergeSortStrategy< KeyT, ValueT, NUM_THREADS, ITEMS_PER_THREAD, SynchronizationPolicy > | Generalized merge sort algorithm |
►hipcub::BlockMergeSortStrategy< KeyT, NullType, ::rocprim::device_warp_size(), ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > > | |
hipcub::WarpMergeSort< KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH > | The WarpMergeSort class provides methods for sorting items partitioned across a CUDA warp using a merge sorting method |
►hipcub::BlockMergeSortStrategy< KeyT, NullType, BLOCK_DIM_X *1 *1, ITEMS_PER_THREAD, BlockMergeSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, NullType, 1, 1 > > | |
hipcub::BlockMergeSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, BLOCK_DIM_Y, BLOCK_DIM_Z > | The BlockMergeSort class provides methods for sorting items partitioned across a CUDA thread block using a merge sorting method |
hipcub::BlockRakingLayout< T, BLOCK_THREADS, ARCH > | BlockRakingLayout provides a conflict-free shared memory layout abstraction for 1D raking across thread block data |
hipcub::BlockRunLengthDecode< ItemT, BLOCK_DIM_X, RUNS_PER_THREAD, DECODED_ITEMS_PER_THREAD, DecodedOffsetT, BLOCK_DIM_Y, BLOCK_DIM_Z > | The BlockRunLengthDecode class supports decoding a run-length encoded array of items. That is, given the two arrays run_value[N] and run_lengths[N], run_value[i] is repeated run_lengths[i] times in the output array. Because run-length decoding is a decompression step, the size of the decoded array is known only at runtime and has no fixed upper bound. To address this, BlockRunLengthDecode allows retrieving a "window" from the run-length decoded array. The window's offset can be specified, and BLOCK_THREADS * DECODED_ITEMS_PER_THREAD decoded items (the window_size) from the specified window are returned |
hipcub::CacheModifiedInputIterator< MODIFIER, ValueType, OffsetT > | |
hipcub::CacheModifiedOutputIterator< MODIFIER, ValueType, OffsetT > | |
►cub::CachingDeviceAllocator | |
hipcub::CachingDeviceAllocator | |
hipcub::CastOp< B > | |
hipcub::DeviceAdjacentDifference | |
hipcub::DeviceHistogram | |
hipcub::DeviceMergeSort | |
hipcub::DevicePartition | |
hipcub::DeviceRadixSort | |
hipcub::DeviceReduce | |
hipcub::DeviceRunLengthEncode | |
hipcub::DeviceScan | |
hipcub::DeviceSegmentedRadixSort | |
hipcub::DeviceSegmentedReduce | |
hipcub::DeviceSegmentedSort | |
hipcub::DeviceSelect | |
hipcub::DeviceSpmv | |
hipcub::Difference | |
hipcub::DiscardOutputIterator< OffsetT > | A discard iterator |
hipcub::Division | |
hipcub::DoubleBuffer< T > | |
hipcub::Equality | |
►hipcub::GridBarrier | GridBarrier implements a software global barrier among thread blocks within a HIP grid |
hipcub::GridBarrierLifetime | GridBarrierLifetime extends GridBarrier to provide lifetime management of the temporary device storage needed for cooperation |
►cub::GridBarrierLifetime | |
hipcub::GridBarrierLifetime | GridBarrierLifetime extends GridBarrier to provide lifetime management of the temporary device storage needed for cooperation |
hipcub::GridEvenShare< OffsetT > | GridEvenShare is a descriptor utility for distributing input among CUDA thread blocks in an "even-share" fashion. Each thread block gets roughly the same number of input tiles |
hipcub::GridQueue< OffsetT > | GridQueue is a descriptor utility for dynamic queue management |
hipcub::If< B, T, F > | |
hipcub::Inequality | |
hipcub::InequalityWrapper< EqualityOp > | |
hipcub::Int2Type< A > | |
hipcub::IsPointer< T > | |
hipcub::IsVolatile< T > | |
hipcub::Log2< N > | |
hipcub::Max | |
hipcub::Min | |
hipcub::PowerOfTwo< N > | |
hipcub::RadixSortTwiddle< IS_DESCENDING, KeyT > | Twiddling keys for radix sort |
hipcub::ReduceByKeyOp< ReductionOpT > | |
hipcub::ReduceBySegmentOp< ReductionOpT > | |
hipcub::RemoveQualifiers< T > | |
hipcub::DeviceSpmv::SpmvParams< ValueT, OffsetT > | Signed integer type for sequence offsets |
hipcub::Sum | |
hipcub::SwizzleScanOp< ScanOp > | |
►rocprim::texture_cache_iterator | |
hipcub::TexRefInputIterator< ValueT, 66778899, OffsetT > | |
hipcub::TexObjInputIterator< T, OffsetT > | |
hipcub::TexRefInputIterator< T, UNIQUE_ID, OffsetT > | |
hipcub::CachingDeviceAllocator::TotalBytes | |
hipcub::Uninitialized< T > | A storage-backing wrapper that allows types with non-trivial constructors to be aliased in unions |
►hipcub::Uninitialized< _TempStorage > | |
hipcub::BlockMergeSortStrategy< KeyT, ValueT, NUM_THREADS, ITEMS_PER_THREAD, SynchronizationPolicy >::TempStorage | The operations exposed by BlockMergeSort require a temporary memory allocation of this nested type for thread communication |
hipcub::BlockRakingLayout< T, BLOCK_THREADS, ARCH >::TempStorage | Alias wrapper allowing storage to be unioned |
hipcub::BlockRunLengthDecode< ItemT, BLOCK_DIM_X, RUNS_PER_THREAD, DECODED_ITEMS_PER_THREAD, DecodedOffsetT, BLOCK_DIM_Y, BLOCK_DIM_Z >::TempStorage | |
hipcub::WarpLoad< InputT, ITEMS_PER_THREAD, ALGORITHM, LOGICAL_WARP_THREADS, ARCH >::TempStorage | |
hipcub::WarpStore< T, ITEMS_PER_THREAD, ALGORITHM, LOGICAL_WARP_THREADS, ARCH >::TempStorage | |
►rocprim::warp_reduce | |
hipcub::WarpReduce< T, LOGICAL_WARP_THREADS, ARCH > | |
►rocprim::warp_scan | |
hipcub::WarpScan< T, LOGICAL_WARP_THREADS, ARCH > | |
hipcub::WarpExchange< InputT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ARCH > | |
hipcub::WarpLoad< InputT, ITEMS_PER_THREAD, ALGORITHM, LOGICAL_WARP_THREADS, ARCH > | |
hipcub::WarpStore< T, ITEMS_PER_THREAD, ALGORITHM, LOGICAL_WARP_THREADS, ARCH > | |
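
The "window" semantics described for BlockRunLengthDecode can be illustrated with a small host-side sketch. This is plain sequential C++, not the library's device API: the function name `decode_window` and its parameters are hypothetical and chosen only for illustration. It shows how a fixed-size window at a given offset is read from a decoded sequence that is never materialized in full.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not part of the library): returns up to
// window_size decoded items, starting at index window_offset of the
// run-length decoded sequence.
template <typename ItemT>
std::vector<ItemT> decode_window(const std::vector<ItemT>& run_values,
                                 const std::vector<std::size_t>& run_lengths,
                                 std::size_t window_offset,
                                 std::size_t window_size)
{
    assert(run_values.size() == run_lengths.size());

    std::vector<ItemT> window;
    window.reserve(window_size);

    std::size_t run_begin = 0; // decoded index at which the current run starts
    for (std::size_t i = 0;
         i < run_values.size() && window.size() < window_size; ++i)
    {
        const std::size_t run_end = run_begin + run_lengths[i];

        // First decoded index of this run that lies inside the window.
        std::size_t pos = run_begin > window_offset ? run_begin : window_offset;

        // Emit run_values[i] once per decoded position covered by the window.
        for (; pos < run_end && window.size() < window_size; ++pos)
        {
            window.push_back(run_values[i]);
        }
        run_begin = run_end;
    }
    return window; // shorter than window_size if the decoded data ends early
}
```

For example, with run values `{'a', 'b', 'c'}` and run lengths `{2, 3, 1}` the full decoded sequence is `a a b b b c`; a window of size 4 at offset 1 yields `a b b b`, and a window of size 4 at offset 4 yields only `b c` because the decoded data ends there. In the block-level class the window size is fixed by the template parameters rather than passed at runtime.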