WarpMergeSort< KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH > Class Template Reference#
The WarpMergeSort class provides methods for sorting items partitioned across a CUDA warp using a merge sorting method. More...
#include <warp_merge_sort.hpp>
Public Member Functions | |
__device__ __forceinline__ | WarpMergeSort (typename BlockMergeSortStrategyT::TempStorage &temp_storage) |
__device__ __forceinline__ uint64_t | get_member_mask () const |
__device__ __forceinline__ unsigned int | get_linear_tid () const |
__device__ __forceinline__ void | Sort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | Sort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | Sort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | Sort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | StableSort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | StableSort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | StableSort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | StableSort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
Detailed Description
template<typename KeyT, int ITEMS_PER_THREAD, int LOGICAL_WARP_THREADS = ::rocprim::device_warp_size(), typename ValueT = NullType, int PTX_ARCH = 1>
class hipcub::WarpMergeSort< KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH >
The WarpMergeSort class provides methods for sorting items partitioned across a CUDA warp using a merge sorting method.
- Template Parameters
-
KeyT Key type ITEMS_PER_THREAD The number of items per thread LOGICAL_WARP_THREADS [optional] The number of threads per "logical" warp (may be less than the number of hardware warp threads). Default is the warp size of the targeted CUDA compute-capability (e.g., 32 threads for SM86). Must be a power of two. ValueT [optional] Value type (default: cub::NullType, which indicates a keys-only sort) PTX_ARCH [optional] \ptxversion
- Overview
- WarpMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types and comparison functors.
- A Simple Example
- The code snippet below illustrates a sort of 64 integer keys that are partitioned across 16 threads where each thread owns 4 consecutive items.
- #include <cub/cub.cuh> // or equivalently <cub/warp/warp_merge_sort.cuh>struct CustomLess{template <typename DataType>__device__ bool operator()(const DataType &lhs, const DataType &rhs){return lhs < rhs;}};__global__ void ExampleKernel(...){constexpr int warp_threads = 16;constexpr int block_threads = 256;constexpr int items_per_thread = 4;constexpr int warps_per_block = block_threads / warp_threads;const int warp_id = static_cast<int>(threadIdx.x) / warp_threads;// Specialize WarpMergeSort for a virtual warp of 16 threads// owning 4 integer items eachusing WarpMergeSortT =cub::WarpMergeSort<int, items_per_thread, warp_threads>;// Allocate shared memory for WarpMergeSort__shared__ typename WarpMergeSort::TempStorage temp_storage[warps_per_block];// Obtain a segment of consecutive items that are blocked across threadsint thread_keys[items_per_thread];// ...WarpMergeSort(temp_storage[warp_id]).Sort(thread_keys, CustomLess());// ...}
- Suppose the set of input
thread_keys
across the block of threads is{ [0,511,1,510], [2,509,3,508], [4,507,5,506], ..., [254,257,255,256] }
. The corresponding outputthread_keys
in those threads will be{ [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }
.
Member Function Documentation
◆ Sort() [1/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second
◆ Sort() [2/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort. - The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It's expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won't be placed withinvalid_items
boundaries.
- Sort is not guaranteed to be stable. That is, suppose that
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second [in] valid_items Number of valid items to sort [in] oob_default Default value to assign out-of-bound items
◆ Sort() [3/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in,out] items Values to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second
◆ Sort() [4/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort. - The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It's expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won't be placed withinvalid_items
boundaries.
- Sort is not guaranteed to be stable. That is, suppose that
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
CompareOp
is a model of Strict Weak Ordering.IS_LAST_TILE True if valid_items
isn't equal to theITEMS_PER_TILE
- Parameters
-
[in,out] keys Keys to sort [in,out] items Values to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second [in] valid_items Number of valid items to sort [in] oob_default Default value to assign out-of-bound items
◆ StableSort() [1/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second
◆ StableSort() [2/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in,out] items Values to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second
◆ StableSort() [3/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
. - The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It's expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won't be placed withinvalid_items
boundaries.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second [in] valid_items Number of valid items to sort [in] oob_default Default value to assign out-of-bound items
◆ StableSort() [4/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
. - The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It's expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won't be placed withinvalid_items
boundaries.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.IS_LAST_TILE True if valid_items
isn't equal to theITEMS_PER_TILE
- Parameters
-
[in,out] keys Keys to sort [in,out] items Values to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second [in] valid_items Number of valid items to sort [in] oob_default Default value to assign out-of-bound items
The documentation for this class was generated from the following file:
- /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hipcub/checkouts/docs-5.3.3/hipcub/include/hipcub/backend/rocprim/warp/warp_merge_sort.hpp