BlockMergeSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, BLOCK_DIM_Y, BLOCK_DIM_Z > Class Template Reference#
The BlockMergeSort class provides methods for sorting items partitioned across a CUDA thread block using a merge sorting method. More...
#include <block_merge_sort.hpp>
Public Member Functions | |
__device__ __forceinline__ | BlockMergeSort (typename BlockMergeSortStrategyT::TempStorage &temp_storage) |
__device__ __forceinline__ unsigned int | get_linear_tid () const |
__device__ __forceinline__ void | Sort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | Sort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | Sort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | Sort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | StableSort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | StableSort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | StableSort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
__device__ __forceinline__ void | StableSort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default) |
Sorts items partitioned across a CUDA thread block using a merge sorting method. More... | |
Detailed Description
template<typename KeyT, int BLOCK_DIM_X, int ITEMS_PER_THREAD, typename ValueT = NullType, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1>
class hipcub::BlockMergeSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, BLOCK_DIM_Y, BLOCK_DIM_Z >
The BlockMergeSort class provides methods for sorting items partitioned across a CUDA thread block using a merge sorting method.
- Template Parameters
-
KeyT KeyT type BLOCK_DIM_X The thread block length in threads along the X dimension ITEMS_PER_THREAD The number of items per thread ValueT [optional] ValueT type (default: hipcub::NullType
, which indicates a keys-only sort)BLOCK_DIM_Y [optional] The thread block length in threads along the Y dimension (default: 1) BLOCK_DIM_Z [optional] The thread block length in threads along the Z dimension (default: 1)
- Overview
- BlockMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types and comparison functors, but is slower than BlockRadixSort when sorting arithmetic types into ascending/descending order.
- A Simple Example
- @blockcollective{BlockMergeSort}
- The code snippet below illustrates a sort of 512 integer keys that are partitioned across 128 threads * where each thread owns 4 consecutive items.
- #include <hipcub/hipcub.hpp> // or equivalently <hipcub/block/block_merge_sort.hpp>struct CustomLess{template <typename DataType>__device__ bool operator()(const DataType &lhs, const DataType &rhs){return lhs < rhs;}};__global__ void ExampleKernel(...){// Specialize BlockMergeSort for a 1D block of 128 threads owning 4 integer items eachtypedef hipcub::BlockMergeSort<int, 128, 4> BlockMergeSort;// Allocate shared memory for BlockMergeSort__shared__ typename BlockMergeSort::TempStorage temp_storage_shuffle;// Obtain a segment of consecutive items that are blocked across threadsint thread_keys[4];...BlockMergeSort(temp_storage_shuffle).Sort(thread_keys, CustomLess());...}The BlockMergeSort class provides methods for sorting items partitioned across a CUDA thread block us...Definition: block_merge_sort.hpp:771
- Suppose the set of input
thread_keys
across the block of threads is{ [0,511,1,510], [2,509,3,508], [4,507,5,506], ..., [254,257,255,256] }
. The corresponding outputthread_keys
in those threads will be{ [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }
.
- Re-using dynamically allocating shared memory
- The following example under the examples/block folder illustrates usage of dynamically shared memory with BlockReduce and how to re-purpose the same memory region: example_block_reduce_dyn_smem.cu
This example can be easily adapted to the storage required by BlockMergeSort.
Member Function Documentation
◆ Sort() [1/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second
◆ Sort() [2/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort. - The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It's expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won't be placed withinvalid_items
boundaries.
- Sort is not guaranteed to be stable. That is, suppose that
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second [in] valid_items Number of valid items to sort [in] oob_default Default value to assign out-of-bound items
◆ Sort() [3/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in,out] items Values to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second
◆ Sort() [4/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- Sort is not guaranteed to be stable. That is, suppose that
i
andj
are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort. - The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It's expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won't be placed withinvalid_items
boundaries.
- Sort is not guaranteed to be stable. That is, suppose that
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
CompareOp
is a model of Strict Weak Ordering.IS_LAST_TILE True if valid_items
isn't equal to theITEMS_PER_TILE
- Parameters
-
[in,out] keys Keys to sort [in,out] items Values to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second [in] valid_items Number of valid items to sort [in] oob_default Default value to assign out-of-bound items
◆ StableSort() [1/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second
◆ StableSort() [2/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
.
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in,out] items Values to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second
◆ StableSort() [3/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
. - The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It's expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won't be placed withinvalid_items
boundaries.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.
- Parameters
-
[in,out] keys Keys to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second [in] valid_items Number of valid items to sort [in] oob_default Default value to assign out-of-bound items
◆ StableSort() [4/4]
|
inlineinherited |
Sorts items partitioned across a CUDA thread block using a merge sorting method.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
x
andy
are elements such thatx
precedesy
, and if the two elements are equivalent (neitherx < y
nory < x
) then a postcondition of StableSort is thatx
still precedesy
. - The value of
oob_default
is assigned to all elements that are out ofvalid_items
boundaries. It's expected thatoob_default
is ordered after any value in thevalid_items
boundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS
. If there is a value that is ordered afteroob_default
, it won't be placed withinvalid_items
boundaries.
- StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
- Template Parameters
-
CompareOp functor type having member bool operator()(KeyT lhs, KeyT rhs)
.CompareOp
is a model of Strict Weak Ordering.IS_LAST_TILE True if valid_items
isn't equal to theITEMS_PER_TILE
- Parameters
-
[in,out] keys Keys to sort [in,out] items Values to sort [in] compare_op Comparison function object which returns true if the first argument is ordered before the second [in] valid_items Number of valid items to sort [in] oob_default Default value to assign out-of-bound items
The documentation for this class was generated from the following file:
- /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hipcub/checkouts/docs-5.4.3/hipcub/include/hipcub/backend/rocprim/block/block_merge_sort.hpp