WarpMergeSort< KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH > Class Template Reference

WarpMergeSort&lt; KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH &gt; Class Template Reference#

hipCUB: hipcub::WarpMergeSort< KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH > Class Template Reference
hipcub::WarpMergeSort< KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH > Class Template Reference

The WarpMergeSort class provides methods for sorting items partitioned across a CUDA warp using a merge sorting method. More...

#include <warp_merge_sort.hpp>

Inheritance diagram for hipcub::WarpMergeSort< KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH >:
hipcub::BlockMergeSortStrategy< KeyT, NullType, ::rocprim::device_warp_size(), ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >

Public Member Functions

__device__ __forceinline__ WarpMergeSort (typename BlockMergeSortStrategyT::TempStorage &temp_storage)
 
__device__ __forceinline__ uint64_t get_member_mask () const
 
__device__ __forceinline__ unsigned int get_linear_tid () const
 
__device__ __forceinline__ void Sort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op)
 Sorts items partitioned across a CUDA thread block using a merge sorting method. More...
 
__device__ __forceinline__ void Sort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)
 Sorts items partitioned across a CUDA thread block using a merge sorting method. More...
 
__device__ __forceinline__ void Sort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op)
 Sorts items partitioned across a CUDA thread block using a merge sorting method. More...
 
__device__ __forceinline__ void Sort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)
 Sorts items partitioned across a CUDA thread block using a merge sorting method. More...
 
__device__ __forceinline__ void StableSort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op)
 Sorts items partitioned across a CUDA thread block using a merge sorting method. More...
 
__device__ __forceinline__ void StableSort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op)
 Sorts items partitioned across a CUDA thread block using a merge sorting method. More...
 
__device__ __forceinline__ void StableSort (KeyT(&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)
 Sorts items partitioned across a CUDA thread block using a merge sorting method. More...
 
__device__ __forceinline__ void StableSort (KeyT(&keys)[ITEMS_PER_THREAD], NullType(&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)
 Sorts items partitioned across a CUDA thread block using a merge sorting method. More...
 

Detailed Description

template<typename KeyT, int ITEMS_PER_THREAD, int LOGICAL_WARP_THREADS = ::rocprim::device_warp_size(), typename ValueT = NullType, int PTX_ARCH = 1>
class hipcub::WarpMergeSort< KeyT, ITEMS_PER_THREAD, LOGICAL_WARP_THREADS, ValueT, PTX_ARCH >

The WarpMergeSort class provides methods for sorting items partitioned across a CUDA warp using a merge sorting method.

Template Parameters
KeyTKey type
ITEMS_PER_THREADThe number of items per thread
LOGICAL_WARP_THREADS[optional] The number of threads per "logical" warp (may be less than the number of hardware warp threads). Default is the warp size of the targeted CUDA compute-capability (e.g., 32 threads for SM86). Must be a power of two.
ValueT[optional] Value type (default: cub::NullType, which indicates a keys-only sort)
PTX_ARCH[optional] \ptxversion
Overview
WarpMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types and comparison functors.
A Simple Example
The code snippet below illustrates a sort of 64 integer keys that are partitioned across 16 threads where each thread owns 4 consecutive items.
#include <cub/cub.cuh> // or equivalently <cub/warp/warp_merge_sort.cuh>
struct CustomLess
{
template <typename DataType>
__device__ bool operator()(const DataType &lhs, const DataType &rhs)
{
return lhs < rhs;
}
};
__global__ void ExampleKernel(...)
{
constexpr int warp_threads = 16;
constexpr int block_threads = 256;
constexpr int items_per_thread = 4;
constexpr int warps_per_block = block_threads / warp_threads;
const int warp_id = static_cast<int>(threadIdx.x) / warp_threads;
// Specialize WarpMergeSort for a virtual warp of 16 threads
// owning 4 integer items each
using WarpMergeSortT =
cub::WarpMergeSort<int, items_per_thread, warp_threads>;
// Allocate shared memory for WarpMergeSort
__shared__ typename WarpMergeSort::TempStorage temp_storage[warps_per_block];
// Obtain a segment of consecutive items that are blocked across threads
int thread_keys[items_per_thread];
// ...
WarpMergeSort(temp_storage[warp_id]).Sort(thread_keys, CustomLess());
// ...
}
Suppose the set of input thread_keys across the block of threads is { [0,511,1,510], [2,509,3,508], [4,507,5,506], ..., [254,257,255,256] }. The corresponding output thread_keys in those threads will be { [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }.

Member Function Documentation

◆ Sort() [1/4]

__device__ __forceinline__ void hipcub::BlockMergeSortStrategy< KeyT, NullType , NUM_THREADS, ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >::Sort ( KeyT(&)  keys[ITEMS_PER_THREAD],
CompareOp  compare_op 
)
inlineinherited

Sorts items partitioned across a CUDA thread block using a merge sorting method.

Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
Template Parameters
CompareOpfunctor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.
Parameters
[in,out]keysKeys to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second

◆ Sort() [2/4]

__device__ __forceinline__ void hipcub::BlockMergeSortStrategy< KeyT, NullType , NUM_THREADS, ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >::Sort ( KeyT(&)  keys[ITEMS_PER_THREAD],
CompareOp  compare_op,
int  valid_items,
KeyT  oob_default 
)
inlineinherited

Sorts items partitioned across a CUDA thread block using a merge sorting method.

  • Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
  • The value of oob_default is assigned to all elements that are out of valid_items boundaries. It's expected that oob_default is ordered after any value in the valid_items boundaries. The algorithm always sorts a fixed amount of elements, which is equal to ITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered after oob_default, it won't be placed within valid_items boundaries.
Template Parameters
CompareOpfunctor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.
Parameters
[in,out]keysKeys to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]valid_itemsNumber of valid items to sort
[in]oob_defaultDefault value to assign out-of-bound items

◆ Sort() [3/4]

__device__ __forceinline__ void hipcub::BlockMergeSortStrategy< KeyT, NullType , NUM_THREADS, ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >::Sort ( KeyT(&)  keys[ITEMS_PER_THREAD],
NullType (&)  items[ITEMS_PER_THREAD],
CompareOp  compare_op 
)
inlineinherited

Sorts items partitioned across a CUDA thread block using a merge sorting method.

Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
Template Parameters
CompareOpfunctor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.
Parameters
[in,out]keysKeys to sort
[in,out]itemsValues to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second

◆ Sort() [4/4]

__device__ __forceinline__ void hipcub::BlockMergeSortStrategy< KeyT, NullType , NUM_THREADS, ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >::Sort ( KeyT(&)  keys[ITEMS_PER_THREAD],
NullType (&)  items[ITEMS_PER_THREAD],
CompareOp  compare_op,
int  valid_items,
KeyT  oob_default 
)
inlineinherited

Sorts items partitioned across a CUDA thread block using a merge sorting method.

  • Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
  • The value of oob_default is assigned to all elements that are out of valid_items boundaries. It's expected that oob_default is ordered after any value in the valid_items boundaries. The algorithm always sorts a fixed amount of elements, which is equal to ITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered after oob_default, it won't be placed within valid_items boundaries.
Template Parameters
CompareOpfunctor type having member bool operator()(KeyT lhs, KeyT rhs) CompareOp is a model of Strict Weak Ordering.
IS_LAST_TILETrue if valid_items isn't equal to the ITEMS_PER_TILE
Parameters
[in,out]keysKeys to sort
[in,out]itemsValues to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]valid_itemsNumber of valid items to sort
[in]oob_defaultDefault value to assign out-of-bound items

◆ StableSort() [1/4]

__device__ __forceinline__ void hipcub::BlockMergeSortStrategy< KeyT, NullType , NUM_THREADS, ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >::StableSort ( KeyT(&)  keys[ITEMS_PER_THREAD],
CompareOp  compare_op 
)
inlineinherited

Sorts items partitioned across a CUDA thread block using a merge sorting method.

StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of StableSort is that x still precedes y.
Template Parameters
CompareOpfunctor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.
Parameters
[in,out]keysKeys to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second

◆ StableSort() [2/4]

__device__ __forceinline__ void hipcub::BlockMergeSortStrategy< KeyT, NullType , NUM_THREADS, ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >::StableSort ( KeyT(&)  keys[ITEMS_PER_THREAD],
NullType (&)  items[ITEMS_PER_THREAD],
CompareOp  compare_op 
)
inlineinherited

Sorts items partitioned across a CUDA thread block using a merge sorting method.

StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of StableSort is that x still precedes y.
Template Parameters
CompareOpfunctor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.
Parameters
[in,out]keysKeys to sort
[in,out]itemsValues to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second

◆ StableSort() [3/4]

__device__ __forceinline__ void hipcub::BlockMergeSortStrategy< KeyT, NullType , NUM_THREADS, ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >::StableSort ( KeyT(&)  keys[ITEMS_PER_THREAD],
CompareOp  compare_op,
int  valid_items,
KeyT  oob_default 
)
inlineinherited

Sorts items partitioned across a CUDA thread block using a merge sorting method.

  • StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of StableSort is that x still precedes y.
  • The value of oob_default is assigned to all elements that are out of valid_items boundaries. It's expected that oob_default is ordered after any value in the valid_items boundaries. The algorithm always sorts a fixed amount of elements, which is equal to ITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered after oob_default, it won't be placed within valid_items boundaries.
Template Parameters
CompareOpfunctor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.
Parameters
[in,out]keysKeys to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]valid_itemsNumber of valid items to sort
[in]oob_defaultDefault value to assign out-of-bound items

◆ StableSort() [4/4]

__device__ __forceinline__ void hipcub::BlockMergeSortStrategy< KeyT, NullType , NUM_THREADS, ITEMS_PER_THREAD, WarpMergeSort< KeyT, ITEMS_PER_THREAD, ::rocprim::device_warp_size(), NullType, 1 > >::StableSort ( KeyT(&)  keys[ITEMS_PER_THREAD],
NullType (&)  items[ITEMS_PER_THREAD],
CompareOp  compare_op,
int  valid_items,
KeyT  oob_default 
)
inlineinherited

Sorts items partitioned across a CUDA thread block using a merge sorting method.

  • StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if x and y are elements such that x precedes y, and if the two elements are equivalent (neither x < y nor y < x) then a postcondition of StableSort is that x still precedes y.
  • The value of oob_default is assigned to all elements that are out of valid_items boundaries. It's expected that oob_default is ordered after any value in the valid_items boundaries. The algorithm always sorts a fixed amount of elements, which is equal to ITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered after oob_default, it won't be placed within valid_items boundaries.
Template Parameters
CompareOpfunctor type having member bool operator()(KeyT lhs, KeyT rhs). CompareOp is a model of Strict Weak Ordering.
IS_LAST_TILETrue if valid_items isn't equal to the ITEMS_PER_TILE
Parameters
[in,out]keysKeys to sort
[in,out]itemsValues to sort
[in]compare_opComparison function object which returns true if the first argument is ordered before the second
[in]valid_itemsNumber of valid items to sort
[in]oob_defaultDefault value to assign out-of-bound items

The documentation for this class was generated from the following file:
  • /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-hipcub/checkouts/docs-5.5.1/hipcub/include/hipcub/backend/rocprim/warp/warp_merge_sort.hpp