Histogram
Class
template<class T, unsigned int BlockSizeX, unsigned int ItemsPerThread, unsigned int Bins, block_histogram_algorithm Algorithm = block_histogram_algorithm::default_algorithm, unsigned int BlockSizeY = 1, unsigned int BlockSizeZ = 1>
class block_histogram
- The block_histogram class is a block-level parallel primitive that provides methods for constructing block-wide histograms from items partitioned across the threads of a block.
- Overview
- block_histogram has two alternative implementations, selected through the Algorithm template parameter: block_histogram_algorithm::using_atomic and block_histogram_algorithm::using_sort.
 
- Examples
- In the example below, the histogram is computed over a block of 192 threads, each contributing two int values; the result is written to the shared-memory array hist.

```cpp
#include <rocprim/rocprim.hpp>

__global__ void example_kernel(...)
{
    // specialize block_histogram for int, logical block of 192 threads,
    // 2 items per thread and 192 bins
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    __shared__ int hist[192];

    int value[2]; // each value[i] must lie in [0, 192)
    ...
    // execute histogram
    block_histogram_int().histogram(
        value, // input
        hist,  // output
        storage
    );
    ...
}
```
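To validate hist after copying it back to the host, a sequential reference histogram over the same 384 values (192 threads times 2 items) should match bin for bin. The following is a minimal host-side C++ sketch; reference_histogram and the constants are hypothetical names chosen for this illustration, not rocPRIM API, and the values are assumed to already lie in [0, 192).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants mirroring the example kernel's configuration.
constexpr unsigned int block_size = 192;
constexpr unsigned int items_per_thread = 2;
constexpr unsigned int bins = 192;

// Sequential reference histogram over all items of one block.
// Bin counts do not depend on how the items were spread across threads,
// so the values may be passed in any order.
std::array<int, bins> reference_histogram(const std::vector<int>& values)
{
    assert(values.size() == block_size * items_per_thread);
    std::array<int, bins> hist{}; // all counters start at zero
    for (int v : values)
    {
        // block_histogram expects 0 <= v < Bins
        assert(v >= 0 && static_cast<unsigned int>(v) < bins);
        ++hist[static_cast<std::size_t>(v)];
    }
    return hist;
}
```

Comparing this array element-wise with the device result catches both wrong bin indices and missing barriers, since either shows up as a mismatched count.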
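- Template Parameters:
- T – the input type (the type of the samples).
- BlockSizeX – the number of threads in the x dimension of the block.
- ItemsPerThread – the number of items processed by each thread.
- Bins – the number of bins in the histogram.
- Algorithm – the histogram algorithm to use; block_histogram_algorithm::default_algorithm by default.
- BlockSizeY – the number of threads in the y dimension of the block; 1 by default.
- BlockSizeZ – the number of threads in the z dimension of the block; 1 by default.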
 
Public Types
using storage_type = typename base_type::storage_type
- A struct used to allocate the temporary memory required for thread communication during operations provided by the related parallel primitive.
- Depending on the implementation, the operations exposed by the parallel primitive may require temporary storage for thread communication. The storage should be allocated using the __shared__ keyword. It can be aliased to externally allocated memory, or be part of a union with other storage types to increase shared memory reuse.
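The union pattern lets primitives that run at different times share one shared-memory allocation whose size is that of the largest member rather than the sum of all members. The host-side C++ sketch below uses two hypothetical stand-in structs in place of real storage_type members (in a kernel the union would be declared __shared__ and hold, for example, block_histogram_int::storage_type), so the byte counts are illustrative only.

```cpp
#include <cstddef>

// Hypothetical stand-ins for two primitives' storage_type; real sizes differ.
struct histogram_storage { unsigned int scratch[192]; };
struct other_storage     { int scratch[64]; double extra; };

// All members of a union start at the same address, so one allocation
// serves both primitives as long as they are never live at the same time.
union shared_storage
{
    histogram_storage histogram;
    other_storage other;
};

// Footprint if each primitive had its own allocation vs. the overlaid union.
constexpr std::size_t separate_bytes = sizeof(histogram_storage) + sizeof(other_storage);
constexpr std::size_t union_bytes = sizeof(shared_storage);

static_assert(union_bytes < separate_bytes, "overlaying the storage saves memory");
```

A synchronization barrier is still required between the last use of one member and the first use of another, as for any reuse of storage.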
Public Functions
template<class Counter>
__device__ inline void init_histogram(Counter hist[Bins])
- Initializes the histogram counters to zero.
- Template Parameters:
- Counter – [inferred] the counter type of the histogram.
- Parameters:
- hist – [out] histogram bin counts.
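The description only states that the counters are zeroed. One plausible block-cooperative pattern, assumed here rather than taken from the rocPRIM source, is for each thread to clear bins flat_tid, flat_tid + BlockSize, flat_tid + 2 * BlockSize, and so on, which also covers histograms with more bins than threads. The host-side C++ sketch below simulates that loop; init_histogram_thread and init_histogram_block are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One simulated thread's share of a strided, block-cooperative clear.
// flat_tid is the thread's index in [0, block_size); hist.size() may exceed block_size.
template <class Counter>
void init_histogram_thread(std::vector<Counter>& hist, unsigned int flat_tid, unsigned int block_size)
{
    assert(flat_tid < block_size);
    for (unsigned int offset = 0; offset < hist.size(); offset += block_size)
    {
        const unsigned int bin = offset + flat_tid;
        if (bin < hist.size())
            hist[bin] = Counter(); // value-initialization: zero for arithmetic types
    }
}

// Runs every simulated thread in turn. On a GPU these run concurrently,
// and a barrier is needed before any thread reads or updates the counters.
template <class Counter>
void init_histogram_block(std::vector<Counter>& hist, unsigned int block_size)
{
    for (unsigned int tid = 0; tid < block_size; ++tid)
        init_histogram_thread(hist, tid, block_size);
}
```

Because each thread clears only a subset of the bins, the composite example places rocprim::syncthreads() between init_histogram and the first update.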
 
template<class Counter>
__device__ inline void composite(T (&input)[ItemsPerThread], Counter hist[Bins], storage_type &storage)
- Updates an existing block-wide histogram. Each thread composites an array of input elements into hist.
- Storage reuse
- A synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
- Examples
- In the example below, a histogram is initialized and then updated by a block of 192 threads, each contributing two int values; the counts accumulate in the shared-memory array hist.

```cpp
#include <rocprim/rocprim.hpp>

__global__ void example_kernel(...)
{
    // specialize block_histogram for int, logical block of 192 threads,
    // 2 items per thread and 192 bins
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    __shared__ int hist[192];

    int value[2]; // each value[i] must lie in [0, 192)
    ...
    // initialize histogram
    block_histogram_int().init_histogram(
        hist // output
    );

    rocprim::syncthreads();

    // update histogram
    block_histogram_int().composite(
        value, // input
        hist,  // input/output
        storage
    );
    ...
}
```
- Template Parameters:
- Counter – [inferred] the counter type of the histogram.
- Parameters:
- input – [in] reference to an array containing the thread's input values. Each value must satisfy 0 <= input[i] < Bins.
- hist – [in,out] histogram bin counts, updated in place.
- storage – [in] reference to a temporary storage object of type storage_type.
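Because composite adds to whatever hist already holds, calling it once per batch accumulates counts across batches (with a barrier before storage is reused). The sequential host model below illustrates only that accumulation behavior; composite_model is a hypothetical name, and every sample is assumed to already be a valid bin index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host model of composite(): adds one count per sample to the existing
// contents of hist; nothing is cleared first.
template <class T, class Counter>
void composite_model(const std::vector<T>& samples, std::vector<Counter>& hist)
{
    for (const T& v : samples)
    {
        // block_histogram expects 0 <= v < Bins for every sample
        assert(v >= T(0) && static_cast<std::size_t>(v) < hist.size());
        hist[static_cast<std::size_t>(v)] += Counter(1);
    }
}
```

Two calls with batches {0, 1, 1, 3} and {1, 2} on a zeroed four-bin histogram leave {1, 3, 1, 1}: the second batch is added on top of the first.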
 
 
template<class Counter>
__device__ inline void composite(T (&input)[ItemsPerThread], Counter hist[Bins])
- Updates an existing block-wide histogram. Each thread composites an array of input elements into hist.
- This is an overloaded member function, provided for convenience. It differs from the above function only in the arguments it accepts.
- This overload does not take a storage argument; the shared memory it requires is allocated by the method itself.
- Template Parameters:
- Counter – [inferred] the counter type of the histogram.
- Parameters:
- input – [in] reference to an array containing the thread's input values. Each value must satisfy 0 <= input[i] < Bins.
- hist – [in,out] histogram bin counts, updated in place.
 
 
template<class Counter>
__device__ inline void histogram(T (&input)[ItemsPerThread], Counter hist[Bins], storage_type &storage)
- Constructs a new block-wide histogram. Each thread contributes an array of input elements.
- Storage reuse
- A synchronization barrier (__syncthreads() or rocprim::syncthreads()) should be placed before storage is reused or repurposed.
- Examples
- In the example below, a new histogram is constructed from a block of 192 threads, each contributing two int values; the result is written to the shared-memory array hist.

```cpp
#include <rocprim/rocprim.hpp>

__global__ void example_kernel(...)
{
    // specialize block_histogram for int, logical block of 192 threads,
    // 2 items per thread and 192 bins
    using block_histogram_int = rocprim::block_histogram<int, 192, 2, 192>;
    // allocate storage in shared memory
    __shared__ block_histogram_int::storage_type storage;
    __shared__ int hist[192];

    int value[2]; // each value[i] must lie in [0, 192)
    ...
    // execute histogram
    block_histogram_int().histogram(
        value, // input
        hist,  // output
        storage
    );
    ...
}
```
- Template Parameters:
- Counter – [inferred] the counter type of the histogram.
- Parameters:
- input – [in] reference to an array containing the thread's input values. Each value must satisfy 0 <= input[i] < Bins.
- hist – [out] histogram bin counts.
- storage – [in] reference to a temporary storage object of type storage_type.
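Reading "construct a new" against "update an existing", histogram can be modeled as clearing the counters and then compositing the samples, so any earlier contents of hist are discarded. The sketch below is a sequential host model under that assumption; histogram_model is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host model of histogram(): zero every counter (the init_histogram step),
// then add one count per sample (the composite step). Unlike composite,
// previous contents of hist do not survive.
template <class T, class Counter>
void histogram_model(const std::vector<T>& samples, std::vector<Counter>& hist)
{
    std::fill(hist.begin(), hist.end(), Counter());
    for (const T& v : samples)
    {
        // block_histogram expects 0 <= v < Bins for every sample
        assert(v >= T(0) && static_cast<std::size_t>(v) < hist.size());
        hist[static_cast<std::size_t>(v)] += Counter(1);
    }
}
```

Starting from {5, 5, 5}, the samples {2, 2, 0} produce {1, 0, 2}; the stale fives are gone.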
 
 
template<class Counter>
__device__ inline void histogram(T (&input)[ItemsPerThread], Counter hist[Bins])
- Constructs a new block-wide histogram. Each thread contributes an array of input elements.
- This is an overloaded member function, provided for convenience. It differs from the above function only in the arguments it accepts.
- This overload does not take a storage argument; the shared memory it requires is allocated by the method itself.
- Template Parameters:
- Counter – [inferred] the counter type of the histogram.
- Parameters:
- input – [in] reference to an array containing the thread's input values. Each value must satisfy 0 <= input[i] < Bins.
- hist – [out] histogram bin counts.
 
 
 
Algorithms
enum class rocprim::block_histogram_algorithm
- Available algorithms for the block_histogram primitive.
- Values:
enumerator using_atomic
- Atomic addition is used to update bin counts directly.
- Performance Notes:
- Performance depends on the hardware implementation of atomic addition.
- Performance may decrease for non-uniform random input distributions, where many concurrent updates target the same bin counter.
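A sequential host model of the atomic strategy makes the second performance note concrete: each sample issues one atomic increment on its bin, so a bin's final count is also the number of updates that contended for that counter. This C++ sketch uses std::atomic in place of a GPU atomic add and performs the increments one after another; atomic_histogram and max_updates_per_bin are hypothetical names, and the maximum is only a rough proxy for serialization, not a performance measurement.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct atomic_histogram_result
{
    std::vector<unsigned int> counts;
    std::size_t max_updates_per_bin; // updates that hit the busiest counter
};

// Every sample performs one atomic increment on its bin. On a GPU these
// increments run concurrently; here they run sequentially.
atomic_histogram_result atomic_histogram(const std::vector<int>& samples, std::size_t bins)
{
    std::vector<std::atomic<unsigned int>> counters(bins); // value-initialized to 0
    for (int v : samples)
    {
        assert(v >= 0 && static_cast<std::size_t>(v) < bins);
        counters[static_cast<std::size_t>(v)].fetch_add(1u, std::memory_order_relaxed);
    }

    atomic_histogram_result r;
    r.counts.reserve(bins);
    for (auto& c : counters)
        r.counts.push_back(c.load(std::memory_order_relaxed));
    r.max_updates_per_bin =
        r.counts.empty() ? 0 : *std::max_element(r.counts.begin(), r.counts.end());
    return r;
}
```

With 384 samples spread evenly over 192 bins, no counter sees more than 2 updates; if all 384 samples fall into one bin, that single counter absorbs all 384.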
 
 
enumerator using_sort
- A two-phase operation is used:
  1. Data is sorted using radix sort.
  2. "Runs" of same-valued keys are detected using discontinuity detection; the run lengths are the bin counts.
- Performance Notes:
 
- Performance is consistent regardless of sample bin distribution. 
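The two phases can be modeled sequentially on the host. In this C++ sketch std::sort stands in for the block radix sort and a comparison with the predecessor stands in for discontinuity detection; sort_histogram is a hypothetical name, and samples are assumed to be valid bin indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<unsigned int> sort_histogram(std::vector<int> samples, std::size_t bins)
{
    std::vector<unsigned int> hist(bins, 0u);
    if (samples.empty())
        return hist;

    // Phase 1: sort so that equal keys form contiguous runs.
    std::sort(samples.begin(), samples.end());

    // Phase 2: a position whose key differs from its predecessor starts a new
    // run (a discontinuity); each run's length is that key's bin count.
    std::size_t run_start = 0;
    for (std::size_t i = 1; i <= samples.size(); ++i)
    {
        const bool run_ends = (i == samples.size()) || (samples[i] != samples[i - 1]);
        if (run_ends)
        {
            const int key = samples[run_start];
            assert(key >= 0 && static_cast<std::size_t>(key) < bins);
            hist[static_cast<std::size_t>(key)] = static_cast<unsigned int>(i - run_start);
            run_start = i;
        }
    }
    return hist;
}
```

Bins that receive no samples never start a run and keep their initial zero. An LSD radix sort makes the same number of passes whatever the key distribution, which fits the performance note above.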
 
 - 
enumerator default_algorithm
- Default block_histogram algorithm; equal to using_atomic.