Run Length Encode#

Configuring the kernel#

template<class ReduceByKeyConfig, class SelectConfig = default_config>
struct run_length_encode_config#

Configuration of device-level run-length encoding operation.

Template Parameters:

run_length_encode#

template<class Config = default_config, class InputIterator, class UniqueOutputIterator, class CountsOutputIterator, class RunsCountOutputIterator>
inline hipError_t rocprim::run_length_encode(void *temporary_storage, size_t &storage_size, InputIterator input, unsigned int size, UniqueOutputIterator unique_output, CountsOutputIterator counts_output, RunsCountOutputIterator runs_count_output, hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel run-length encoding for device level.

run_length_encode function performs a device-wide run-length encoding of runs (groups) of consecutive values. The first value of each run is copied to unique_output and the length of the run is written to counts_output. The total number of runs is written to runs_count_output.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Range specified by input must have at least size elements.

  • Range specified by runs_count_output must have at least 1 element.

  • Ranges specified by unique_output and counts_output must have at least *runs_count_output (i.e. the number of runs) elements.

Example

In this example a device-level run-length encoding operation is performed on an array of integer values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
int * input;                // e.g., [1, 1, 1, 2, 10, 10, 10, 88]
int * unique_output;        // empty array of at least 4 elements
int * counts_output;        // empty array of at least 4 elements
int * runs_count_output;    // empty array of 1 element

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::run_length_encode(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, input_size,
    unique_output, counts_output, runs_count_output
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform encoding
rocprim::run_length_encode(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, input_size,
    unique_output, counts_output, runs_count_output
);
// unique_output:     [1, 2, 10, 88]
// counts_output:     [3, 1,  3,  1]
// runs_count_output: [4]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be run_length_encode_config or a class derived from it.

  • InputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • UniqueOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • CountsOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • RunsCountOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range of values.

  • size[in] - number of element in the input range.

  • unique_output[out] - iterator to the first element in the output range of unique values.

  • counts_output[out] - iterator to the first element in the output range of lenghts.

  • runs_count_output[out] - iterator to total number of runs.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful operation; otherwise a HIP runtime error of type hipError_t.

run_length_encode_non_trivial_runs#

template<class Config = default_config, class InputIterator, class OffsetsOutputIterator, class CountsOutputIterator, class RunsCountOutputIterator>
inline hipError_t rocprim::run_length_encode_non_trivial_runs(void *temporary_storage, size_t &storage_size, InputIterator input, unsigned int size, OffsetsOutputIterator offsets_output, CountsOutputIterator counts_output, RunsCountOutputIterator runs_count_output, hipStream_t stream = 0, bool debug_synchronous = false)#

Parallel run-length encoding of non-trivial runs for device level.

run_length_encode_non_trivial_runs function performs a device-wide run-length encoding of non-trivial runs (groups) of consecutive values (groups of more than one element). The offset of the first value of each non-trivial run is copied to offsets_output and the length of the run (the count of elements) is written to counts_output. The total number of non-trivial runs is written to runs_count_output.

Overview

  • Returns the required size of temporary_storage in storage_size if temporary_storage in a null pointer.

  • Range specified by input must have at least size elements.

  • Range specified by runs_count_output must have at least 1 element.

  • Ranges specified by offsets_output and counts_output must have at least *runs_count_output (i.e. the number of non-trivial runs) elements.

Example

In this example a device-level run-length encoding of non-trivial runs is performed on an array of integer values.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t input_size;          // e.g., 8
int * input;                // e.g., [1, 1, 1, 2, 10, 10, 10, 88]
int * offsets_output;       // empty array of at least 2 elements
int * counts_output;        // empty array of at least 2 elements
int * runs_count_output;    // empty array of 1 element

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::run_length_encode_non_trivial_runs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, input_size,
    offsets_output, counts_output, runs_count_output
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform encoding
rocprim::run_length_encode_non_trivial_runs(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, input_size,
    offsets_output, counts_output, runs_count_output
);
// offsets_output:    [0, 4]
// counts_output:     [3, 3]
// runs_count_output: [2]

Template Parameters:
  • Config – - [optional] configuration of the primitive. It has to be run_length_encode_config or a class derived from it.

  • InputIterator – - random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • OffsetsOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • CountsOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • RunsCountOutputIterator – - random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

Parameters:
  • temporary_storage[in] - pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and function returns without performing the operation.

  • storage_size[inout] - reference to a size (in bytes) of temporary_storage.

  • input[in] - iterator to the first element in the range of values.

  • size[in] - number of element in the input range.

  • offsets_output[out] - iterator to the first element in the output range of offsets.

  • counts_output[out] - iterator to the first element in the output range of lenghts.

  • runs_count_output[out] - iterator to total number of runs.

  • stream[in] - [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous[in] - [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful operation; otherwise a HIP runtime error of type hipError_t.