Adjacent difference

Configuring the kernel

template<unsigned int BlockSize, unsigned int ItemsPerThread, block_load_method LoadMethod = block_load_method::block_load_transpose, block_store_method StoreMethod = block_store_method::block_store_transpose, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct adjacent_difference_config : public rocprim::kernel_config<BlockSize, ItemsPerThread, std::numeric_limits<unsigned int>::max()>

Configuration of device-level adjacent_difference primitives.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread.

  • LoadMethod – method for loading input values.

  • StoreMethod – method for storing values.

  • SizeLimit – limit on the number of items for a single adjacent_difference kernel launch. Larger inputs are split across multiple kernel launches.
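To make the interplay of BlockSize, ItemsPerThread, and SizeLimit concrete, the following host-only sketch estimates how many thread blocks and kernel launches an input of a given size would need. The names ceil_div, launch_plan, and plan_launches are hypothetical and not part of rocPRIM. The arithmetic assumes that each launch covers at most SizeLimit items and ignores any rounding the library may apply internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of rocPRIM): ceiling division without overflow.
inline std::size_t ceil_div(std::size_t a, std::size_t b)
{
    return a / b + (a % b != 0 ? 1 : 0);
}

// Hypothetical summary of how a configuration would split an input.
struct launch_plan
{
    std::size_t items_per_block; // BlockSize * ItemsPerThread
    std::size_t blocks;          // thread blocks needed to cover the whole input
    std::size_t launches;        // launches if each one covers at most size_limit items
};

// Assumption: a launch handles at most size_limit items; the real library
// may round this limit, so treat the result as an estimate only.
inline launch_plan plan_launches(std::size_t  size,
                                 unsigned int block_size,
                                 unsigned int items_per_thread,
                                 std::size_t  size_limit)
{
    const std::size_t items_per_block
        = static_cast<std::size_t>(block_size) * items_per_thread;
    assert(items_per_block > 0 && size_limit > 0);
    return launch_plan{items_per_block,
                       ceil_div(size, items_per_block),
                       ceil_div(size, size_limit)};
}
```

For example, with BlockSize = 256 and ItemsPerThread = 4, each block covers 1024 items, so 5000 items need 5 blocks; under a limit of 2048 items per launch, this sketch reports 3 launches.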

left

template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)

Parallel primitive for applying a binary operation across pairs of consecutive elements in device accessible memory. Writes the output to the position of the left item.

Copies the first item to the output, then calls the supplied operator with each pair of neighboring elements and writes the result to the location of the second element. Equivalent to the following code:

output[0] = input[0];
for(std::size_t i = 1; i < size; ++i)
{
    output[i] = op(input[i], input[i - 1]);
}
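The loop above can be turned into a small host-side reference, which is handy for checking device results in a unit test. This is a sequential sketch only; adjacent_difference_reference is a hypothetical helper name and not a rocPRIM function.

```cpp
#include <cassert>
#include <cstddef>

// Sequential host-side reference for the left variant (hypothetical helper,
// not part of rocPRIM). The first item is copied; every later item becomes
// op(input[i], input[i - 1]). input and output must not overlap.
template<typename T, typename BinaryFunction>
void adjacent_difference_reference(const T*       input,
                                   T*             output,
                                   std::size_t    size,
                                   BinaryFunction op)
{
    assert(size == 0 || (input != nullptr && output != nullptr));
    if(size == 0)
    {
        return; // nothing to copy or combine
    }
    output[0] = input[0];
    for(std::size_t i = 1; i < size; ++i)
    {
        output[i] = op(input[i], input[i - 1]);
    }
}
```

With op(a, b) = a - b and the input [8, 7, 6, 5], this reference produces [8, -1, -1, -1], because each result is the current item minus its left neighbor.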

Example

In this example a device-level adjacent_difference operation is performed on integer values.

#include <rocprim/rocprim.hpp> //or <rocprim/device/device_adjacent_difference.hpp>

// custom binary function
auto binary_op =
    [] __device__ (int a, int b) -> int
    {
        return a - b;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
std::size_t size; // e.g., 8
int* input;  // e.g., [8, 7, 6, 5, 4, 3, 2, 1]
int* output; // empty array of 8 elements

std::size_t temporary_storage_size_bytes;
void* temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::adjacent_difference(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform adjacent difference
rocprim::adjacent_difference(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);
// output: [8, -1, -1, -1, -1, -1, -1, -1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It can be adjacent_difference_config or a class with the same members.

  • InputIt – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
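As an illustration of the BinaryFunction requirement, the sketch below defines a function object with the const-reference signature that leaves its arguments unmodified. Both abs_diff and the SKETCH_HOST_DEVICE macro are hypothetical names introduced here, not part of rocPRIM; the macro only adds the device attributes when a HIP or CUDA compiler is in use, so the functor also compiles as plain host C++.

```cpp
// Expands to the device attributes only under a HIP/CUDA compiler
// (hypothetical convenience macro for this sketch).
#if defined(__HIPCC__) || defined(__CUDACC__)
    #define SKETCH_HOST_DEVICE __host__ __device__
#else
    #define SKETCH_HOST_DEVICE
#endif

// Hypothetical functor: absolute difference of two neighboring items.
// Takes both arguments by const reference and does not modify them.
template<typename T>
struct abs_diff
{
    SKETCH_HOST_DEVICE constexpr T operator()(const T& a, const T& b) const
    {
        return a < b ? b - a : a - b;
    }
};
```

An instance such as abs_diff<int>{} could then be passed as the op argument in place of the default ::rocprim::minus<>.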

Parameters:
  • temporary_storage – pointer to device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.

  • storage_size – reference to the size (in bytes) of temporary_storage.

  • input – iterator to the input range.

  • output – iterator to the output range; must not have any overlap with input.

  • size – number of items in the input.

  • op – [optional] the binary operation to apply.

  • stream – [optional] HIP stream object. Default is 0 (the default stream).

  • debug_synchronous – [optional] If true, synchronization is forced after every kernel launch to check for errors, and extra debugging information is printed to standard output. Default is false.

Returns:

hipSuccess (0) after a successful operation, otherwise a HIP runtime error of type hipError_t.
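The two-phase use of temporary_storage (query the size with a null pointer, allocate, then call again) can be sketched on the host without a GPU. Everything below is a stand-in: mock_adjacent_difference and run_two_phase are hypothetical names, the mock returns a bool rather than hipError_t, and the reported storage size is an arbitrary placeholder rather than what rocPRIM would request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only stand-in for a device primitive (hypothetical, NOT rocPRIM).
// It follows the same convention: a null temporary_storage means
// "write the required byte count to storage_size and do nothing else".
inline bool mock_adjacent_difference(void*        temporary_storage,
                                     std::size_t& storage_size,
                                     const int*   input,
                                     int*         output,
                                     std::size_t  size)
{
    if(temporary_storage == nullptr)
    {
        storage_size = sizeof(int); // placeholder requirement chosen for this sketch
        return true;                // size query only, output is left untouched
    }
    if(size == 0)
    {
        return true;
    }
    output[0] = input[0];
    for(std::size_t i = 1; i < size; ++i)
    {
        output[i] = input[i] - input[i - 1];
    }
    return true;
}

// Caller-side sequence: query the size, allocate, then run.
inline std::vector<int> run_two_phase(const std::vector<int>& input)
{
    std::vector<int> output(input.size(), 0);
    std::size_t      storage_size = 0;

    // Phase 1: null pointer, so only storage_size is written.
    bool ok = mock_adjacent_difference(nullptr, storage_size,
                                       input.data(), output.data(), input.size());
    assert(ok && storage_size > 0);

    // Phase 2: provide storage of the reported size and perform the operation.
    std::vector<unsigned char> storage(storage_size);
    ok = mock_adjacent_difference(storage.data(), storage_size,
                                  input.data(), output.data(), input.size());
    assert(ok);
    (void)ok; // silence unused-variable warnings when assertions are disabled
    return output;
}
```

In real code the allocation would use hipMalloc and hipFree, and the return value of each call would be compared against hipSuccess.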

left, inplace

template<typename Config = default_config, typename InputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)

Parallel primitive for applying a binary operation across pairs of consecutive elements in device accessible memory. Writes the output to the position of the left item in place.

Leaves the first item unchanged, then calls the supplied operator with each pair of neighboring elements and writes the result to the location of the second element. Equivalent to the following code:

for(std::size_t i = size - 1; i > 0; --i)
{
    values[i] = op(values[i], values[i - 1]);
}
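A host-side reference for the in-place left variant might look like the sketch below. The name adjacent_difference_inplace_reference is hypothetical and not part of rocPRIM. Iterating from the back matters: it ensures that each left neighbor is still the original value when it is read. The loop bounds are written so that an empty range is handled without unsigned wrap-around.

```cpp
#include <cassert>
#include <cstddef>

// Sequential host-side reference for the in-place left variant
// (hypothetical helper, not part of rocPRIM).
template<typename T, typename BinaryFunction>
void adjacent_difference_inplace_reference(T* values, std::size_t size, BinaryFunction op)
{
    assert(size == 0 || values != nullptr);
    // Walk from the back so values[i - 2] is still the original item when read;
    // starting at i = size avoids computing size - 1 for an empty range.
    for(std::size_t i = size; i > 1; --i)
    {
        values[i - 1] = op(values[i - 1], values[i - 2]);
    }
    // values[0] is intentionally left unchanged.
}
```

For the values [1, 4, 9, 16] and op(a, b) = a - b, the range becomes [1, 3, 5, 7].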

Template Parameters:
  • Config – [optional] configuration of the primitive. It can be adjacent_difference_config or a class with the same members.

  • InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.

Parameters:
  • temporary_storage – pointer to device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.

  • storage_size – reference to the size (in bytes) of temporary_storage.

  • values – iterator to the range of values, which is overwritten with the results.

  • size – number of items in the input.

  • op – [optional] the binary operation to apply.

  • stream – [optional] HIP stream object. Default is 0 (the default stream).

  • debug_synchronous – [optional] If true, synchronization is forced after every kernel launch to check for errors, and extra debugging information is printed to standard output. Default is false.

Returns:

hipSuccess (0) after a successful operation, otherwise a HIP runtime error of type hipError_t.

right, inplace

template<typename Config = default_config, typename InputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_right_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)

Parallel primitive for applying a binary operation across pairs of consecutive elements in device accessible memory. Writes the output to the position of the right item in place.

Leaves the last item unchanged, then calls the supplied operator with each pair of neighboring elements and writes the result to the location of the first element. Equivalent to the following code:

for(std::size_t i = 0; i + 1 < size; ++i)
{
    values[i] = op(values[i], values[i + 1]);
}
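The right variant can be mirrored on the host in the same way. The helper name adjacent_difference_right_inplace_reference is hypothetical and not part of rocPRIM. Here the iteration runs from the front, so that each right neighbor is still the original value when it is read.

```cpp
#include <cassert>
#include <cstddef>

// Sequential host-side reference for the in-place right variant
// (hypothetical helper, not part of rocPRIM).
template<typename T, typename BinaryFunction>
void adjacent_difference_right_inplace_reference(T* values, std::size_t size, BinaryFunction op)
{
    assert(size == 0 || values != nullptr);
    // Walk from the front so values[i + 1] is still the original item when read;
    // the condition i + 1 < size also covers an empty range safely.
    for(std::size_t i = 0; i + 1 < size; ++i)
    {
        values[i] = op(values[i], values[i + 1]);
    }
    // The last item is intentionally left unchanged.
}
```

For the values [8, 7, 6, 5] and op(a, b) = a - b, the range becomes [1, 1, 1, 5].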

Template Parameters:
  • Config – [optional] configuration of the primitive. It can be adjacent_difference_config or a class with the same members.

  • InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.

Parameters:
  • temporary_storage – pointer to device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.

  • storage_size – reference to the size (in bytes) of temporary_storage.

  • values – iterator to the range of values, which is overwritten with the results.

  • size – number of items in the input.

  • op – [optional] the binary operation to apply.

  • stream – [optional] HIP stream object. Default is 0 (the default stream).

  • debug_synchronous – [optional] If true, synchronization is forced after every kernel launch to check for errors, and extra debugging information is printed to standard output. Default is false.

Returns:

hipSuccess (0) after a successful operation, otherwise a HIP runtime error of type hipError_t.