Adjacent difference#

Configuring the kernel#

template<unsigned int BlockSize, unsigned int ItemsPerThread, block_load_method BlockLoadMethod = block_load_method::block_load_transpose, block_store_method BlockStoreMethod = block_store_method::block_store_transpose, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct adjacent_difference_config : public rocprim::detail::adjacent_difference_config_params#

Configuration of device-level adjacent difference primitives.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread.

  • BlockLoadMethod – method for loading input values.

  • BlockStoreMethod – method for storing values.

  • SizeLimit – limit on the number of items for a single adjacent difference kernel launch.

Subclassed by rocprim::detail::default_adjacent_difference_config< arch, value_type, enable >, rocprim::detail::default_adjacent_difference_inplace_config< arch, value_type, enable >

left#

template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel primitive for applying a binary operation across pairs of consecutive elements in device accessible memory. Writes the output to the position of the left item.

Copies the first item to the output, then calls the supplied operator with each pair of neighboring elements and writes its result to the location of the second element. Equivalent to the following code:

output[0] = input[0];
for(std::size_t i = 1; i < size; ++i)
{
    output[i] = op(input[i], input[i - 1]);
}
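As a host-side cross-check of the loop above, the same values can be reproduced with the standard library. This is a minimal CPU-only sketch that does not use rocPRIM; `left_difference_reference` is a hypothetical helper name introduced here for illustration. It relies on `std::adjacent_difference` calling the operator as op(current, previous), which matches the argument order shown above.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical host-side reference (not part of rocPRIM): computes
// output[0] = input[0] and output[i] = op(input[i], input[i - 1]).
template<typename T, typename BinaryFunction = std::minus<T>>
std::vector<T> left_difference_reference(const std::vector<T>& input,
                                         BinaryFunction op = BinaryFunction{})
{
    std::vector<T> output(input.size());
    // std::adjacent_difference copies the first item and then writes
    // op(input[i], input[i - 1]) for every following position.
    std::adjacent_difference(input.begin(), input.end(), output.begin(), op);
    return output;
}
```

With the default `std::minus`, the input [8, 7, 6, 5] yields [8, -1, -1, -1]: every later element is the current value minus its left neighbor.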

Example

In this example a device-level adjacent_difference operation is performed on integer values.

#include <rocprim/rocprim.hpp> //or <rocprim/device/device_adjacent_difference.hpp>

// custom binary function
auto binary_op =
    [] __device__ (int a, int b) -> int
    {
        return a - b;
    };

// Prepare input and output (declare pointers, allocate device memory etc.)
std::size_t size; // e.g., 8
int* input; // e.g., [8, 7, 6, 5, 4, 3, 2, 1]
int* output; // empty array of 8 elements

std::size_t temporary_storage_size_bytes;
void* temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::adjacent_difference(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform adjacent difference
rocprim::adjacent_difference(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);
// output: [8, -1, -1, -1, -1, -1, -1, -1]

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.

  • InputIt – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.

Parameters:
  • temporary_storage – pointer to device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.

  • storage_size – reference to a size (in bytes) of temporary_storage

  • input – iterator to the input range

  • output – iterator to the output range, must not have any overlap with input.

  • size – number of items in the input

  • op – [optional] the binary operation to apply

  • stream – [optional] HIP stream object. Default is 0 (the default stream)

  • debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors and extra debugging info is printed to the standard output. Default value is false

Returns:

hipSuccess (0) on success, otherwise the HIP runtime error of type hipError_t

left, inplace#

template<typename Config = default_config, typename InputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel primitive for applying a binary operation across pairs of consecutive elements in device accessible memory. Writes the output to the position of the left item in place.

Copies the first item to the output, then calls the supplied operator with each pair of neighboring elements and writes its result to the location of the second element. Equivalent to the following code:

for(std::size_t i = size - 1; i > 0; --i)
{
    values[i] = op(values[i], values[i - 1]);
}
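The backward iteration in the loop above matters: each element must be combined with the original value of its left neighbor, so the neighbor has to be read before it is overwritten. A minimal host-side sketch of that ordering follows. It is CPU-only, does not call rocPRIM, and `left_difference_inplace_reference` is a hypothetical name used only for illustration. The index is shifted by one so that an empty range is handled without unsigned wrap-around.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference (not part of rocPRIM) for the in-place
// left difference. Walking from the back guarantees that values[i - 2]
// still holds its original value when values[i - 1] is computed.
template<typename T, typename BinaryFunction = std::minus<T>>
void left_difference_inplace_reference(std::vector<T>& values,
                                       BinaryFunction op = BinaryFunction{})
{
    // i is one past the element being written, so the loop is a no-op
    // for ranges with zero or one element.
    for(std::size_t i = values.size(); i > 1; --i)
    {
        values[i - 1] = op(values[i - 1], values[i - 2]);
    }
    // values[0] is left untouched.
}
```

Applied to [8, 7, 6, 5] with the default `std::minus`, the vector becomes [8, -1, -1, -1]. A forward loop would instead feed already-updated neighbors into later steps and produce different results.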

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.

  • InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.

Parameters:
  • temporary_storage – pointer to device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.

  • storage_size – reference to a size (in bytes) of temporary_storage

  • values – iterator to the range of values; the range is overwritten with the results

  • size – number of items in the input

  • op – [optional] the binary operation to apply

  • stream – [optional] HIP stream object. Default is 0 (the default stream)

  • debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors and extra debugging info is printed to the standard output. Default value is false

Returns:

hipSuccess (0) on success, otherwise the HIP runtime error of type hipError_t

left, aliased#

template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel primitive for applying a binary operation across pairs of consecutive elements in device accessible memory. Writes the output to the position of the left item.

Note

This function has to perform an extra copy due to (potentially) writing its values in-place. If it is known that input and output don’t overlap then adjacent_difference should be preferred as it avoids this extra copy.
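To illustrate why aliasing forces an extra copy, the following host-side sketch snapshots the input before writing any output. It is only an illustration under stated assumptions: it runs on the CPU, copies the whole range for simplicity, and `left_difference_aliased_reference` is a hypothetical name. How rocPRIM organizes its own copy on the device is not described in this section and may well differ.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side sketch (not rocPRIM's implementation): a left
// difference that is safe when `input` and `output` are the same array.
// Only complete overlap or no overlap is supported, as for the real API.
template<typename T, typename BinaryFunction = std::minus<T>>
void left_difference_aliased_reference(const T* input, T* output, std::size_t size,
                                       BinaryFunction op = BinaryFunction{})
{
    // The extra copy: keep the original values so that writes to `output`
    // cannot clobber an item that is still needed as a left neighbor.
    const std::vector<T> snapshot(input, input + size);
    for(std::size_t i = 0; i < size; ++i)
    {
        output[i] = (i == 0) ? snapshot[0] : op(snapshot[i], snapshot[i - 1]);
    }
}
```

Calling it with the same pointer for both arguments on [8, 7, 6, 5] gives [8, -1, -1, -1], and calling it with disjoint arrays gives the same result while leaving the input intact. In the disjoint case the snapshot is wasted work, which is the cost the note above refers to.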

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.

  • InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.

Parameters:
  • temporary_storage – pointer to device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.

  • storage_size – reference to a size (in bytes) of temporary_storage

  • input – iterator to the range values

  • output – iterator to the output range. Allowed to point to the same elements as input. Only complete overlap or no overlap at all is allowed between input and output. In other words, writing to output[i] may only overwrite input[i]; no other element of input may be changed.

  • size – number of items in the input

  • op – [optional] the binary operation to apply

  • stream – [optional] HIP stream object. Default is 0 (the default stream)

  • debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors and extra debugging info is printed to the standard output. Default value is false

Returns:

hipSuccess (0) on success, otherwise the HIP runtime error of type hipError_t

right, inplace#

template<typename Config = default_config, typename InputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_right_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel primitive for applying a binary operation across pairs of consecutive elements in device accessible memory. Writes the output to the position of the right item in place.

Copies the last item to the output, then calls the supplied operator with each pair of neighboring elements and writes its result to the location of the first element. Equivalent to the following code:

for(std::size_t i = 0; i + 1 < size; ++i)
{
    values[i] = op(values[i], values[i + 1]);
}
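For the right variant the safe direction is reversed: each element is combined with its right neighbor, so iterating forward reads values[i + 1] before it is overwritten. The following is a minimal host-side sketch of that behavior. It is CPU-only and independent of rocPRIM, and `right_difference_inplace_reference` is a hypothetical name chosen for this example.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference (not part of rocPRIM) for the in-place
// right difference: values[i] = op(values[i], values[i + 1]).
template<typename T, typename BinaryFunction = std::minus<T>>
void right_difference_inplace_reference(std::vector<T>& values,
                                        BinaryFunction op = BinaryFunction{})
{
    // "i + 1 < size" avoids unsigned wrap-around when the range is empty.
    for(std::size_t i = 0; i + 1 < values.size(); ++i)
    {
        values[i] = op(values[i], values[i + 1]);
    }
    // The last element has no right neighbor and is left untouched.
}
```

Applied to [8, 7, 6, 5] with the default `std::minus`, the vector becomes [1, 1, 1, 5].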

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.

  • InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.

Parameters:
  • temporary_storage – pointer to device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.

  • storage_size – reference to a size (in bytes) of temporary_storage

  • values – iterator to the range of values; the range is overwritten with the results

  • size – number of items in the input

  • op – [optional] the binary operation to apply

  • stream – [optional] HIP stream object. Default is 0 (the default stream)

  • debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors and extra debugging info is printed to the standard output. Default value is false

Returns:

hipSuccess (0) on success, otherwise the HIP runtime error of type hipError_t

right, aliased#

template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_right_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)#

Parallel primitive for applying a binary operation across pairs of consecutive elements in device accessible memory. Writes the output to the position of the right item.

Note

This function has to perform an extra copy due to (potentially) writing its values in-place. If it is known that input and output don’t overlap then adjacent_difference_right should be preferred as it avoids this extra copy.

Template Parameters:
  • Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.

  • InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.

Parameters:
  • temporary_storage – pointer to device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.

  • storage_size – reference to a size (in bytes) of temporary_storage

  • input – iterator to the range of input values; it is overwritten with the results only when output points to the same elements

  • output – iterator to the output range. Allowed to point to the same elements as input. Only complete overlap or no overlap at all is allowed between input and output. In other words, writing to output[i] may only overwrite input[i]; no other element of input may be changed.

  • size – number of items in the input

  • op – [optional] the binary operation to apply

  • stream – [optional] HIP stream object. Default is 0 (the default stream)

  • debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors and extra debugging info is printed to the standard output. Default value is false

Returns:

hipSuccess (0) on success, otherwise the HIP runtime error of type hipError_t