Adjacent difference
Configuring the kernel
template<unsigned int BlockSize, unsigned int ItemsPerThread, block_load_method BlockLoadMethod = block_load_method::block_load_transpose, block_store_method BlockStoreMethod = block_store_method::block_store_transpose, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct adjacent_difference_config : public rocprim::detail::adjacent_difference_config_params
Configuration of device-level adjacent difference primitives.
- Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
BlockLoadMethod – method for loading input values.
BlockStoreMethod – method for storing values.
SizeLimit – limit on the number of items for a single adjacent difference kernel launch.
Subclassed by rocprim::detail::default_adjacent_difference_config< arch, value_type, enable >, rocprim::detail::default_adjacent_difference_inplace_config< arch, value_type, enable >
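To make the roles of BlockSize, ItemsPerThread and SizeLimit more concrete, the following host-only sketch works through the arithmetic they imply. It assumes a conventional tiling in which each block processes BlockSize * ItemsPerThread items and a large input is split into launches of at most SizeLimit items. The helper names (ceil_div, blocks_needed, launches_needed) are hypothetical and not part of rocPRIM, and the library's actual launch logic may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of rocPRIM): ceiling division that cannot overflow.
constexpr std::size_t ceil_div(std::size_t a, std::size_t b)
{
    return a / b + (a % b != 0 ? 1 : 0);
}

// Assumed tiling: one block handles BlockSize * ItemsPerThread items.
template<unsigned int BlockSize, unsigned int ItemsPerThread>
constexpr std::size_t blocks_needed(std::size_t size)
{
    return ceil_div(size, std::size_t{BlockSize} * ItemsPerThread);
}

// Assumed splitting: one kernel launch covers at most size_limit items.
constexpr std::size_t launches_needed(std::size_t size, std::size_t size_limit)
{
    return ceil_div(size, size_limit);
}

// With 256 threads and 4 items per thread, a block covers 1024 items.
static_assert(blocks_needed<256, 4>(1024) == 1, "exactly one full tile");
static_assert(blocks_needed<256, 4>(1025) == 2, "one extra item needs a second block");
```

Under these assumptions, a configuration such as adjacent_difference_config<256, 4> would be passed as the Config template argument of the functions documented below.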
left
template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Writes the output to the position of the left item.
Copies the first item to the output, then calls the supplied operator with each pair of neighboring elements and writes its result to the location of the second element. Equivalent to the following code:
output[0] = input[0];
for(std::size_t i = 1; i < size; ++i)
{
    output[i] = op(input[i], input[i - 1]);
}
- Example
In this example a device-level adjacent_difference operation is performed on integer values.
#include <rocprim/rocprim.hpp> // or <rocprim/device/device_adjacent_difference.hpp>

// custom binary function
auto binary_op = [] __device__ (int a, int b) -> int { return a - b; };

// Prepare input and output (declare pointers, allocate device memory etc.)
std::size_t size; // e.g., 8
int* input;       // e.g., [8, 7, 6, 5, 4, 3, 2, 1]
int* output;      // empty array of 8 elements

std::size_t temporary_storage_size_bytes;
void* temporary_storage_ptr = nullptr;

// Get required size of the temporary storage
rocprim::adjacent_difference(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform adjacent difference
rocprim::adjacent_difference(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);
// output: [8, -1, -1, -1, -1, -1, -1, -1]
// (each result is op(input[i], input[i - 1]), e.g. 7 - 8 = -1)
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.
InputIt – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
- Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
input – iterator to the input range.
output – iterator to the output range; must not overlap with input.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
- Returns:
hipSuccess (0) after a successful adjacent difference operation, otherwise the HIP runtime error of type hipError_t.
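When validating device results, it can help to have a sequential host reference that follows the equivalent loop given above. The sketch below is such a reference; adjacent_difference_left_host is a hypothetical helper written for illustration, not rocPRIM code. With the default std::minus it matches std::adjacent_difference.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Host-side reference for the "left" variant (hypothetical helper, not rocPRIM code).
template<typename T, typename BinaryFunction = std::minus<T>>
std::vector<T> adjacent_difference_left_host(const std::vector<T>& input,
                                             BinaryFunction op = BinaryFunction{})
{
    std::vector<T> output(input.size());
    if(input.empty())
    {
        return output;
    }
    output[0] = input[0]; // the first item is copied unchanged
    for(std::size_t i = 1; i < input.size(); ++i)
    {
        // first argument: current item, second argument: its left neighbor
        output[i] = op(input[i], input[i - 1]);
    }
    return output;
}

// Self-check against the standard library algorithm.
inline void adjacent_difference_left_host_example()
{
    const std::vector<int> input{8, 7, 6, 5, 4, 3, 2, 1};
    std::vector<int> expected(input.size());
    std::adjacent_difference(input.begin(), input.end(), expected.begin());
    assert(adjacent_difference_left_host(input) == expected);
}
```

For the input [8, 7, 6, 5, 4, 3, 2, 1] and subtraction, this reference yields [8, -1, -1, -1, -1, -1, -1, -1], which a device result copied back to the host could be compared against.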
left, inplace
template<typename Config = default_config, typename InputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Writes the output to the position of the left item in place.
Leaves the first item unchanged, then calls the supplied operator with each pair of neighboring elements and writes its result to the location of the second element. Equivalent to the following code:
for(std::size_t i = size - 1; i > 0; --i)
{
    values[i] = op(values[i], values[i - 1]);
}
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.
InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
- Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
values – iterator to the range of values; will be overwritten with the results.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
- Returns:
hipSuccess (0) after a successful adjacent difference operation, otherwise the HIP runtime error of type hipError_t.
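The equivalent loop for the in-place variant runs from the back of the range. The host-side sketch below illustrates why: each result must be computed from the original left neighbor, so that neighbor may only be overwritten afterwards. The helper names are hypothetical and the code is a sequential reference, not the rocPRIM implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Host-side reference for the in-place "left" variant (hypothetical helper).
template<typename T, typename BinaryFunction>
void adjacent_difference_inplace_host(std::vector<T>& values, BinaryFunction op)
{
    // Walk from the back so that values[i - 2] still holds the original item
    // when it is read. The loop body is skipped for ranges of size 0 or 1.
    for(std::size_t i = values.size(); i > 1; --i)
    {
        values[i - 1] = op(values[i - 1], values[i - 2]);
    }
}

// Returns the in-place result for a small sample range.
inline std::vector<int> adjacent_difference_inplace_host_example()
{
    std::vector<int> values{1, 4, 9, 16};
    adjacent_difference_inplace_host(values, std::minus<int>{});
    assert(values.front() == 1); // the first item is left unchanged
    return values;
}
```

For [1, 4, 9, 16] and subtraction the result is [1, 3, 5, 7]. A forward sequential loop would read already-overwritten neighbors and produce [1, 3, 6, 10] instead.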
left, aliased
template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Writes the output to the position of the left item.
Note
This function has to perform an extra copy due to (potentially) writing its values in-place. If it is known that input and output don’t overlap then adjacent_difference should be preferred as it avoids this extra copy.
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.
InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
- Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
input – iterator to the range of values.
output – iterator to the output range. Allowed to point to the same elements as input. Only complete overlap or no overlap at all is allowed between input and output. In other words, writing to output[i] is only allowed to overwrite input[i]; any other element must not be changed.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
- Returns:
hipSuccess (0) on success, otherwise the HIP runtime error of type hipError_t.
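The overlap rule for output (complete overlap or none) can be checked before the call when both ranges are raw pointers. The sketch below is one possible pre-check, assuming plain pointers into a linear address space; classify_overlap and overlap_kind are hypothetical names, not part of rocPRIM, and the check does not apply to fancy iterators.

```cpp
#include <cstddef>
#include <cstdint>

// Possible relationships between two equally sized ranges (hypothetical type).
enum class overlap_kind { none, complete, partial };

// Classifies how [input, input + size) and [output, output + size) overlap.
// Only address arithmetic is performed; the pointers are never dereferenced.
template<typename T>
overlap_kind classify_overlap(const T* input, const T* output, std::size_t size)
{
    if(size == 0)
    {
        return overlap_kind::none;
    }
    if(input == output)
    {
        return overlap_kind::complete;
    }
    const auto in  = reinterpret_cast<std::uintptr_t>(input);
    const auto out = reinterpret_cast<std::uintptr_t>(output);
    const std::uintptr_t bytes = size * sizeof(T);
    const bool disjoint = (in + bytes <= out) || (out + bytes <= in);
    return disjoint ? overlap_kind::none : overlap_kind::partial;
}

// Exercises the three cases on a small host buffer.
inline bool classify_overlap_example()
{
    int buffer[8] = {};
    return classify_overlap(buffer, buffer, 8) == overlap_kind::complete
        && classify_overlap(buffer, buffer + 4, 4) == overlap_kind::none
        && classify_overlap(buffer, buffer + 2, 4) == overlap_kind::partial;
}
```

A caller could reject overlap_kind::partial up front, use adjacent_difference for overlap_kind::none, and reserve the aliased overload for overlap_kind::complete.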
right
template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_right(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Writes the output to the position of the right item.
Copies the last item to the output, then calls the supplied operator with each pair of neighboring elements and writes its result to the location of the first element. Equivalent to the following code:
output[size - 1] = input[size - 1];
for(std::size_t i = 0; i < size - 1; ++i)
{
    output[i] = op(input[i], input[i + 1]);
}
- Example
In this example a device-level adjacent_difference_right operation is performed on integer values.
#include <rocprim/rocprim.hpp> // or <rocprim/device/device_adjacent_difference.hpp>

// custom binary function
auto binary_op = [] __device__ (int a, int b) -> int { return a - b; };

// Prepare input and output (declare pointers, allocate device memory etc.)
std::size_t size; // e.g., 8
int* input;       // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int* output;      // empty array of 8 elements

std::size_t temporary_storage_size_bytes;
void* temporary_storage_ptr = nullptr;

// Get required size of the temporary storage
rocprim::adjacent_difference_right(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform adjacent difference
rocprim::adjacent_difference_right(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);
// output: [-1, -1, -1, -1, -1, -1, -1, 8]
// (each result is op(input[i], input[i + 1]), e.g. 1 - 2 = -1)
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.
InputIt – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
- Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
input – iterator to the input range.
output – iterator to the output range; must not overlap with input.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
- Returns:
hipSuccess (0) after a successful adjacent difference operation, otherwise the HIP runtime error of type hipError_t.
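The argument order of op matters for the right variant: the current item comes first and its right neighbor second. The host-side sketch below follows the equivalent loop given above and can serve as a reference when checking device output; adjacent_difference_right_host is a hypothetical helper, not rocPRIM code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Host-side reference for the "right" variant (hypothetical helper, not rocPRIM code).
template<typename T, typename BinaryFunction = std::minus<T>>
std::vector<T> adjacent_difference_right_host(const std::vector<T>& input,
                                              BinaryFunction op = BinaryFunction{})
{
    std::vector<T> output(input.size());
    if(input.empty())
    {
        return output;
    }
    const std::size_t last = input.size() - 1;
    for(std::size_t i = 0; i < last; ++i)
    {
        // first argument: current item, second argument: its right neighbor
        output[i] = op(input[i], input[i + 1]);
    }
    output[last] = input[last]; // the last item is copied unchanged
    return output;
}

// Shows how the choice of operator changes the sign of the result.
inline void adjacent_difference_right_host_example()
{
    const std::vector<int> input{1, 2, 3, 4, 5, 6, 7, 8};
    const auto forward = [](int a, int b) { return b - a; };
    assert((adjacent_difference_right_host(input)
            == std::vector<int>{-1, -1, -1, -1, -1, -1, -1, 8}));
    assert((adjacent_difference_right_host(input, forward)
            == std::vector<int>{1, 1, 1, 1, 1, 1, 1, 8}));
}
```

With subtraction, an increasing input therefore gives negative differences; passing an operator that returns b - a gives the positive forward differences instead.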
right, inplace
template<typename Config = default_config, typename InputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_right_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Writes the output to the position of the right item in place.
Leaves the last item unchanged, then calls the supplied operator with each pair of neighboring elements and writes its result to the location of the first element. Equivalent to the following code:
for(std::size_t i = 0; i < size - 1; ++i)
{
    values[i] = op(values[i], values[i + 1]);
}
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.
InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
- Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
values – iterator to the range of values; will be overwritten with the results.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
- Returns:
hipSuccess (0) after a successful adjacent difference operation, otherwise the HIP runtime error of type hipError_t.
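For the in-place right variant the sequential equivalent runs forward, since each result only needs the not-yet-overwritten right neighbor. The host-side sketch below shows this, with a loop condition that also handles an empty range (size - 1 would wrap around for an unsigned size of 0). The helper names are hypothetical; this is a reference, not the rocPRIM implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Host-side reference for the in-place "right" variant (hypothetical helper).
template<typename T, typename BinaryFunction>
void adjacent_difference_right_inplace_host(std::vector<T>& values, BinaryFunction op)
{
    // "i + 1 < size()" avoids the unsigned wrap-around of "size() - 1" on an empty range.
    for(std::size_t i = 0; i + 1 < values.size(); ++i)
    {
        // values[i + 1] has not been overwritten yet when it is read here.
        values[i] = op(values[i], values[i + 1]);
    }
}

// Returns the in-place result for a small sample range.
inline std::vector<int> adjacent_difference_right_inplace_host_example()
{
    std::vector<int> values{1, 4, 9, 16};
    adjacent_difference_right_inplace_host(values, std::minus<int>{});
    assert(values.back() == 16); // the last item is left unchanged
    return values;
}
```

For [1, 4, 9, 16] and subtraction the result is [-3, -5, -7, 16].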
right, aliased
template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_right_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Writes the output to the position of the right item.
Note
This function has to perform an extra copy due to (potentially) writing its values in-place. If it is known that input and output don’t overlap then adjacent_difference_right should be preferred as it avoids this extra copy.
- Template Parameters:
Config – [optional] configuration of the primitive. It has to be adjacent_difference_config or a class derived from it.
InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
- Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
input – iterator to the range of values; overwritten with the results when output points to the same elements.
output – iterator to the output range. Allowed to point to the same elements as input. Only complete overlap or no overlap at all is allowed between input and output. In other words, writing to output[i] is only allowed to overwrite input[i]; any other element must not be changed.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
- Returns:
hipSuccess (0) on success, otherwise the HIP runtime error of type hipError_t.
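The note above mentions an extra copy. The host-side sketch below illustrates why a copy makes aliasing safe: results are computed from a snapshot, so it no longer matters whether writing output[i] overwrites input[i]. It assumes the simplest possible strategy of snapshotting the whole input; how much rocPRIM actually copies is not stated here and may well be less. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Host-side sketch of an alias-safe "right" variant (hypothetical helper, not rocPRIM code).
template<typename T, typename BinaryFunction>
void adjacent_difference_right_aliased_host(const T* input, T* output,
                                            std::size_t size, BinaryFunction op)
{
    if(size == 0)
    {
        return;
    }
    // The extra copy: all reads below go to the snapshot, never to memory
    // that the writes to output could already have changed.
    const std::vector<T> snapshot(input, input + size);
    for(std::size_t i = 0; i + 1 < size; ++i)
    {
        output[i] = op(snapshot[i], snapshot[i + 1]);
    }
    output[size - 1] = snapshot[size - 1];
}

// Checks that aliased and non-aliased calls give the same result.
inline bool adjacent_difference_right_aliased_host_example()
{
    const std::vector<int> expected{-3, -5, -7, 16};

    std::vector<int> aliased{1, 4, 9, 16};
    adjacent_difference_right_aliased_host(aliased.data(), aliased.data(),
                                           aliased.size(), std::minus<int>{});

    const std::vector<int> input{1, 4, 9, 16};
    std::vector<int> separate(input.size());
    adjacent_difference_right_aliased_host(input.data(), separate.data(),
                                           input.size(), std::minus<int>{});

    return aliased == expected && separate == expected;
}
```

When the ranges are known to be distinct, the snapshot is pure overhead, which is the reason the note recommends adjacent_difference_right in that case.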