Adjacent difference
Configuring the kernel
template<unsigned int BlockSize, unsigned int ItemsPerThread, block_load_method LoadMethod = block_load_method::block_load_transpose, block_store_method StoreMethod = block_store_method::block_store_transpose, unsigned int SizeLimit = std::numeric_limits<unsigned int>::max()>
struct adjacent_difference_config : public rocprim::kernel_config<BlockSize, ItemsPerThread, std::numeric_limits<unsigned int>::max()>
Configuration of device-level adjacent_difference primitives.
Template Parameters:
BlockSize – number of threads in a block.
ItemsPerThread – number of items processed by each thread.
LoadMethod – method for loading input values.
StoreMethod – method for storing values.
SizeLimit – limit on the number of items for a single adjacent_difference kernel launch. Larger input sizes will be broken up into multiple kernel launches.
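The SizeLimit behavior can be made concrete with a little arithmetic. The helper below is a hypothetical sketch, not rocPRIM code: the name estimated_launch_count and the assumption that the per-launch limit is rounded down to a whole number of blocks (BlockSize * ItemsPerThread items each) are illustrative assumptions about how the splitting might work.

```cpp
#include <cstddef>

// Hypothetical helper (not part of rocPRIM): estimates how many kernel launches
// an input of `size` items needs when one launch may process at most
// `size_limit` items and each block processes block_size * items_per_thread items.
// Assumption: the per-launch limit is rounded down to a whole number of blocks.
constexpr std::size_t estimated_launch_count(std::size_t size,
                                             std::size_t block_size,
                                             std::size_t items_per_thread,
                                             std::size_t size_limit)
{
    const std::size_t items_per_block = block_size * items_per_thread;
    // Round the limit down to a multiple of the block tile, but allow at least one block.
    std::size_t aligned_limit = size_limit - size_limit % items_per_block;
    if(aligned_limit < items_per_block)
    {
        aligned_limit = items_per_block;
    }
    // Ceiling division: a partially filled last chunk still needs its own launch.
    return size == 0 ? 0 : (size + aligned_limit - 1) / aligned_limit;
}
```

Under these assumptions, with BlockSize = 256 and ItemsPerThread = 8 (2048 items per block) and SizeLimit = 5000, a 10000-item input would be processed in three launches of at most 4096 items each.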
left
template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Each item is combined with its left neighbor (its predecessor), and the result is written to a separate output range.
Copies the first item to the output, then calls the supplied operator on each pair of neighboring elements and writes the result to the location of the second element of the pair. Equivalent to the following code:
output[0] = input[0];
for(std::size_t i = 1; i < size; ++i) {
    output[i] = op(input[i], input[i - 1]);
}
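The equivalent code above can be wrapped into a small host-side reference, which is handy for checking device results in tests. This is a sketch of the documented semantics, not rocPRIM's implementation; the name reference_adjacent_difference and the use of std::vector are assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference of the left variant (not rocPRIM code):
// output[0] = input[0], output[i] = op(input[i], input[i - 1]) for i >= 1.
template<typename T, typename BinaryFunction = std::minus<T>>
std::vector<T> reference_adjacent_difference(const std::vector<T>& input,
                                             BinaryFunction op = BinaryFunction{})
{
    std::vector<T> output(input.size());
    if(input.empty())
    {
        return output;
    }
    output[0] = input[0]; // the first item is copied unchanged
    for(std::size_t i = 1; i < input.size(); ++i)
    {
        // The current item is the first argument, its left neighbor the second.
        output[i] = op(input[i], input[i - 1]);
    }
    return output;
}
```

With the default std::minus, the input [8, 7, 6, 5, 4, 3, 2, 1] yields [8, -1, -1, -1, -1, -1, -1, -1], because each result is input[i] - input[i - 1].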
Example
In this example a device-level adjacent_difference operation is performed on integer values.
#include <rocprim/rocprim.hpp> // or <rocprim/device/device_adjacent_difference.hpp>

// custom binary function
auto binary_op = [] __device__ (int a, int b) -> int { return a - b; };

// Prepare input and output (declare pointers, allocate device memory etc.)
std::size_t size; // e.g., 8
int* input;       // e.g., [8, 7, 6, 5, 4, 3, 2, 1]
int* output;      // empty array of 8 elements

std::size_t temporary_storage_size_bytes;
void* temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::adjacent_difference(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform adjacent difference
rocprim::adjacent_difference(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);
// output: [8, -1, -1, -1, -1, -1, -1, -1]

// free temporary storage
hipFree(temporary_storage_ptr);
Template Parameters:
Config – [optional] configuration of the primitive. It can be adjacent_difference_config or a class with the same members.
InputIt – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
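A custom BinaryFunction may use different argument and result types, as long as it follows the signature shape described above. The functor below is a hypothetical example (absolute_difference is not a rocPRIM type) with T1 = T2 = std::int32_t and U = std::int64_t. It is written as plain C++ so it compiles anywhere; for use on the GPU its call operator would also need to be device-callable, for example marked __host__ __device__ in HIP code.

```cpp
#include <cstdint>

// Hypothetical functor: absolute difference of two int32 neighbors, widened to
// int64 so that the subtraction cannot overflow.
struct absolute_difference
{
    // Arguments are taken by const reference and are never modified.
    std::int64_t operator()(const std::int32_t& a, const std::int32_t& b) const
    {
        const std::int64_t diff
            = static_cast<std::int64_t>(a) - static_cast<std::int64_t>(b);
        return diff < 0 ? -diff : diff;
    }
};
```

Widening before subtracting is the design choice here: the difference of two extreme int32 values does not fit in an int32, but always fits in an int64.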
Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
input – iterator to the input range.
output – iterator to the output range; must not overlap with input.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
Returns:
hipSuccess (0) after a successful operation, otherwise the HIP runtime error of type hipError_t.
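Because the output range must not overlap the input range, a caller passing raw pointers may want to check this precondition on the host before the call, and use adjacent_difference_inplace when in-place operation is intended. The helper below is a hypothetical sketch (ranges_overlap is not part of rocPRIM); it only compares addresses and never dereferences them, so it also works for device pointers.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: returns true if the half-open ranges [a, a + size) and
// [b, b + size) share at least one element. std::less provides a total order
// over pointers even for unrelated allocations, unlike the built-in < operator.
template<typename T>
bool ranges_overlap(const T* a, const T* b, std::size_t size)
{
    if(size == 0)
    {
        return false; // empty ranges never overlap
    }
    const std::less<const T*> less;
    // Two ranges overlap exactly when each one starts before the other ends.
    return less(a, b + size) && less(b, a + size);
}
```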
left, inplace
template<typename Config = default_config, typename InputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Each item is combined with its left neighbor, and the results overwrite the input range in place.
Leaves the first item unchanged, then calls the supplied operator on each pair of neighboring elements and writes the result to the location of the second element of the pair. Equivalent to the following code (for size > 0):
for(std::size_t i = size - 1; i > 0; --i) {
    values[i] = op(values[i], values[i - 1]);
}
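The backwards loop in the equivalent code matters: each update must read the original value of its left neighbor, so the range is processed from the back. A minimal host-side sketch of these semantics follows; the name reference_adjacent_difference_inplace is hypothetical, and this is not how rocPRIM implements the parallel version.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference of the in-place left variant (not rocPRIM code).
// Iterating from the back guarantees that values[i - 2] still holds its original
// value when values[i - 1] is updated.
template<typename T, typename BinaryFunction = std::minus<T>>
void reference_adjacent_difference_inplace(std::vector<T>& values,
                                           BinaryFunction op = BinaryFunction{})
{
    // values[0] is left unchanged; the loop body never runs for sizes 0 and 1.
    for(std::size_t i = values.size(); i > 1; --i)
    {
        values[i - 1] = op(values[i - 1], values[i - 2]);
    }
}
```

Starting the index at size and testing i > 1 also avoids the unsigned wrap-around that size - 1 would cause for an empty range.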
Template Parameters:
Config – [optional] configuration of the primitive. It can be adjacent_difference_config or a class with the same members.
InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
values – iterator to the range of values, which is overwritten with the results.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
Returns:
hipSuccess (0) after a successful operation, otherwise the HIP runtime error of type hipError_t.
right
template<typename Config = default_config, typename InputIt, typename OutputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_right(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Each item is combined with its right neighbor (its successor), and the result is written to a separate output range.
Copies the last item to the output, then calls the supplied operator on each pair of neighboring elements and writes the result to the location of the first element of the pair. Equivalent to the following code:
output[size - 1] = input[size - 1];
for(std::size_t i = 0; i < size - 1; ++i) {
    output[i] = op(input[i], input[i + 1]);
}
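As with the left variant, the equivalent code can serve as a host-side reference for testing. The sketch below mirrors it; the name reference_adjacent_difference_right is hypothetical and the code is not rocPRIM's implementation.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference of the right variant (not rocPRIM code):
// output[size - 1] = input[size - 1], output[i] = op(input[i], input[i + 1]).
template<typename T, typename BinaryFunction = std::minus<T>>
std::vector<T> reference_adjacent_difference_right(const std::vector<T>& input,
                                                   BinaryFunction op = BinaryFunction{})
{
    std::vector<T> output(input.size());
    if(input.empty())
    {
        return output;
    }
    const std::size_t last = input.size() - 1;
    output[last] = input[last]; // the last item is copied unchanged
    for(std::size_t i = 0; i < last; ++i)
    {
        // The current item is the first argument, its right neighbor the second.
        output[i] = op(input[i], input[i + 1]);
    }
    return output;
}
```

With the default std::minus, the input [1, 2, 3, 4, 5, 6, 7, 8] yields [-1, -1, -1, -1, -1, -1, -1, 8], because each result is input[i] - input[i + 1].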
Example
In this example a device-level adjacent_difference_right operation is performed on integer values.
#include <rocprim/rocprim.hpp> // or <rocprim/device/device_adjacent_difference.hpp>

// custom binary function
auto binary_op = [] __device__ (int a, int b) -> int { return a - b; };

// Prepare input and output (declare pointers, allocate device memory etc.)
std::size_t size; // e.g., 8
int* input;       // e.g., [1, 2, 3, 4, 5, 6, 7, 8]
int* output;      // empty array of 8 elements

std::size_t temporary_storage_size_bytes;
void* temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::adjacent_difference_right(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform adjacent difference
rocprim::adjacent_difference_right(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, output, size, binary_op
);
// output: [-1, -1, -1, -1, -1, -1, -1, 8]

// free temporary storage
hipFree(temporary_storage_ptr);
Template Parameters:
Config – [optional] configuration of the primitive. It can be adjacent_difference_config or a class with the same members.
InputIt – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
OutputIt – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
input – iterator to the input range.
output – iterator to the output range; must not overlap with input.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
Returns:
hipSuccess (0) after a successful operation, otherwise the HIP runtime error of type hipError_t.
right, inplace
template<typename Config = default_config, typename InputIt, typename BinaryFunction = ::rocprim::minus<>>
hipError_t rocprim::adjacent_difference_right_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op = BinaryFunction{}, const hipStream_t stream = 0, const bool debug_synchronous = false)
Parallel primitive for applying a binary operation across pairs of consecutive elements in device-accessible memory. Each item is combined with its right neighbor, and the results overwrite the input range in place.
Leaves the last item unchanged, then calls the supplied operator on each pair of neighboring elements and writes the result to the location of the first element of the pair. Equivalent to the following code:
for(std::size_t i = 0; i + 1 < size; ++i) {
    values[i] = op(values[i], values[i + 1]);
}
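For the right in-place variant the loop runs from the front, so each update still sees the original value of its right neighbor. A minimal host-side sketch of these semantics follows; the name reference_adjacent_difference_right_inplace is hypothetical and not part of rocPRIM.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side reference of the in-place right variant (not rocPRIM code).
// Iterating from the front guarantees that values[i + 1] still holds its original
// value when values[i] is updated.
template<typename T, typename BinaryFunction = std::minus<T>>
void reference_adjacent_difference_right_inplace(std::vector<T>& values,
                                                 BinaryFunction op = BinaryFunction{})
{
    // The last item is left unchanged; `i + 1 < size` also handles an empty range.
    for(std::size_t i = 0; i + 1 < values.size(); ++i)
    {
        values[i] = op(values[i], values[i + 1]);
    }
}
```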
Template Parameters:
Config – [optional] configuration of the primitive. It can be adjacent_difference_config or a class with the same members.
InputIt – [inferred] random-access iterator type of the value range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.
BinaryFunction – [inferred] binary operation function object that will be applied to consecutive items. The signature of the function should be equivalent to the following: U f(const T1& a, const T2& b). The signature does not need to have const &, but the function object must not modify the objects passed to it.
Parameters:
temporary_storage – pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the adjacent difference operation.
storage_size – reference to a size (in bytes) of temporary_storage.
values – iterator to the range of values, which is overwritten with the results.
size – number of items in the input.
op – [optional] the binary operation to apply.
stream – [optional] HIP stream object. Default is 0 (the default stream).
debug_synchronous – [optional] If true, synchronization after every kernel launch is forced in order to check for errors, and extra debugging info is printed to the standard output. Default value is false.
Returns:
hipSuccess (0) after a successful operation, otherwise the HIP runtime error of type hipError_t.