Load and Store Callbacks#

rocFFT includes experimental functionality to call user-defined device functions when loading input from global memory at the start of a transform, or when storing output to global memory at the end of a transform.

These user-defined callback functions may be optionally supplied to the library using rocfft_execution_info_set_load_callback() and rocfft_execution_info_set_store_callback().

Device functions supplied as callbacks must load and store element data types that are appropriate for the transform being performed.

Transform type

Load element type

Store element type

Complex-to-complex, half-precision

_Float16_2

_Float16_2

Complex-to-complex, single-precision

float2

float2

Complex-to-complex, double-precision

double2

double2

Real-to-complex, single-precision

float

float2

Real-to-complex, half-precision

_Float16

_Float16_2

Real-to-complex, double-precision

double

double2

Complex-to-real, half-precision

_Float16_2

_Float16

Complex-to-real, single-precision

float2

float

Complex-to-real, double-precision

double2

double

The callback function signatures must match the specifications below.

T load_callback(T* buffer, size_t offset, void* callback_data, void* shared_memory);
void store_callback(T* buffer, size_t offset, T element, void* callback_data, void* shared_memory);

The parameters for the functions are defined as:

  • T: The data type of each element being loaded or stored from the input or output.

  • buffer: Pointer to the input (for load callbacks) or output (for store callbacks) in device memory that was passed to rocfft_execute().

  • offset: The offset of the location being read from or written to. This counts in elements, from the buffer pointer.

  • element: For store callbacks only, the element to be stored.

  • callback_data: A pointer value accepted by rocfft_execution_info_set_load_callback() and rocfft_execution_info_set_store_callback() which is passed through to the callback function.

  • shared_memory: A pointer to an amount of shared memory requested when the callback is set. Shared memory is not supported, and this parameter is always null.

Callback functions are called exactly once for each element being loaded or stored in a transform. Note that multiple kernels may be launched to decompose a transform, which means that separate kernels may call the load and store callbacks for a transform if both are specified.

Callbacks functions are only supported for transforms that do not use planar format for input or output.

Runtime compilation#

rocFFT includes many kernels for common FFT problems. Some plans may require additional kernels aside from what is built in to the library. In these cases, rocFFT will compile optimized kernels for the plan when the plan is created.

Compiled kernels are stored in memory by default and will be reused if they are required again for plans in the same process.

If the ROCFFT_RTC_CACHE_PATH environment variable is set to a writable file location, rocFFT will write compiled kernels to this location. rocFFT will read kernels from this location for plans in other processes that need runtime-compiled kernels. rocFFT will create the specified file if it does not already exist.