How to create a new Custom function using the vx_amd_custom extension#

When to use a Custom OpenVX Node#

The idea of a custom node is to allow the following:

  • Plugging in a kernel that is not yet supported

  • Writing an operator based on a third-party library

  • Optimizing functionality by providing your own implementation, or fusing functions

In this tutorial, we will walk you through the process of writing, compiling, and loading a vx_amd_custom library with a new custom function. For demonstration purposes, we provide a CPU and a GPU implementation of the “CustomCopy” kernel, which copies the input data to the output without any modifications.

Prerequisites#

  • MIVisionX library installed from source

  • Knowledge of OpenVX and how to add a user defined node

  • C++ coding

  • Basic knowledge of CMake

Write the operator definition in custom_lib using the predefined APIs in custom_api.h under the custom_lib folder#

The Custom shared library (custom_lib) exposes four main APIs, shown below:

// Create Custom Object and return the handle
customHandle CreateCustom(CustomFunctionType function);

// Setup custom function execution
customStatus_t CustomSetup(customHandle input_handle, customTensorDesc &inputdesc, customTensorDesc &outputdesc, customBackend backend, customStream stream);

// Execute custom function
customStatus_t CustomExecute(customHandle custom_handle, void *input_handle, customTensorDesc &inputdesc, void *output_handle, customTensorDesc &outputdesc);

// Destroy custom execution instance
customStatus_t CustomShutdown(customHandle custom_handle);

  • In the header file, only one function type, “Copy”, is present for demonstration.
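The intended calling sequence of these four APIs is create once, set up once, execute per frame, and shut down at the end. The snippet below sketches that lifecycle; the minimal type definitions and the memcpy-based stand-in implementations are ours for illustration only, not the library's — when building against custom_lib you would include custom_api.h and link the real implementations instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// --- Hypothetical stand-ins for the custom_api.h declarations ---
enum CustomFunctionType { Copy };
enum class customBackend { CPU, GPU };
enum customStatus_t { customStatusSuccess, customStatusFailure };
typedef void *customStream;
typedef void *customHandle;
struct customTensorDesc { unsigned dims[4]; };  // simplified: W, H, C, N only

// Stand-in implementations; a real build links against custom_lib instead.
customHandle CreateCustom(CustomFunctionType) { return new int(0); }
customStatus_t CustomSetup(customHandle, customTensorDesc &, customTensorDesc &,
                           customBackend, customStream) { return customStatusSuccess; }
customStatus_t CustomExecute(customHandle, void *input, customTensorDesc &inputdesc,
                             void *output, customTensorDesc &) {
    size_t bytes = (size_t)inputdesc.dims[0] * inputdesc.dims[1] *
                   inputdesc.dims[2] * inputdesc.dims[3];
    std::memcpy(output, input, bytes);  // "Copy" just mirrors input to output
    return customStatusSuccess;
}
customStatus_t CustomShutdown(customHandle h) { delete static_cast<int *>(h); return customStatusSuccess; }

// Typical lifecycle: Create -> Setup -> Execute -> Shutdown.
customStatus_t runCopyOnce(const std::vector<uint8_t> &in, std::vector<uint8_t> &out,
                           customTensorDesc desc) {
    customHandle h = CreateCustom(Copy);
    CustomSetup(h, desc, desc, customBackend::CPU, nullptr);
    customStatus_t status = CustomExecute(h, (void *)in.data(), desc, out.data(), desc);
    CustomShutdown(h);
    return status;
}
```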

Write the implementation of the function using the custom_base class#

custom_base is an abstract class that exposes the API for the custom operator implementation. It has the following declarations:

class custom_base
{
protected:  
    custom_base() {};
public:
    virtual ~custom_base() {};
    /*!
     \param inputdesc => Input tensor desc
     \param outputdesc => Output tensor desc
     \param backend => Backend for the implementation
     \param stream => Command queue/stream for the backend
     \param num_cpu_threads => Number of CPU threads to use for the CPU backend
    */
    virtual customStatus_t Setup(customTensorDesc &inputdesc, customTensorDesc &outputdesc,  customBackend backend, customStream stream, int num_cpu_threads) = 0;
    /*!
     \param input_handle  => memory handle of input tensor
     \param inputdesc => Input tensor desc
     \param output_handle  => memory handle of output tensor
     \param outputdesc => Output tensor desc
    */
    virtual customStatus_t Execute(void *input_handle, customTensorDesc &inputdesc, void *output_handle, customTensorDesc &outputdesc) = 0;
     
    /* Shutdown and release resources */
    virtual customStatus_t Shutdown() = 0;
};
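A minimal, self-contained illustration of implementing this interface follows. The descriptor, backend, and status types below are simplified stand-ins (the real ones live in custom_lib), and only the CPU path is shown:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-ins for the library types (illustration only).
enum customStatus_t { customStatusSuccess };
enum class customBackend { CPU, GPU };
typedef void *customStream;
struct customTensorDesc { unsigned dims[4]; unsigned bytes_per_elem; };

// Same shape as custom_base: Setup once, Execute per call, Shutdown to release.
class custom_base {
protected:
    custom_base() {}
public:
    virtual ~custom_base() {}
    virtual customStatus_t Setup(customTensorDesc &inputdesc, customTensorDesc &outputdesc,
                                 customBackend backend, customStream stream,
                                 int num_cpu_threads) = 0;
    virtual customStatus_t Execute(void *input_handle, customTensorDesc &inputdesc,
                                   void *output_handle, customTensorDesc &outputdesc) = 0;
    virtual customStatus_t Shutdown() = 0;
};

// A Copy operator implemented against the interface (CPU path only).
class customCopy : public custom_base {
    customBackend _backend = customBackend::CPU;
public:
    customStatus_t Setup(customTensorDesc &, customTensorDesc &,
                         customBackend backend, customStream, int) override {
        _backend = backend;  // remember the backend chosen at setup time
        return customStatusSuccess;
    }
    customStatus_t Execute(void *input_handle, customTensorDesc &,
                           void *output_handle, customTensorDesc &outputdesc) override {
        size_t bytes = (size_t)outputdesc.dims[0] * outputdesc.dims[1] *
                       outputdesc.dims[2] * outputdesc.dims[3] * outputdesc.bytes_per_elem;
        std::memcpy(output_handle, input_handle, bytes);  // copy input to output unchanged
        return customStatusSuccess;
    }
    customStatus_t Shutdown() override { return customStatusSuccess; }
};
```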

Add new function types for supporting new custom api#

custom_base * CreateCustomClass(CustomFunctionType function) {

    switch(function)
    {
        case Copy:
            return new customCopy();
        // todo:: add new custom function types here with corresponding implementation files
        default:
            throw std::runtime_error("Custom function type is unsupported");
    }
}

Provide the implementation of custom function in the derived class for the Operator (here custom_copy_impl.h)#

  • The function implements a copy operator that simply copies the contents of input_tensor into output_tensor. The code below shows the implementation for both the CPU backend and the GPU backend (ROCm HIP).

customStatus_t customCopy::Execute(void *input_handle, customTensorDesc &inputdesc, void *output_handle, customTensorDesc &outputdesc)
{
    // Per-sample size in bytes: W * H * C * element size; dims[3] is the batch size
    unsigned size = outputdesc.dims[0] * outputdesc.dims[1] * outputdesc.dims[2] * sizeof(outputdesc.data_type);
    unsigned batch_size = outputdesc.dims[3];
    if (_backend == customBackend::CPU)
    {
        // Use at most one OpenMP thread per batch element
        int omp_threads = (_cpu_num_threads < (int)batch_size) ? _cpu_num_threads : (int)batch_size;
    #pragma omp parallel for num_threads(omp_threads)
        for (size_t i = 0; i < batch_size; i++) {
            unsigned char *src = (unsigned char *)input_handle + size * i;
            unsigned char *dst = (unsigned char *)output_handle + size * i;
            memcpy(dst, src, size);
        }
    } else
    {
#if ENABLE_HIP
        for (size_t i = 0; i < batch_size; i++) {
            unsigned char *src = (unsigned char *)input_handle + size * i;
            unsigned char *dst = (unsigned char *)output_handle + size * i;
            hipMemcpy(dst, src, size, hipMemcpyDeviceToDevice);
        }
#endif
    }
    return customStatusSuccess;
}
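The core of both backends is the same per-sample offset arithmetic: batch element i lives at byte offset size * i in both the input and the output buffers. Isolated as a standalone sketch (the function name and parameters here are ours, not the library's):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Batched strided copy: each of `batch` samples occupies `sample_bytes`
// contiguous bytes, so sample i starts at offset sample_bytes * i.
// sample_bytes corresponds to dims[0] * dims[1] * dims[2] * element size,
// and batch to dims[3], in the Execute() implementation above.
void batchedCopy(const void *input, void *output, size_t sample_bytes, size_t batch) {
    for (size_t i = 0; i < batch; i++) {
        const unsigned char *src = static_cast<const unsigned char *>(input) + sample_bytes * i;
        unsigned char *dst = static_cast<unsigned char *>(output) + sample_bytes * i;
        std::memcpy(dst, src, sample_bytes);  // hipMemcpy(..., hipMemcpyDeviceToDevice) on GPU
    }
}
```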

Adding a new operator type in the custom_lib#

  • Add an enumeration value for the new function type in custom_api.h

  • Add a new class for the new custom function in a separate header file, following custom_copy_impl.h

  • Implement the three functions of the class in a separate .cpp file, following custom_copy_impl.cpp

  • Invoke the new class by adding it to CreateCustomClass() in custom_api.cpp

  • Modify CMakeLists.txt to include the implementation file in CUSTOM_LIB_SOURCES

  • Rebuild MIVisionX with the new custom operator. Voila! The new custom function is ready to use from the vx_amd_custom extension
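The enum and factory steps above can be sketched as follows. This is a self-contained illustration: the Scale function type and the customScale class are hypothetical stand-ins for whatever operator you are adding, and the base/derived classes are stripped down to the bare minimum.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins mirroring custom_api.h / custom_api.cpp.
enum CustomFunctionType { Copy, Scale /* step 1: new enum value */ };

struct custom_base { virtual ~custom_base() {} };
struct customCopy  : custom_base {};
struct customScale : custom_base {};  // steps 2-3: new derived class + its .cpp

// Step 4: route the new enum value to the new class in the factory.
custom_base *CreateCustomClass(CustomFunctionType function) {
    switch (function) {
        case Copy:
            return new customCopy();
        case Scale:
            return new customScale();
        default:
            throw std::runtime_error("Custom function type is unsupported");
    }
}
```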

Test the new custom operator using runvx utility#

  • A sample gdf for runvx using the “Copy” operator can be found in the README

  • Modify the gdf appropriately to test the new custom functionality