Neural net model compiler & optimizer#

Diagram showing pretrained models compiled for use with MIVisionX runtime

The neural net model compiler & optimizer converts pre-trained neural network models to MIVisionX runtime code for optimized inference. Pre-trained models in ONNX, NNEF, & Caffe formats are supported by the model compiler & optimizer.

The model compiler first converts the pre-trained models to AMD’s internal open format called Neural Net Intermediate Representation (NNIR). Then the optimizer goes through the NNIR and applies various optimizations which allow the model to be deployed onto target hardware most efficiently. Finally, NNIR is converted into OpenVX C code which is compiled and deployed on the targeted AMD hardware.

Diagram showing NNIR conversion steps

MIVisionX runtime#

MIVisionX allows hundreds of different AMD OpenVX and OpenCV interop vision functions to be added directly into the OpenVX C code generated by the model compiler & optimizer. These functions can preprocess the input to the neural network model and post-process the model's results, letting users create an end-to-end solution that can be deployed on any targeted AMD hardware.


Note

Set up the model compiler & optimizer as described in Model compiler installation and configuration.

The sample applications available in samples/model_compiler_samples demonstrate how to run inference efficiently using AMD's open source implementation of OpenVX and OpenVX extensions. The samples walk through each step required to convert a pre-trained neural net model into an OpenVX graph and run the graph efficiently on the target hardware.

Model compiler & optimizer usage#

The following steps describe the use of model compiler & optimizer to convert pre-trained neural net Caffe, ONNX, and NNEF models into AMD’s intermediate NNIR format, optimize the NNIR, and then generate the OpenVX code.

Step 1 - Convert Pre-trained model to NNIR#

Caffe#

To convert a pre-trained Caffe model into an NNIR model:

% python3 caffe_to_nnir.py <net.caffeModel> <nnirOutputFolder> --input-dims <n,c,h,w> [OPTIONS]

OPTIONS:
    --verbose <0|1> [default: 0]
    --node_type_append <0|1> [default: 0; appends node type name to output names]

ONNX#

To convert an ONNX model into an NNIR model:

% python3 onnx_to_nnir.py <model.onnx> <nnirModelFolder> [OPTIONS]

OPTIONS:
    --input_dims n,c,h,w
    --node_type_append <0|1> [default: 0; appends node type name to output names]

NNEF#

To convert an NNEF model into an NNIR model:

% python3 nnef_to_nnir.py <nnefInputFolder> <nnirOutputFolder> [OPTIONS]

OPTIONS:
    --node_type_append <0|1> [default: 0; appends node type name to output names]

Note

If you want to create NNEF models from pre-trained Caffe or TensorFlow models, use the NNEF Converter or try the NNEF models at the NNEF Model Zoo.

Step 2 - Apply Optimizations#

To update the batch size of an AMD NNIR model:

% python3 nnir_update.py --batch-size <N> <nnirModelFolder> <nnirModelFolderN>

To fuse operations in an AMD NNIR model (for example, batch normalization into convolution):

% python3 nnir_update.py --fuse-ops <1> <nnirModelFolderN> <nnirModelFolderFused>

To quantize the model to FP16:

% python3 nnir_update.py --convert-fp16 <1> <nnirModelFolderN> <nnirModelFolderFused>

To work around grouped convolutions using slice and concat operations in an AMD NNIR model:

% python3 nnir_update.py --slice-groups <1> <nnirModelFolderFused> <nnirModelFolderSliced>

Step 3 - Convert NNIR to OpenVX C code#

To convert an NNIR model into OpenVX C code:

% python3 nnir_to_openvx.py --help

Usage: python nnir_to_openvx.py [OPTIONS] <nnirInputFolder> <outputFolder>

OPTIONS:
    --argmax UINT8                    -- argmax at the end with 8-bit output
    --argmax UINT16                   -- argmax at the end with 16-bit output
    --argmax <fileNamePrefix>rgb.txt  -- argmax at the end with RGB color mapping using LUT
    --argmax <fileNamePrefix>rgba.txt -- argmax at the end with RGBA color mapping using LUT
    --help                            -- show this help message

LUT File Format (RGB): 8-bit R G B values one per each label in text format
    R0 G0 B0
    R1 G1 B1
    ...

LUT File Format (RGBA): 8-bit R G B A values one per each label in text format
    R0 G0 B0 A0
    R1 G1 B1 A1
    ...
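As an illustration of the LUT file formats above, a minimal Python script can generate an RGB LUT for a hypothetical 3-label segmentation model (the label colors below are made up for illustration):

```python
# Write an argmax LUT file: one "R G B" line per label, in label order.
# The three colors are placeholder values for a hypothetical 3-label model.
colors = [
    (0, 0, 0),      # label 0: background
    (0, 255, 0),    # label 1
    (255, 0, 0),    # label 2
]

with open("lut-rgb.txt", "w") as f:
    for r, g, b in colors:
        f.write(f"{r} {g} {b}\n")
```

For the RGBA format, each line would carry a fourth 8-bit alpha value per label.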

Example of model compiler workflow#

The following demonstrates converting a trained Caffe model to NNIR, and then to an OpenVX graph.

  • Step 1: Convert net.caffemodel into an NNIR model using the following command

% python3 caffe_to_nnir.py <net.caffeModel> <nnirOutputFolder> --input-dims n,c,h,w [--verbose 0|1]
  • Step 2: Compile the NNIR model into OpenVX C code with a CMakeLists.txt for compiling and building the inference library

% python3 nnir_to_openvx.py <nnirModelFolder> <nnirModelOutputFolder>
  • Step 3: Run cmake and make on the project inside nnirModelOutputFolder

% cd nnirModelOutputFolder
% cmake .
% make
  • Step 4: Run the anntest application to test inference with the input and output tensors

% ./anntest weights.bin
  • Step 5: The generated shared C library (libannmodule.so) can be used in any customer application

Examples for OpenVX C code generation#

Generate OpenVX and test code that can be used to dump and compare raw tensor data:

% python3 nnir_to_openvx.py nnirInputFolderFused openvxCodeFolder
% mkdir openvxCodeFolder/build
% cd openvxCodeFolder/build
% cmake ..
% make
% ./anntest

Usage: anntest <weights.bin> [<input-data-file(s)> [<output-data-file(s)>]] [--add ADD] [--multiply MULTIPLY]

<input-data-file>: is filename to initialize tensor
    .jpg or .png: decode and initialize for 3 channel tensors
        (use %04d in fileName when batch-size > 1: batch index starts from 0)
    other: initialize tensor with raw data from the file

<output-data-file>[,<reference-for-compare>,<maxErrorLimit>,<rmsErrorLimit>]:
    <reference-for-compare> is raw tensor data for comparison
    <maxErrorLimit> is max absolute error allowed
    <rmsErrorLimit> is max RMS error allowed
    <output-data-file> is filename for saving output tensor data
    '-' to ignore
    other: save raw tensor into the file

<add>: input preprocessing factor [optional - default:[0,0,0]]

<multiply>: input preprocessing factor [optional - default:[1,1,1]]
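The two error limits above are the standard maximum-absolute-error and root-mean-square-error metrics. A quick sketch of both in Python (the standard definitions, not the anntest implementation):

```python
import math

def max_abs_error(output, reference):
    # Largest absolute element-wise difference between the two tensors.
    return max(abs(o - r) for o, r in zip(output, reference))

def rms_error(output, reference):
    # Root-mean-square of the element-wise differences.
    n = len(output)
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(output, reference)) / n)

# Illustrative values, not real model output.
output = [0.10, 0.20, 0.70]
reference = [0.10, 0.21, 0.69]
print(max_abs_error(output, reference))  # ≈ 0.01
print(rms_error(output, reference))      # ≈ 0.00816
```

A comparison passes only if both computed errors stay under the limits given on the command line.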

% ./anntest ../weights.bin input.f32 output.f32,reference.f32,1e-6,1e-9 --add -2.1,-2.07,-1.8 --multiply 0.017,0.017,0.017
...
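The --add and --multiply values in the command above encode per-channel mean/std normalization. Assuming anntest scales each pixel as pixel * multiply + add (the order of operations is an assumption), the example factors imply the following channel statistics:

```python
# (pixel - mean) / std  ==  pixel * (1/std) + (-mean/std)
# so multiply = 1/std and add = -mean/std.
multiply = [0.017, 0.017, 0.017]
add = [-2.1, -2.07, -1.8]

# Recover the implied per-channel std and means (illustrative arithmetic):
std = [1.0 / m for m in multiply]               # ≈ 58.8 for each channel
mean = [-a / m for a, m in zip(add, multiply)]  # ≈ [123.5, 121.8, 105.9]

# A pixel equal to the channel mean normalizes to ~0:
for m, mult, a in zip(mean, multiply, add):
    print(round(abs(m * mult + a), 6))  # 0.0 for each channel
```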

Generate OpenVX and test code with argmax that can be used to dump and compare the 16-bit argmax output tensor:

% python3 nnir_to_openvx.py --argmax UINT16 nnirInputFolderFused openvxCodeFolder
% mkdir openvxCodeFolder/build
% cd openvxCodeFolder/build
% cmake ..
% make
% ./anntest

Usage: anntest <weights.bin> [<input-data-file(s)> [<output-data-file(s)>]]

<input-data-file>: is filename to initialize tensor
    .jpg or .png: decode and initialize for 3 channel tensors
        (use %04d in fileName when batch-size > 1: batch index starts from 0)
    other: initialize tensor with raw data from the file

<output-data-file>[,<reference-for-compare>,<percentMismatchLimit>]:
    <reference-for-compare> is raw tensor data of argmax output for comparison
    <percentMismatchLimit> is max mismatch (percentage) allowed
    <output-data-file> is filename for saving output tensor data
    '-' to ignore
    other: save raw tensor into the file

% ./anntest ../weights.bin input-%04d.png output.u16,reference.u16,0.01
...
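The %04d pattern in the input filename above is printf-style formatting: it expands to the zero-padded batch index, starting from 0. A quick sketch of how such filenames resolve:

```python
# Expand a printf-style %04d batch filename pattern, as used by anntest
# when batch-size > 1 (batch index starts from 0).
batch_size = 3
filenames = ["input-%04d.png" % i for i in range(batch_size)]
print(filenames)  # ['input-0000.png', 'input-0001.png', 'input-0002.png']
```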

Generate OpenVX and test code with argmax and LUT, designed for semantic segmentation use cases. You can dump the output in raw format or as PNGs, and additionally compare it with reference data in raw format.

% python3 nnir_to_openvx.py --argmax lut-rgb.txt nnirInputFolderFused openvxCodeFolder
% mkdir openvxCodeFolder/build
% cd openvxCodeFolder/build
% cmake ..
% make
% ./anntest

Usage: anntest <weights.bin> [<input-data-file(s)> [<output-data-file(s)>]]

<input-data-file>: is filename to initialize tensor
    .jpg or .png: decode and initialize for 3 channel tensors
        (use %04d in fileName when batch-size > 1: batch index starts from 0)
    other: initialize tensor with raw data from the file

<output-data-file>[,<reference-for-compare>,<percentMismatchLimit>]:
    <reference-for-compare> is raw tensor data of LUT output for comparison
    <percentMismatchLimit> is max mismatch (percentage) allowed
    <output-data-file> is filename for saving output tensor data
    .png: save LUT output as PNG file(s)
        (use %04d in fileName when batch-size > 1: batch index starts from 0)
    '-' to ignore
    other: save raw tensor into the file

% ./anntest ../weights.bin input-%04d.png output.rgb,reference.rgb,0.01
...
% ./anntest ../weights.bin input-%04d.png output-%04d.png,reference.rgb,0.01
...

Test code with preprocessing add / multiply values to normalize the input tensor. Some models (e.g., Inception v4) require the input tensor to be normalized. You can pass the preprocessing values using the --add & --multiply options.

% ./anntest ../weights.bin input.f32 output.f32 --add -2.1,-2.07,-1.8 --multiply 0.017,0.017,0.017
...

Supported models & operators#

The following tables list the models and operators supported by different frameworks in the current release of MIVisionX.

Models#

| Networks | Caffe | ONNX | NNEF |
|---|---|---|---|
| AlexNet | ✅ | | ✅ |
| Caffenet | ✅ | | |
| DenseNet | | ✅ | |
| Googlenet | ✅ | ✅ | ✅ |
| Inception-V1 | | ✅ | ✅ |
| Inception-V2 | | ✅ | ✅ |
| Inception-V3 | | | |
| Inception-V4 | ✅ | | |
| MNIST | | ✅ | ✅ |
| Mobilenet | | ✅ | ✅ |
| MobilenetV2 | | | ✅ |
| ResNet-18 | | | ✅ |
| ResNet-34 | | | ✅ |
| ResNet-50 | ✅ | ✅ | ✅ |
| ResNet-101 | ✅ | | ✅ |
| ResNet-152 | ✅ | | ✅ |
| ResNetV2-18 | | | ✅ |
| ResNetV2-34 | | | ✅ |
| ResNetV2-50 | | | ✅ |
| ResNetV2-101 | | | ✅ |
| Squeezenet | | ✅ | ✅ |
| Tiny-Yolo-V2 | ✅ | | |
| VGGNet-16 | ✅ | | ✅ |
| VGGNet-19 | ✅ | ✅ | ✅ |
| Yolo-V3 | ✅ | | |
| ZFNet | | ✅ | |

Note

MIVisionX supports ONNX models with release 1.1 and release 1.3 tags

Operators#

| Layers | Caffe | ONNX | NNEF |
|---|---|---|---|
| Add | | ✅ | ✅ |
| Argmax | ✅ | ✅ | |
| AveragePool | | ✅ | ✅ |
| BatchNormalization | ✅ | ✅ | ✅ |
| Cast | | ✅ | |
| Clamp | | | ✅ |
| Clip | | ✅ | |
| Concat | ✅ | ✅ | ✅ |
| Constant | | ✅ | |
| Conv | ✅ | ✅ | ✅ |
| ConvTranspose | ✅ | ✅ | ✅ |
| Copy | | ✅ | ✅ |
| Crop | ✅ | | |
| CropAndResize | | | |
| Deconv | ✅ | ✅ | ✅ |
| DetectionOutput | ✅ | | |
| Div | | ✅ | ✅ |
| Dropout | | | |
| Eltwise | ✅ | | |
| Exp | | ✅ | ✅ |
| Equal | | ✅ | |
| Flatten | ✅ | | |
| Gather | | ✅ | |
| GEMM | ✅ | ✅ | ✅ |
| GlobalAveragePool | | ✅ | ✅ |
| Greater | | ✅ | |
| GreaterOrEqual | | | ✅ |
| InnerProduct | ✅ | | |
| Interp | ✅ | | |
| LeakyRelu | | ✅ | ✅ |
| Less | | ✅ | |
| LessOrEqual | | | ✅ |
| Linear | | | ✅ |
| Log | | ✅ | ✅ |
| LRN | ✅ | ✅ | ✅ |
| Matmul | | ✅ | ✅ |
| Max | | ✅ | ✅ |
| MaxPool | | ✅ | ✅ |
| MeanReduce | | | ✅ |
| Min | | ✅ | ✅ |
| Mul | | ✅ | ✅ |
| MulAdd | | | |
| NonMaxSuppression | | ✅ | |
| Permute | ✅ | ✅ | |
| PriorBox | ✅ | | |
| ReduceMin | | ✅ | |
| Relu | ✅ | ✅ | ✅ |
| Reshape | ✅ | ✅ | ✅ |
| Shape | | ✅ | |
| Sigmoid | | ✅ | ✅ |
| Slice | ✅ | ✅ | |
| Split | ✅ | | |
| Softmax | ✅ | ✅ | ✅ |
| SoftmaxWithLoss | ✅ | | |
| Squeeze | | ✅ | ✅ |
| Sub | | ✅ | ✅ |
| Sum | | ✅ | |
| Tile | | ✅ | |
| TopK | | ✅ | |
| Transpose | | ✅ | ✅ |
| Unsqueeze | | ✅ | ✅ |
| Upsample | | ✅ | ✅ |