MIGraphX Driver
2024-03-08
read
Loads and prints the input graph.
- <input file>
File to load.
- --model [resnet50|inceptionv3|alexnet]
Load the specified model.
- --onnx
Load as ONNX.
- --tf
Load as TensorFlow.
- --migraphx
Load as MIGraphX.
- --migraphx-json
Load as MIGraphX JSON.
- --batch [unsigned int] (Default: 1)
For a static model, set the batch size. For a dynamic-batch model, set the batch size used at runtime.
- --nhwc
Treat TensorFlow format as NHWC.
- --skip-unknown-operators
Skip unknown operators when parsing and continue parsing.
- --nchw
Treat TensorFlow format as NCHW.
- --trim, -t [unsigned int]
Trim this many instructions from the end of the program (Default: 0).
- --input-dim [std::vector<std::string>]
Dimensions of a parameter (format: “@name d1 d2 … dn”).
- --dyn-input-dim [std::vector<std::string>]
Set the dynamic dimensions of a parameter using JSON formatting (format: “@name” “dynamic_dimension_json”).
- --default-dyn-dim
Set the default dynamic dimension (format: {min:x, max:y, optimals:[o1,o2,…]}).
- --optimize, -O
Optimize when reading.
- --apply-pass, -p
Passes to apply to the model.
- --graphviz, -g
Print a Graphviz representation.
- --brief
Make the output brief.
- --cpp
Print the program as a C++ program.
- --json
Print the program as JSON.
- --text
Print the program in text format.
- --binary
Print the program in binary format.
- --py
Print the program using the Python API.
- --output, -o [std::string]
Write the output to a file.
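The `--input-dim` and `--default-dyn-dim` values are easy to get wrong when scripting the driver. The following is a minimal Python sketch, assuming the driver is invoked as `migraphx-driver` with the input file right after the command name (as in the rocprof example further down) and that each `--input-dim` value is passed as separate tokens in the documented `@name d1 d2 … dn` form. The helpers `input_dim_args`, `default_dyn_dim` and `read_argv`, the file `model.onnx` and the parameter `data` are hypothetical; the code only builds an argument list and does not run the driver.

```python
import shlex
from typing import Dict, List, Sequence


def input_dim_args(dims: Dict[str, Sequence[int]]) -> List[str]:
    """Build '--input-dim @name d1 d2 ... dn' tokens for each parameter (assumed token layout)."""
    args: List[str] = []
    for name, shape in dims.items():
        args += ["--input-dim", "@" + name] + [str(d) for d in shape]
    return args


def default_dyn_dim(min_dim: int, max_dim: int, optimals: Sequence[int] = ()) -> str:
    """Format a value in the documented {min:x, max:y, optimals:[...]} form."""
    if min_dim > max_dim:
        raise ValueError("min must not exceed max")
    opts = ",".join(str(o) for o in optimals)
    return "{min:%d, max:%d, optimals:[%s]}" % (min_dim, max_dim, opts)


def read_argv(model: str, fmt: str, dims: Dict[str, Sequence[int]]) -> List[str]:
    """Argument list for 'read' that loads a model with fixed input dims and prints it as text."""
    if fmt not in ("onnx", "tf", "migraphx", "migraphx-json"):
        raise ValueError("unknown format: " + fmt)
    return ["migraphx-driver", "read", model, "--" + fmt, *input_dim_args(dims), "--text"]


# Hypothetical model file and parameter name, for illustration only.
argv = read_argv("model.onnx", "onnx", {"data": [1, 3, 224, 224]})
print(shlex.join(argv))
print(default_dyn_dim(1, 4, [1, 2, 4]))
```

Passing the tokens as a list (for example to `subprocess.run`) avoids shell quoting; if you type the `--default-dyn-dim` value in a shell instead, quote it, since it contains spaces and braces.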
compile
Compiles and prints the input graph.
- --fill0 [std::vector<std::string>]
Fill the named parameters with 0s.
- --fill1 [std::vector<std::string>]
Fill the named parameters with 1s.
- --gpu
Compile for the GPU.
- --cpu
Compile for the CPU.
- --ref
Compile for the reference implementation.
- --enable-offload-copy
Enable implicit offload copying.
- --disable-fast-math
Disable fast math optimization.
- --exhaustive-tune
Perform an exhaustive search to find the fastest version of the generated kernels for the selected backend.
- --fp16
Quantize to fp16.
- --int8
Quantize to int8.
- --fp8
Quantize to the Float8E4M3FNUZ type.
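Because the target and quantization options are independent switches, a script that drives `compile` can validate its choices before invoking the driver. This sketch assumes that exactly one of `--gpu`, `--cpu` or `--ref` and at most one quantization flag are wanted per invocation (this page does not say whether the driver enforces that), and that `compile` also accepts the loading options listed under `read`, such as `--onnx`. `compile_argv` and the parameter name `input_ids` are hypothetical.

```python
from typing import List, Optional, Sequence

TARGETS = ("gpu", "cpu", "ref")
QUANTIZE = ("fp16", "int8", "fp8")


def compile_argv(model: str, target: str = "gpu", quantize: Optional[str] = None,
                 fill0: Sequence[str] = (), fill1: Sequence[str] = (),
                 exhaustive_tune: bool = False) -> List[str]:
    """Build a 'compile' command line from documented flags: one target, at most one quantization."""
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}")
    if quantize is not None and quantize not in QUANTIZE:
        raise ValueError(f"quantize must be one of {QUANTIZE}")
    argv = ["migraphx-driver", "compile", model, "--onnx", "--" + target]
    if quantize:
        argv.append("--" + quantize)
    if fill0:
        argv += ["--fill0", *fill0]   # parameters to fill with 0s
    if fill1:
        argv += ["--fill1", *fill1]   # parameters to fill with 1s
    if exhaustive_tune:
        argv.append("--exhaustive-tune")
    return argv


print(" ".join(compile_argv("model.onnx", quantize="fp16", fill1=["input_ids"])))
```

Rejecting unknown names in the script surfaces typos such as `--fp61` before a potentially long compilation starts.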
run
Compiles and runs the input graph, then prints the program.
- --fill0 [std::vector<std::string>]
Fill the named parameters with 0s.
- --fill1 [std::vector<std::string>]
Fill the named parameters with 1s.
- --gpu
Compile for the GPU.
- --cpu
Compile for the CPU.
- --ref
Compile for the reference implementation.
- --enable-offload-copy
Enable implicit offload copying.
- --disable-fast-math
Disable fast math optimization.
- --exhaustive-tune
Perform an exhaustive search to find the fastest version of the generated kernels for the selected backend.
- --fp16
Quantize to fp16.
- --int8
Quantize to int8.
- --fp8
Quantize to the Float8E4M3FNUZ type.
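To call `run` from a Python script, a thin wrapper around `subprocess` is usually enough. This sketch assumes the driver is either on `PATH` or at `/opt/rocm/bin/migraphx-driver` (the path used in the rocprof example below); `find_driver` and `run_driver` are hypothetical helpers, and with `dry_run=True` the wrapper only returns the quoted command instead of executing it.

```python
import shlex
import shutil
import subprocess
from typing import List, Sequence


def find_driver(default: str = "/opt/rocm/bin/migraphx-driver") -> str:
    """Prefer a migraphx-driver found on PATH; otherwise fall back to the assumed ROCm location."""
    return shutil.which("migraphx-driver") or default


def run_driver(args: Sequence[str], driver: str = "", dry_run: bool = False) -> str:
    """Run 'migraphx-driver <args>' and return its stdout, or only the command string if dry_run."""
    cmd: List[str] = [driver or find_driver(), *args]
    if dry_run:
        return shlex.join(cmd)
    # check=True raises CalledProcessError when the driver exits with a non-zero status.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical model and parameter names; dry_run keeps this from executing anything.
print(run_driver(["run", "model.onnx", "--onnx", "--gpu", "--fill1", "input_ids"],
                 driver="migraphx-driver", dry_run=True))
```

Capturing stdout lets a test harness keep the printed program alongside other results; drop `capture_output` to stream the output to the terminal instead.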
perf
Compiles and runs the input graph, then prints a performance report.
- --fill0 [std::vector<std::string>]
Fill the named parameters with 0s.
- --fill1 [std::vector<std::string>]
Fill the named parameters with 1s.
- --gpu
Compile for the GPU.
- --cpu
Compile for the CPU.
- --ref
Compile for the reference implementation.
- --enable-offload-copy
Enable implicit offload copying.
- --disable-fast-math
Disable fast math optimization.
- --exhaustive-tune
Perform an exhaustive search to find the fastest version of the generated kernels for the selected backend.
- --fp16
Quantize to fp16.
- --int8
Quantize to int8.
- --fp8
Quantize to the Float8E4M3FNUZ type.
- --iterations, -n [unsigned int]
Number of iterations to run for the performance report (Default: 100).
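When comparing results across `--batch` sizes it helps to normalize per sample. As a worked example of that arithmetic only (it does not reproduce the driver's own report, whose fields this page does not describe): if n iterations at batch size b take T seconds in total, the mean latency is T/n per iteration and the throughput is n·b/T samples per second. `PerfSummary` and `perf_summary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PerfSummary:
    mean_latency_ms: float  # average wall time per iteration (one batch)
    throughput: float       # samples processed per second


def perf_summary(total_seconds: float, iterations: int = 100, batch: int = 1) -> PerfSummary:
    """Convert total elapsed time over 'iterations' runs into per-iteration latency and throughput."""
    if total_seconds <= 0 or iterations <= 0 or batch <= 0:
        raise ValueError("all inputs must be positive")
    return PerfSummary(
        mean_latency_ms=1000.0 * total_seconds / iterations,
        throughput=iterations * batch / total_seconds,
    )


# 100 iterations (the --iterations default) at batch size 8, taking 0.5 s in total.
s = perf_summary(0.5, iterations=100, batch=8)
print(f"{s.mean_latency_ms:.2f} ms/iteration, {s.throughput:.1f} samples/s")  # 5.00 ms/iteration, 1600.0 samples/s
```

A larger batch usually raises throughput while also raising per-iteration latency, so both numbers are worth tracking.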
verify
Runs the reference implementation and the CPU or GPU implementation, then checks their outputs for consistency.
- --fill0 [std::vector<std::string>]
Fill the named parameters with 0s.
- --fill1 [std::vector<std::string>]
Fill the named parameters with 1s.
- --gpu
Compile for the GPU.
- --cpu
Compile for the CPU.
- --ref
Compile for the reference implementation.
- --enable-offload-copy
Enable implicit offload copying.
- --disable-fast-math
Disable fast math optimization.
- --exhaustive-tune
Perform an exhaustive search to find the fastest version of the generated kernels for the selected backend.
- --fp16
Quantize to fp16.
- --int8
Quantize to int8.
- --fp8
Quantize to the Float8E4M3FNUZ type.
- --rms-tol [double]
Tolerance for the RMS error (Default: 0.001).
- --atol [double]
Tolerance for the elementwise absolute difference (Default: 0.001).
- --rtol [double]
Tolerance for the elementwise relative difference (Default: 0.001).
- -i, --per-instruction
Verify each instruction.
- -r, --reduce
Reduce the program and verify.
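The three tolerances can be read as two kinds of check: an aggregate RMS error over the whole output and an elementwise absolute/relative bound. The sketch below is a pure-Python illustration, assuming an RMS error normalized by the largest output magnitude and the NumPy-style elementwise rule |out − ref| ≤ atol + rtol·|ref|; MIGraphX's exact formulas may differ, and `rms_error` and `outputs_match` are hypothetical names.

```python
import math
from typing import Sequence


def rms_error(ref: Sequence[float], out: Sequence[float]) -> float:
    """Root-mean-square difference, normalized by the largest magnitude in either output (assumed)."""
    if len(ref) != len(out) or not ref:
        raise ValueError("outputs must be non-empty and the same length")
    sq = sum((r - o) ** 2 for r, o in zip(ref, out))
    scale = max(max(abs(x) for x in ref), max(abs(x) for x in out), 1e-12)
    return math.sqrt(sq / len(ref)) / scale


def outputs_match(ref: Sequence[float], out: Sequence[float],
                  rms_tol: float = 0.001, atol: float = 0.001, rtol: float = 0.001) -> bool:
    """True if the RMS error and every elementwise difference are within tolerance."""
    if rms_error(ref, out) > rms_tol:
        return False
    return all(abs(o - r) <= atol + rtol * abs(r) for r, o in zip(ref, out))


ref = [1.0, 2.0, 3.0]
print(outputs_match(ref, [1.0, 2.0, 3.0001]))  # True: small deviation
print(outputs_match(ref, [1.0, 2.0, 3.1]))     # False: RMS error exceeds 0.001
```

The default arguments mirror the documented defaults of `--rms-tol`, `--atol` and `--rtol`.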
roctx
Provides marker information for each operation, allowing MIGraphX to be used with rocprof for performance analysis. This gives the user GPU-level kernel timing information.
An example command line that combines the driver with rocprof for tracing is given below:
/opt/rocm/bin/rocprof --hip-trace --roctx-trace --flush-rate 1ms --timestamp on -d <OUTPUT_PATH> --obj-tracking on /opt/rocm/bin/migraphx-driver roctx <ONNX_FILE> <MIGRAPHX_OPTIONS>
After rocprof runs, the output directory contains trace information for HIP, HCC and ROCTX in separate .txt files.
To understand the interactions between API calls, use the roctx.py helper script as described in the rocTX section of the dev/tools documentation.
- --fill0 [std::vector<std::string>]
Fill the named parameters with 0s.
- --fill1 [std::vector<std::string>]
Fill the named parameters with 1s.
- --gpu
Compile for the GPU.
- --cpu
Compile for the CPU.
- --ref
Compile for the reference implementation.
- --enable-offload-copy
Enable implicit offload copying.
- --disable-fast-math
Disable fast math optimization.
- --exhaustive-tune
Perform an exhaustive search to find the fastest version of the generated kernels for the selected backend.
- --fp16
Quantize to fp16.
- --int8
Quantize to int8.
- --fp8
Quantize to the Float8E4M3FNUZ type.
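When profiling several models in a row, the rocprof command line for `roctx` can be assembled from Python. This sketch reproduces only the flags in the example command above, assumes the default `/opt/rocm/bin` install location, and uses a hypothetical `roctx_trace_argv` helper; it builds the argument list without running rocprof.

```python
import shlex
from typing import List, Sequence

ROCM_BIN = "/opt/rocm/bin"  # assumed default install location, as in the example command


def roctx_trace_argv(onnx_file: str, output_dir: str,
                     driver_options: Sequence[str] = ()) -> List[str]:
    """rocprof arguments that trace HIP calls and roctx markers for 'migraphx-driver roctx'."""
    return [
        f"{ROCM_BIN}/rocprof",
        "--hip-trace", "--roctx-trace",
        "--flush-rate", "1ms",
        "--timestamp", "on",
        "-d", output_dir,
        "--obj-tracking", "on",
        f"{ROCM_BIN}/migraphx-driver", "roctx", onnx_file,
        *driver_options,
    ]


# Hypothetical model file and output directory.
print(shlex.join(roctx_trace_argv("model.onnx", "traces/model", ["--gpu"])))
```

Giving each model its own `-d` directory keeps the per-model .txt trace files from mixing.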