Debugging and Logging#
Logging#
All logging messages output to standard error stream (stderr
). The following environment variables can be used to control logging:
MIOPEN_ENABLE_LOGGING
- Enables printing the basic layer by layer MIOpen API call information with actual parameters (configurations). Important for debugging. Disabled by default.MIOPEN_ENABLE_LOGGING_CMD
- A user can use this environmental variable to output the associatedMIOpenDriver
command line(s) onto console. Disabled by default.
NOTE 1: These two and other two-state (“boolean”) environment variables can be set to the following values:
1, on, yes, true, enable, enabled - to enable feature 0, off, no, false, disable, disabled - to disable feature
MIOPEN_LOG_LEVEL
- In addition to API call information and driver commands, MIOpen prints various information related to the progress of its internal operations. This information can be useful both for debugging and for understanding the principles of operation of the library. TheMIOPEN_LOG_LEVEL
environment variable controls the verbosity of these messages. Allowed values are:0 - Default. Works as level 4 for Release builds, level 5 for Debug builds.
1 - Quiet. No logging messages.
2 - Fatal errors only (not used yet).
3 - Errors and fatals.
4 - All errors and warnings.
5 - Info. All the above plus information for debugging purposes.
6 - Detailed info. All the above plus more detailed information for debugging.
7 - Trace: the most detailed debugging info plus all above.
NOTE 2: When asking for technical support, please include the console log obtained with the following settings:
export MIOPEN_ENABLE_LOGGING=1 export MIOPEN_ENABLE_LOGGING_CMD=1 export MIOPEN_LOG_LEVEL=6
MIOPEN_ENABLE_LOGGING_MPMT
- When enabled, each log line is prefixed with information which allows the user to identify records printed from different processes and/or threads. Useful for debugging multi-process/multi-threaded apps.MIOPEN_ENABLE_LOGGING_ELAPSED_TIME
- Adds a timestamp to each log line. Indicates the time elapsed since the previous log message, in milliseconds.
Layer Filtering#
The following list of environment variables allow for enabling/disabling various kinds of kernels and algorithms. This can be helpful for both debugging MIOpen and integration with frameworks.
NOTE 3: These variables can be set to the following values:
1, yes, true, enable, enabled - to enable kernels/algorithm 0, no, false, disable, disabled - to disable kernels/algorithm
If a variable is not set, then MIOpen behaves as if it is set to enabled
, unless otherwise specified. So all kinds of kernels/algorithms are enabled by default and the below variables can be used for disabling them.
WARNING: When the library is used with layer filtering, the results of
Find()
calls become narrower than during normal operation. This means that relevant find-db entries would not include some solutions that normally should be there. Therefore the subsequent Immediate modeGet()
calls may return incomplete information or even run into Fallback path.
In order to rehabilitate the Immediate mode, the user can:
Re-enable all solvers and re-run the same
Find()
calls that have been run before,Or, completely remove the User find-db.
Filtering by algorithm#
These variables control the sets (families) of convolution Solutions. For example, Direct algorithm is implemented in several Solutions that use OpenCL, GCN assembly etc. The corresponding variable can disable them all.
MIOPEN_DEBUG_CONV_FFT
- FFT convolution algorithm.MIOPEN_DEBUG_CONV_DIRECT
- Direct convolution algorithm.MIOPEN_DEBUG_CONV_GEMM
- GEMM convolution algorithm.MIOPEN_DEBUG_CONV_WINOGRAD
- Winograd convolution algorithm.MIOPEN_DEBUG_CONV_IMPLICIT_GEMM
- Implicit GEMM convolution algorithm.
Filtering by build method#
MIOPEN_DEBUG_GCN_ASM_KERNELS
- Kernels written in assembly language. Currently these used in many convolutions (some Direct solvers, Winograd kernels, fused convolutions), batch normalization.MIOPEN_DEBUG_HIP_KERNELS
- Convoluton kernels written in HIP (today, all these implement ImplicitGemm algorithm).MIOPEN_DEBUG_OPENCL_CONVOLUTIONS
- Convolution kernels written in OpenCL (note that only convolutions affected).MIOPEN_DEBUG_AMD_ROCM_PRECOMPILED_BINARIES
- Binary kernels. Right now the library does not use binaries.
Filtering out all Solutions except one#
MIOPEN_DEBUG_FIND_ONLY_SOLVER=solution_id
, wheresolution_id
should be either numeric or string identifier of some Solution. Directly affects onlyFind()
calls (however there is some indirect connection to Immediate mode; please see the “Warning” above.)If
solution_id
denotes some applicable Solution, then only that Solution will be found (plus GEMM and FFT, if these applicable, see Note 4).Else, if
solution_id
is valid but not applicable, thenFind()
would fail with all algorithms (again, except GEMM and FFT, see Note 4)Otherwise the
solution_id
is invalid (i.e. it doesn’t match any existing Solution), and theFind()
call would fail.
NOTE 4: This env. variable does not affect the “gemm” and “fft” solutions. For now, GEMM and FFT can be disabled only at algorithm level (see above).
Filtering the Solutions on individual basis#
Some of the Solutions have individual controls available. These affect both Find and Immediate modes. Note the “Warning” above.
Direct Solutions:
MIOPEN_DEBUG_CONV_DIRECT_ASM_3X3U
-ConvAsm3x3U
.MIOPEN_DEBUG_CONV_DIRECT_ASM_1X1U
-ConvAsm1x1U
.MIOPEN_DEBUG_CONV_DIRECT_ASM_1X1UV2
-ConvAsm1x1UV2
.MIOPEN_DEBUG_CONV_DIRECT_ASM_5X10U2V2
-ConvAsm5x10u2v2f1
,ConvAsm5x10u2v2b1
.MIOPEN_DEBUG_CONV_DIRECT_ASM_7X7C3H224W224
-ConvAsm7x7c3h224w224k64u2v2p3q3f1
.MIOPEN_DEBUG_CONV_DIRECT_ASM_WRW3X3
-ConvAsmBwdWrW3x3
.MIOPEN_DEBUG_CONV_DIRECT_ASM_WRW1X1
-ConvAsmBwdWrW1x1
.MIOPEN_DEBUG_CONV_DIRECT_OCL_FWD11X11
-ConvOclDirectFwd11x11
.MIOPEN_DEBUG_CONV_DIRECT_OCL_FWDGEN
-ConvOclDirectFwdGen
.MIOPEN_DEBUG_CONV_DIRECT_OCL_FWD
-ConvOclDirectFwd
.MIOPEN_DEBUG_CONV_DIRECT_OCL_FWD1X1
-ConvOclDirectFwd
.MIOPEN_DEBUG_CONV_DIRECT_OCL_WRW2
-ConvOclBwdWrW2<n>
(where n ={1,2,4,8,16}
), andConvOclBwdWrW2NonTunable
.MIOPEN_DEBUG_CONV_DIRECT_OCL_WRW53
-ConvOclBwdWrW53
.MIOPEN_DEBUG_CONV_DIRECT_OCL_WRW1X1
-ConvOclBwdWrW1x1
Winograd Solutions:
MIOPEN_DEBUG_AMD_WINOGRAD_3X3
-ConvBinWinograd3x3U
, FP32 Winograd Fwd/Bwd, filter size fixed to 3x3.MIOPEN_DEBUG_AMD_WINOGRAD_RXS
-ConvBinWinogradRxS
, FP32/FP16 F(3,3) Fwd/Bwd and FP32 F(3,2) WrW Winograd. Subsets:MIOPEN_DEBUG_AMD_WINOGRAD_RXS_WRW
- FP32 F(3,2) WrW convolutions only.MIOPEN_DEBUG_AMD_WINOGRAD_RXS_FWD_BWD
- FP32/FP16 F(3,3) Fwd/Bwd.
MIOPEN_DEBUG_AMD_WINOGRAD_RXS_F3X2
-ConvBinWinogradRxSf3x2
, FP32/FP16 Fwd/Bwd F(3,2) Winograd.MIOPEN_DEBUG_AMD_WINOGRAD_RXS_F2X3
-ConvBinWinogradRxSf2x3
, FP32/FP16 Fwd/Bwd F(2,3) Winograd, serves group convolutions only.MIOPEN_DEBUG_AMD_WINOGRAD_RXS_F2X3_G1
-ConvBinWinogradRxSf2x3g1
, FP32/FP16 Fwd/Bwd F(2,3) Winograd, for non-group convolutions.Multi-pass Winograd:
MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F3X2
-ConvWinograd3x3MultipassWrW<3-2>
, WrW F(3,2), stride 2 only.MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F3X3
-ConvWinograd3x3MultipassWrW<3-3>
, WrW F(3,3), stride 2 only.MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F3X4
-ConvWinograd3x3MultipassWrW<3-4>
, WrW F(3,4).MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F3X5
-ConvWinograd3x3MultipassWrW<3-5>
, WrW F(3,5).MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F3X6
-ConvWinograd3x3MultipassWrW<3-6>
, WrW F(3,6).MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F5X3
-ConvWinograd3x3MultipassWrW<5-3>
, WrW F(5,3).MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F5X4
-ConvWinograd3x3MultipassWrW<5-4>
, WrW F(5,4).MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F7X2
:ConvWinograd3x3MultipassWrW<7-2>
, WrW F(7,2)ConvWinograd3x3MultipassWrW<7-2-1-1>
, WrW F(7x1,2x1)ConvWinograd3x3MultipassWrW<1-1-7-2>
, WrW F(1x7,1x2)
MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_F7X3
:ConvWinograd3x3MultipassWrW<7-3>
, WrW F(7,3)ConvWinograd3x3MultipassWrW<7-3-1-1>
, WrW F(7x1,3x1)ConvWinograd3x3MultipassWrW<1-1-7-3>
, WrW F(1x7,1x3)
MIOPEN_DEBUG_AMD_MP_BD_WINOGRAD_F2X3
-ConvMPBidirectWinograd<2-3>
, FWD/BWD F(2,3)MIOPEN_DEBUG_AMD_MP_BD_WINOGRAD_F3X3
-ConvMPBidirectWinograd<3-3>
, FWD/BWD F(3,3)MIOPEN_DEBUG_AMD_MP_BD_WINOGRAD_F4X3
-ConvMPBidirectWinograd<4-3>
, FWD/BWD F(4,3)MIOPEN_DEBUG_AMD_MP_BD_WINOGRAD_F5X3
-ConvMPBidirectWinograd<5-3>
, FWD/BWD F(5,3)MIOPEN_DEBUG_AMD_MP_BD_WINOGRAD_F6X3
-ConvMPBidirectWinograd<6-3>
, FWD/BWD F(6,3)MIOPEN_DEBUG_AMD_MP_BD_XDLOPS_WINOGRAD_F2X3
-ConvMPBidirectWinograd_xdlops<2-3>
, FWD/BWD F(2,3)MIOPEN_DEBUG_AMD_MP_BD_XDLOPS_WINOGRAD_F3X3
-ConvMPBidirectWinograd_xdlops<3-3>
, FWD/BWD F(3,3)MIOPEN_DEBUG_AMD_MP_BD_XDLOPS_WINOGRAD_F4X3
-ConvMPBidirectWinograd_xdlops<4-3>
, FWD/BWD F(4,3)MIOPEN_DEBUG_AMD_MP_BD_XDLOPS_WINOGRAD_F5X3
-ConvMPBidirectWinograd_xdlops<5-3>
, FWD/BWD F(5,3)MIOPEN_DEBUG_AMD_MP_BD_XDLOPS_WINOGRAD_F6X3
-ConvMPBidirectWinograd_xdlops<6-3>
, FWD/BWD F(6,3)MIOPEN_DEBUG_AMD_MP_BD_WINOGRAD_EXPEREMENTAL_FP16_TRANSFORM -
ConvMPBidirectWinograd*`, FWD/BWD FP16 experemental mode. Disabled by default. This mode is experimental. Use it at your own risk.
MIOPEN_DEBUG_AMD_FUSED_WINOGRAD
- Fused FP32 F(3,3) Winograd, variable filter size.
Implicit GEMM Solutions:
ASM Implicit GEMM
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_ASM_FWD_V4R1
-ConvAsmImplicitGemmV4R1DynamicFwd
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_ASM_FWD_V4R1_1X1
-ConvAsmImplicitGemmV4R1DynamicFwd_1x1
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_ASM_BWD_V4R1
-ConvAsmImplicitGemmV4R1DynamicBwd
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_ASM_WRW_V4R1
-ConvAsmImplicitGemmV4R1DynamicWrw
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_ASM_FWD_GTC_XDLOPS
-ConvAsmImplicitGemmGTCDynamicFwdXdlops
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_ASM_BWD_GTC_XDLOPS
-ConvAsmImplicitGemmGTCDynamicBwdXdlops
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_ASM_WRW_GTC_XDLOPS
-ConvAsmImplicitGemmGTCDynamicWrwXdlops
HIP Implicit GEMM
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_FWD_V4R1
-ConvHipImplicitGemmV4R1Fwd
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_FWD_V4R4
-ConvHipImplicitGemmV4R4Fwd
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_BWD_V1R1
-ConvHipImplicitGemmBwdDataV1R1
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_BWD_V4R1
-ConvHipImplicitGemmBwdDataV4R1
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_WRW_V4R1
-ConvHipImplicitGemmV4R1WrW
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_WRW_V4R4
-ConvHipImplicitGemmV4R4WrW
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_FWD_V4R4_XDLOPS
-ConvHipImplicitGemmForwardV4R4Xdlops
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_FWD_V4R5_XDLOPS
-ConvHipImplicitGemmForwardV4R5Xdlops
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_BWD_V1R1_XDLOPS
-ConvHipImplicitGemmBwdDataV1R1Xdlops
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_BWD_V4R1_XDLOPS
-ConvHipImplicitGemmBwdDataV4R1Xdlops
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_WRW_V4R4_XDLOPS
-ConvHipImplicitGemmWrwV4R4Xdlops
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_FWD_V4R4_PADDED_GEMM_XDLOPS
-ConvHipImplicitGemmForwardV4R4Xdlops_Padded_Gemm
MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_HIP_WRW_V4R4_PADDED_GEMM_XDLOPS
-ConvHipImplicitGemmWrwV4R4Xdlops_Padded_Gemm
rocBlas Logging and Behavior#
The ROCBLAS_LAYER
environmental variable can be set to output GEMM information:
ROCBLAS_LAYER=
- is not set, there is no loggingROCBLAS_LAYER=1
- is set to 1, then there is trace loggingROCBLAS_LAYER=2
- is set to 2, then there is bench loggingROCBLAS_LAYER=3
- is set to 3, then there is both trace and bench logging
Additionally, using environment variable “MIOPEN_GEMM_ENFORCE_BACKEND”, can override the default behavior. The default behavior which is to use both MIOpenGEMM and rocBlas depending on the input configuration:
MIOPEN_GEMM_ENFORCE_BACKEND=1
, use rocBLAS if enabledMIOPEN_GEMM_ENFORCE_BACKEND=2
, use MIOpenGEMM for FP32, use rocBLAS for FP16 if enabledMIOPEN_GEMM_ENFORCE_BACKEND=3
, no gemm will be calledMIOPEN_GEMM_ENFORCE_BACKEND=4
, use MIOpenTensile for FP32, use rocBLAS for FP16 if enabledMIOPEN_GEMM_ENFORCE_BACKEND=<any other value>
, use default behavior
To disable using rocBlas entirely, set the configuration flag -DMIOPEN_USE_ROCBLAS=Off
during MIOpen configuration.
More information on logging with rocBlas can be found here.
Numerical Checking#
MIOpen provides the environmental variable MIOPEN_CHECK_NUMERICS
to allow users to debug potential numerical abnormalities. Setting this variable will scan all inputs and outputs of each kernel called and attempt to detect infinities (infs), not-a-number (NaN), or all zeros. The environment variable has several settings that will help with debugging:
MIOPEN_CHECK_NUMERICS=0x01
: Fully informative, prints results from all checks to consoleMIOPEN_CHECK_NUMERICS=0x02
: Warning information, prints results only if abnormality detectedMIOPEN_CHECK_NUMERICS=0x04
: Throw error on detection, MIOpen execute MIOPEN_THROW on abnormal resultMIOPEN_CHECK_NUMERICS=0x08
: Abort on abnormal result, this will allow users to drop into a debugging sessionMIOPEN_CHECK_NUMERICS=0x10
: Print stats, this will compute and print mean/absmean/min/max (note, this is much slower)
Controlling Parallel Compilation#
MIOpen’s Convolution Find() calls will compile and benchmark a set of solvers
contained in miopenConvAlgoPerf_t
this is done in parallel per miopenConvAlgorithm_t
. Parallelism per algorithm is set to 20 threads. Typically there are far fewer threads spawned due to the limited number of kernels under any given algorithm. The level of parallelism can be controlled using the environment variable MIOPEN_COMPILE_PARALLEL_LEVEL
.
For example, to disable multi-threaded compilation:
export MIOPEN_COMPILE_PARALLEL_LEVEL=1
Experimental controls#
NOTE 5: Using experimental controls may result in:
Performance drops
Computation inaccuracies
Run-time errors
Other kinds of unexpected behavior
It is strongly recommended to use them only with the explicit permission or request of the library developers.
Code Object (CO) version selection (EXPERIMENTAL)#
Different ROCm versions use Code Object files of different versions (or, in other words, formats). The library uses suitable version automatically. The following variables allow for experimenting and triaging possible problems related to CO version:
MIOPEN_DEBUG_AMD_ROCM_METADATA_ENFORCE
- Affects kernels written in GCN assembly language.0
or unset - Automatically detect the required CO version and assemble to that version. This is the default.1
- Do not auto-detect Code Object version, always assemble v2 Code Objects.2
- Behave as if both CO v2 and v3 are supported (seeMIOPEN_DEBUG_AMD_ROCM_METADATA_PREFER_OLDER
).3
- Always assemble v3 Code Objects.
MIOPEN_DEBUG_AMD_ROCM_METADATA_PREFER_OLDER
- This variable affects only assembly kernels, and only when ROCm supports both CO v2 and CO v3 (like ROCm 2.10). By default, the newer format is used (CO v3). When this variable is enabled, the behavior is reversed.MIOPEN_DEBUG_OPENCL_ENFORCE_CODE_OBJECT_VERSION
- Enforces Code Object format for OpenCL kernels. Works with HIP backend only (cmake ... -DMIOPEN_BACKEND=HIP...
).Unset - Automatically detect the required CO version. This is the default.
2
- Always build to CO v2.3
- Always build to CO v3.4
- Always build to CO v4.
Winograd Multi-pass Maximum Workspace throttling#
MIOPEN_DEBUG_AMD_WINOGRAD_MPASS_WORKSPACE_MAX
- ConvWinograd3x3MultipassWrW
, WrW
MIOPEN_DEBUG_AMD_MP_BD_WINOGRAD_WORKSPACE_MAX
- ConvMPBidirectWinograd*
, FWD BWD
Syntax of value:
decimal or hex (with
0x
prefix) value that should fit into 64-bit unsigned integer.If syntax is violated, then the behavior is unspecified.
Semantics:
Sets the limit (max allowed workspace size) for Multi-pass (MP) Winograd Solutions, in bytes.
Affects all MP Winograd Solutions. If a Solution needs more workspace than the limit, then it does not apply.
If unset, then the default limit is used. Current default is
2000000000
(~1.862 GiB) for gfx900 and gfx906/60 (or less CUs). No default limit is set for other GPUs.Special values:
0 - Use the default limit, as if the variable is unset.
1 - Completely prohibit the use of workspace.
-1 - Remove the default limit.