MI200 matrix fused multiply-add (MFMA) behavior specifics

MI200 matrix fused multiply-add (MFMA) behavior specifics#

The MI200 MFMA_F16, MFMA_BF16, and MFMA_BF16_1K flush subnormal input/output data to zero. This behavior might affect the convolution operation in certain workloads due to the limited exponent range of the half-precision floating-point datatypes.

MIOpen offers an alternate implementation for the half-precision datatype via conversion instructions to utilize the BFloat16 datatype’s larger exponent range, albeit with reduced accuracy. The following salients apply to this alternate implementation:

It’s disabled by default in the forward convolution operations.
It’s enabled by default in the backward data and backward weights convolution operations.
You can override the default MIOpen behaviors by using the miopenSetConvolutionAttribute API call: Pass the convolution descriptor for the appropriate convolution operation, and the MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL convolution attribute (with a non-zero value), to engage the alternate implementation.
You can also override the behavior using the MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL environment variable. When set to 1, MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL engages the alternate implementation; when set to 0, it’s disabled. Keep in mind that the environment variable impacts the convolution operation in all directions.