MI200 matrix fused multiply-add (MFMA) behavior specifics

On the MI200, the MFMA_F16, MFMA_BF16, and MFMA_BF16_1K instructions flush subnormal input and output data to zero. Because the half-precision floating-point data types have a limited exponent range, this behavior can affect the convolution operation in certain workloads.
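To make the effect concrete, the following is a minimal host-side sketch in Python with NumPy. It does not run on the GPU or call MIOpen; it only emulates flush-to-zero for FP16 values and, for comparison, a BFloat16-like format. The helper names flush_subnormals_fp16 and to_bf16_trunc are hypothetical, and BFloat16 is approximated by truncating a float32 to its upper 16 bits, which is an assumption about rounding made for simplicity and may differ from what the hardware conversion instructions do.

```python
import numpy as np


def flush_subnormals_fp16(values):
    """Emulate flush-to-zero: FP16 values below the smallest normal become 0."""
    x = np.asarray(values, dtype=np.float16)
    smallest_normal = np.finfo(np.float16).tiny  # 2**-14, about 6.1e-05
    return np.where(np.abs(x) < smallest_normal, np.float16(0), x)


def to_bf16_trunc(values):
    """Approximate BFloat16 by keeping only the upper 16 bits of a float32.

    Assumption: truncation is used here for simplicity; it is not claimed
    to match the rounding of the hardware conversion instructions.
    """
    bits = np.asarray(values, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)


if __name__ == "__main__":
    samples = [1e-3, 1e-5, 3e-6]  # the last two are subnormal in FP16
    print("FP16 with flush-to-zero:", flush_subnormals_fp16(samples))
    print("BFloat16-like (truncated):", to_bf16_trunc(samples))
```

In this sketch, the small values that FP16 can only hold as subnormals are flushed to zero, while the BFloat16-like format keeps them nonzero with fewer significant bits.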

MIOpen offers an alternate implementation for the half-precision data type. It uses conversion instructions to take advantage of the larger exponent range of the BFloat16 data type, at the cost of reduced accuracy. The following caveats apply to this alternate implementation:

  • It’s disabled by default in forward convolution operations.

  • It’s enabled by default in backward data and backward weights convolution operations.

  • You can override the default MIOpen behavior by using the miopenSetConvolutionAttribute API call. To use the alternate implementation, pass the convolution descriptor for the appropriate convolution operation and the MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL convolution attribute (with a non-zero value).

  • You can also override the behavior using the MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL environment variable. Setting it to 1 enables the alternate implementation; setting it to 0 disables it. This environment variable affects the convolution operation in all directions.
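As an illustration of the environment variable override described above, here is a minimal Python sketch that builds an environment for a child process with MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL set to 1 or 0. The helper name env_with_fp16_alt_impl is hypothetical, and the child process here only prints the variable; it stands in for a real MIOpen-based workload, which this sketch does not launch.

```python
import os
import subprocess
import sys

ENV_VAR = "MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL"


def env_with_fp16_alt_impl(enabled, base_env=None):
    """Return a copy of the environment with the override set to "1" or "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = "1" if enabled else "0"
    return env


if __name__ == "__main__":
    # Stand-in child process: it only echoes the variable it received.
    # Replace this command with the actual workload in practice.
    child = [sys.executable, "-c",
             f"import os; print(os.environ['{ENV_VAR}'])"]
    result = subprocess.run(child, env=env_with_fp16_alt_impl(True),
                            capture_output=True, text=True, check=True)
    print(f"{ENV_VAR} seen by child:", result.stdout.strip())
```

Passing the variable through the child's environment, rather than modifying the parent's, keeps the override scoped to a single run, which can be convenient when comparing results with the alternate implementation enabled and disabled.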