MIOpen Release notes#
08/03/2021 [2.12.0]#
This release includes support for Navi21 and various other bug fixes and performance improvements.
MIOpen now supports Navi21!! (via MIOpen PRs 973, 780, 764, 740, 739, 677, 660, 653, 493, 498)
Fixed a correctness issue with ImplicitGemm algorithm
Updated the performance data for new kernel versions
Improved MIOpen build time by splitting large kernel header files
Fixed an issue in reduction kernels for padded tensors
Various other bug fixes and performance improvements
05/17/2021 [2.11.0]#
This release contains various bug fixes and performance improvements.
Updates for Target ID features in ROCm stack
Correctness fix in Batchnorm kernels
Various bug fixes for MIOpenGEMM on the OpenCL backend
Various bug fixes in 3x3 assembly kernels
03/25/2021 [2.10.0]#
This release contains new reduction operations, Winograd algorithm performance improvements as well as bug fixes. Various host side performance improvements have been added as well.
Added a GPU reference kernel implementation for faster testing.
Added TargetID support for new AMD GPU architectures.
Implementation of four additional generic tensor reduction operations (AVG, AMAX, NORM1, NORM2).
Fixed a bug where Batchnorm would give incorrect results when the product of image height and image width is not a multiple of four.
Various host side improvements for better find and tuning performance.
Added support for AMD Code Object V4.
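As a rough illustration of what the four reduction operations named above compute, here is a plain-Python sketch of their usual mathematical definitions. It does not call MIOpen; the function names and the `REDUCTIONS` table are hypothetical and exist only for this example.

```python
import math
from typing import Callable, Dict, Sequence

# Reference semantics only; these are NOT MIOpen API calls.

def reduce_avg(values: Sequence[float]) -> float:
    """AVG: arithmetic mean of the elements."""
    return sum(values) / len(values)

def reduce_amax(values: Sequence[float]) -> float:
    """AMAX: largest absolute value among the elements."""
    return max(abs(v) for v in values)

def reduce_norm1(values: Sequence[float]) -> float:
    """NORM1: sum of absolute values."""
    return sum(abs(v) for v in values)

def reduce_norm2(values: Sequence[float]) -> float:
    """NORM2: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))

# Hypothetical lookup table keyed by the operation names used in the notes.
REDUCTIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "AVG": reduce_avg,
    "AMAX": reduce_amax,
    "NORM1": reduce_norm1,
    "NORM2": reduce_norm2,
}

if __name__ == "__main__":
    sample = [3.0, -4.0]
    for name, fn in REDUCTIONS.items():
        print(name, fn(sample))
```

In the library these reductions are applied along selected tensor dimensions; the sketch reduces a flat sequence to keep the definitions visible.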
12/01/2020 [ 2.9.0 ]#
This release contains implicit GEMM algorithm performance updates and bug fixes. Additional performance improvements have been implemented for batch normalization.
Added new assembly implicit GEMM kernels
Added batch normalization optimizations
Added missing tunings from 2.8.0 release cycle
Fixed issue where miopen-hip backend install would not search for rocBLAS dependency
Removed deprecated implicit GEMM xDLOPs solvers
Removed incorrect error messages from implicit GEMM solvers
Disabled ConvAsmBwdWrW3x3 solver for stride > 1 cases
Disabled bidirectional multi-pass Winograd kernels due to stability issues
10/28/2020 [ 2.8.0 ]#
This release provides additional bug fixes and support for embedded build using MIOpen as a static library.
Fixed workspace size calculation for GEMM group convolutions
Fixed performance regression for M/N
Fixed issue with faulty compiler option
Fixed typo in components dependency variable in CMakeLists.txt
Fixed issues with COMgr backed online compilation for HIP kernels
Added cmake flag for embedding system databases when building a static library
Added a way to disable building MIOpenDriver when building a static library
Added CC compiler detection in ROCm environment
Known issue: This release may show warnings for “obsolete configs” in the performance database. This can be fixed by rerunning tuning on a specific network; see the tuning documentation.
09/18/2020 [ 2.7.0 ]#
This release contains a new reduction API; see API documentation for more information. Additional features for embedded builds have been added, along with further support for 3D convolutional networks.
Added additional tunings into performance database
Added general reduction API
Added cmake flag for embedding binary database into a static MIOpen build
Added cmake flag for embedding system find-db text files into static MIOpen build
Fixed issue with GEMM workspace size calculation for backwards data convolutions #381
Fixed issue with 3D pooling indexing #365
08/20/2020 [ 2.6.0 ]#
This release contains convolution performance improvements, improved multi-threading behavior, and improved stability for half precision convolutions. Initial iteration time has been reduced with the introduction of hybrid find mode. Builds for a static library have been refined for this release.
Added MIOPEN_FIND_MODE=3 as the new default convolution Find mode; see the Find mode documentation for details
Added a more runtime-parameterized version of pooling to reduce the number of online compilations
Improved the performance of backwards spatial batch normalization for small images
Fixed issue with std::logic_error in SQLite deleter #306
Fixed issues with half precision stability for convolutions
Fixed issues with multi-threaded SQLite database accesses
Fixed issues with 3-D convolutions and incorrect parameters
Fixed various issues with implicit GEMM static assert failures
Removed inactive implicit GEMM convolution solvers
Removed SCGEMM convolutional algorithm from MIOpen
07/10/2020 [ 2.5.0 ]#
This release contains convolution performance improvements, various minor fixes and documentation updates.
Added a script to detect and install appropriate precompiled kernels
Added 3D convolution backwards weights implicit GEMM implementation
Improved performance of convolution implicit GEMM algorithm
Improved database coverage for batch size 1
Improved logging and error reporting
Improved documentation for debugging with numeric checks
Fixed issue with potential infinities and NaNs appearing during low precision training on CNNs
06/02/2020 [ 2.4.0 ]#
This release contains new implementations of 3D convolutions using implicitGEMM, general performance improvements for convolutions, bug fixes, better versioning in directories, integration with the new rocclr, and dropout support in RNNs.
Added 3D convolutions for the implicitGEMM algorithm in the forward and backward-data passes
Added dropout support for RNN layers, e.g., RNN-vanilla, GRU, and LSTM
Added support for AMD’s rocclr runtime and compiler
Improved performance for implicitGEMM and Winograd algorithms
Improved database locking
Fixed issue with GPU memory segmentation fault on asymmetric padding #142
03/01/2020 [ 2.3.0 ]#
This release contains new implementations of the implicitGEMM and Winograd algorithms, performance improvements for convolutions, further support for 3D convolutional networks, and various bug fixes.
Added 3D Pooling layers
Added backwards data algorithm for implicitGEMM
Added GEMM performance improvements via relaxed constraints in rocBLAS-Tensile
Added full CO v3 support for all kernels in MIOpen
Added new Winograd group convolution kernels
Added an API to query MIOpen’s version
Added parallel compilation in initial convolutional algorithm search; partial solution to #130
Added SQLite binary program cache
Improved logging across all layers
Improved MIOpen’s internal design for calling convolutional solvers
Fixed various bugs for the implicitGEMM algorithm
01/24/2020 [ 2.2.1 ]#
This release contains bug fixes, documentation updates, and further code object version 3 support.
Changes:
Added support for multiple ROCm installations
Added additional support for code object v3
Fixed issue with incorrect LRN calculation #127
Fixed incorrect performance database documentation
Fixed issue with incorrect workspace calculation in group convolutions
Fixed issue with unsupported hardware instructions used with inline assembly
12/19/2019 [ 2.2.0 ]#
This release contains bug fixes, performance improvements, and expanded applicability for specific convolutional algorithms.
MIOpen has posted a citable paper on ArXiv here.
An SQLite database has been added to replace the text-based performance database. While the text file still exists, by default SQLite is used over the text-based performance database; see documentation for more details.
Changes:
Added per solution algorithm filtering environmental variable for debugging
Added SQLite3 database and build dependency. The text-based performance database support is deprecated and will be removed in the next release.
Added citation page to documentation pointing to MIOpen’s paper
Added to the overall documentation
Fixed fusion compilation check issue
Fixed fusion group convolution warning
Improved performance of forward pooling
Improved performance of convolutions
Improved performance of spatial training batch normalization for some large batch size input configurations
Improved applicability of implicit GEMM convolution algorithm
Improved performance of calls to miopenConvolutionXXXGetWorkSpaceSize() functions
Improved conformance to code object version 3
Removed SCGEMM convolution algorithm by default; this algorithm is deprecated and will be removed in future releases
Changed “hip_hcc” to “hip-hcc” for the MIOpen package requirements in CMakeLists.txt
09/25/2019 [ 2.1.0 ]#
This release contains new layers, bug fixes, and a new convolution algorithm.
Changes:
Added a dropout layer API for training
Added a new SCGEMM algorithm for convolutions
Added further support for bfp16 in convolutions
Added a docker hub link for MIOpen docker images.
Fixed issue with NaN appearing on batch normalization backwards pass in fp16
Fixed softmax kernel bug in log mode #112
Fixed ROCm gfx803 support issue #869
Improved performance of batch normalization fp16 forward training layers
Improved performance of convolution layers
Removed MIOpenGEMM as a requirement for the HIP backend. It is now optional.
08/13/2019 [ 2.0.1 ]#
This release contains bug fixes and performance improvements.
Additionally, the convolution algorithm Implicit GEMM is now enabled by default
Known issues:
Backward propagation for batch normalization in fp16 mode may trigger NaN in some cases
Softmax Log mode may produce an incorrect result in back propagation
Changes:
Added Winograd multi-pass convolution kernel
Fixed issue with hip compiler paths
Fixed immediate mode behavior with auto-tuning environment variable
Fixed issue with system find-db in-memory cache; the fix enables the cache by default
Improved logging
Improved how symbols are hidden in the library
Updated default behavior to enable implicit GEMM
07/08/2019 [ 2.0.0 ]#
This release contains several new features including an immediate mode for selecting convolutions, bfloat16 support, new layers, modes, and algorithms.
MIOpenDriver, a tool for benchmarking and developing kernels, is now shipped with MIOpen.
BFloat16 is now supported in HIP; this requires an updated rocBLAS as a GEMM backend.
Immediate mode API now provides the ability to quickly obtain a convolution kernel.
MIOpen now contains HIP source kernels and implements the ImplicitGEMM kernels. This is a new feature and is currently disabled by default. Use the environment variable “MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=1” to activate this feature. ImplicitGEMM requires an up-to-date HIP version of at least 1.5.9211.
A new “loss” category of layers has been added, of which CTC loss is the first. See the API reference for more details.
2.0 is the last release of active support for gfx803 architectures. In future releases, MIOpen will not actively debug and develop new features specifically for gfx803.
System Find-Db in memory cache is disabled by default. Please see build instructions to enable this feature.
Changes:
Added support for bfloat16 datatype in convolutions
Added softmax channel mode and new softmax version 2 API
Added fast / accurate / log softmax algorithms
Added new implicit GEMM convolution algorithm for forward and backwards data passes, disabled by default
Added int32 datatype support for output tensors in int8 convolutions
Added immediate mode for finding the best convolution kernel for a given configuration
Added a Find-Db infrastructure which stashes results of find on a user’s system
Added a shipped System Find-Db containing offline run Find() results
Added an additional, faster batch norm assembly kernel for fp16
Added CTC loss layer
Added MIOpenDriver as a default component in MIOpen’s build #34
Fixed C compatibility for boolean types in C API #103
Fixed incorrect calculation in per-activation batch norm backwards pass #104
Fixed bug #95 with asm batch norm ISA
Fixed IsApplicable bug in Conv3x3Asm for group convolutions
Improved performance of 1x1 stride 2 fp32 convolutions in the forward and backwards data passes
Improved 3-D convolution stability
Improved applicability of direct convolution backwards weights for 2x2, 5x10, and 5x20 filter sizes
Improved maintainability in kernels and cpp code
Updated rocBLAS minimum version to branch master-rocm-2.6
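The fast, accurate, and log softmax algorithms listed above can be pictured with a small plain-Python sketch. The mapping used here is an assumption based on the common meaning of these terms: "fast" exponentiates directly, "accurate" subtracts the maximum first for numerical stability, and "log" returns log-probabilities. The function names are hypothetical and none of this calls MIOpen.

```python
import math
from typing import List, Sequence

def softmax_fast(x: Sequence[float]) -> List[float]:
    """Direct exp/sum; cheap, but math.exp overflows for large inputs."""
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_accurate(x: Sequence[float]) -> List[float]:
    """Subtract the maximum before exponentiating to avoid overflow."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(x: Sequence[float]) -> List[float]:
    """Log-probabilities, computed with the same max-shift trick."""
    m = max(x)
    log_total = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - log_total for v in x]

if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    print(softmax_fast(logits))
    print(softmax_accurate(logits))
    print(log_softmax(logits))
```

For moderate inputs the first two variants agree to rounding error; they differ only when the inputs are large enough for the direct exponential to overflow.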
05/03/2019 [ 1.8.1 ]#
This release contains minor bug fixes and additional performance database improvements.
Changes:
Fixed accuracy issue with backwards weights
Fixed issue with name parsing for newer architectures
Added narrow workaround for 5x10 and 5x20 filter performance regression
Improved support in performance database for Radeon VII
04/11/2019 [ 1.8.0 ]#
This release contains full 3-D convolution support and int8 support for inference.
Additionally, there are updates in the performance database for major models including those found in Torchvision.
An assortment of bugs has been resolved in this release.
Changes:
Fixed various issues in assembly kernels
Fixed issues #92 and #79 for miopenOpTensor
Fixed issue #88 for bzip2
Fixed issue #77 algorithm mismatch
Added Winograd support for fp32 backwards weights
Added pooling inclusive mode
Added tuning for direct group convolution algorithms
Added additional kernel support for group convolutions
Added API for 3-D convolutions
Added support for int8 inference convolutions
Added integer selection for pooling indexing
Added minimum dependencies support
Added RNN fp16 support on the MIOpen-HIP backend
Added 1x1 convolution + bias + activation fusions
Added workaround for issue #84 GPU memory access fault
Added performance tuning for direct backwards weights
Improved performance database coverage
Improved internal quality by reducing redundant code
Improved build instructions in README.md
Improved performance database coverage for fusions
Updated Docker components and requirements
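To make the "pooling inclusive mode" entry above concrete, here is a minimal 1-D sketch of average pooling with zero padding. It assumes the common definition: inclusive mode divides by the full window size (padded positions count), while exclusive mode divides only by the number of real elements in the window. The function `avg_pool_1d` and its parameters are hypothetical and are not part of the MIOpen API.

```python
from typing import List, Sequence

def avg_pool_1d(data: Sequence[float], window: int, stride: int,
                pad: int, inclusive: bool) -> List[float]:
    """Average pooling over a 1-D sequence with symmetric zero padding."""
    if pad >= window:
        raise ValueError("pad must be smaller than window")
    n = len(data)
    out: List[float] = []
    start = -pad  # window start index; negative values lie in the padding
    while start + window <= n + pad:
        lo = max(start, 0)
        hi = min(start + window, n)
        valid = data[lo:hi]  # elements of the window that are real data
        # Inclusive: padded zeros count toward the divisor.
        # Exclusive: only real elements count.
        divisor = window if inclusive else len(valid)
        out.append(sum(valid) / divisor)
        start += stride
    return out

if __name__ == "__main__":
    values = [2.0, 4.0, 6.0]
    print(avg_pool_1d(values, window=2, stride=1, pad=1, inclusive=True))
    print(avg_pool_1d(values, window=2, stride=1, pad=1, inclusive=False))
```

The two modes differ only at the borders, where part of the window falls into the padding.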
Known Issues:
RNNs do not support fp16 on the MIOpen-OpenCL backend
OpenCL backend does not support GEMM convolutions in fp16
02/06/2019 [ 1.7.1 ]#
This release contains minor bug fixes and performance improvements.
Changes:
Fixed corrupt and obsolete performance database entries
Fixed issue #70, “SIGFPE (DIV/0) in ConvOclBwdWrW2::GetSolution()”
Fixed issue #72, “workSpaceSize check assertion fails in ConvolutionBackwardWeights() - DEBUG builds only”
Fixed issue #77, “Results of ConvBwdWeightsAlgoDirect and ConvBwdWeightsAlgoGEMM mismatch for some specific parameters”
Removed default dependency of RNNs on rocBLAS
Added a workaround for softmax fp16 correctness issue
Added check to only make MIOpen with static boost libraries
Improved performance database coverage
Known Issues:
RNNs do not support fp16
OpenCL backend does not support GEMM convolutions in fp16
Layer fusions for convolution 1x1 fp16 are not supported
Layer fusions for large image 1x1 convolutions may cause an exception instead of a warning during compile phase if plan is not supported
12/19/2018 [ 1.7.0 ]#
This release contains general bug fixes and an updated performance database
Group convolutions backwards weights performance has been improved
Logging across the library has been improved
Performance database has been updated
Changes:
Fixed logging issues with group convolution and pooling
Fixed sphinx version issue in document generation
Fixed issues with corrupt entries in performance database
Removed external dependency on libSSL and libCrypto
Added support for large image backwards weights in direct convolution
Added fp16 support for RNNs on the HIP backend
Improved performance database coverage
Known Issues:
RNNs do not support fp16
OpenCL backend does not support GEMM convolutions in fp16
Layer fusions for convolution 1x1 fp16 are not supported
Layer fusions for large image 1x1 convolutions may cause an exception instead of a warning during compile phase if plan is not supported
11/18/2018 [ 1.6.0 ]#
Training in fp16 (half precision) including mixed-precision is now fully supported
Batch Normalization in fp16 (half precision) including mixed-precision is now available
Performance improvements for 3x3 and 1x1 single-precision convolutions
Layer fusions for BatchNorm+Activation are now available
Layer fusions with convolutions now support varying strides and padding configurations
Changes:
rocBLAS is now used as the default BLAS library for the HIP backend (minimum version 14.3.0)
Fixed various bugs in convolution kernels
Fixed issues with bad references in layer fusion
Fixed gfx803 assembly issues
Added support for fp16 Winograd convolutions
Added support for fp16 pooling
Improved error reporting for convolutions and layer fusions
Improved documentation
Known Issues:
RNNs do not support fp16
OpenCL backend does not have full fp16 support
Layer fusions for convolution 1x1 fp16 are not supported
09/14/2018 [ 1.5.0 ]#
Notes:
A new kernel fusion API is now available for inference for convolution, bias, batch normalization, and activations.
This release includes new features and bug fixes
Group and Depthwise convolutions are now available
3D Batch Normalization has been implemented for fully packed tensors
Dilation for convolutions has been implemented
Changes:
Fixed bugs in direct convolutions
Fixed issue with paths when $HOME variable is not set
Fixed padding issues with 1x1 convolutions
Added incremental support for fp16
Added fused kernels for Winograd and direct with bias and activations
Added a getting started guide for kernel fusion.
Added group and depthwise API for convolutions
Added 3-D batch normalization support with 5-D tensors
Improved max pooling performance
Improved debug and error reporting information
Improved documentation for convolutions
Known Issues:
RNNs do not support fp16
Training with CNNs does not support fp16
07/30/2018 [ 1.4.2 ]#
Notes:
This release is a hot-fix to enable ICNet and PSPNet
Known Issues:
RNNs do not support fp16
Training with CNNs does not support fp16
Users may encounter a warning that their performance database is out of date. The performance database can be updated by setting the environment variable for just the initial run of an application:
MIOPEN_FIND_ENFORCE=search
For more information on the performance database, see: https://rocmsoftwareplatform.github.io/MIOpen/doc/html/perfdatabase.html#
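One way to apply the variable to only the initial run is to set it in the child process environment rather than globally. The sketch below assumes a launcher script written in Python; `build_env`, `run_app`, and the probe command are hypothetical, and only the variable name and value come from the note above.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional

def build_env(first_run: bool,
              base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment, with the variable set only on the first run."""
    env = dict(os.environ if base is None else base)
    if first_run:
        env["MIOPEN_FIND_ENFORCE"] = "search"
    else:
        # Later runs should not carry the setting over.
        env.pop("MIOPEN_FIND_ENFORCE", None)
    return env

def run_app(command: List[str], first_run: bool) -> subprocess.CompletedProcess:
    """Launch the application with the per-run environment."""
    return subprocess.run(command, env=build_env(first_run),
                          check=True, capture_output=True, text=True)

if __name__ == "__main__":
    # Stand-in for a real MIOpen application: it prints the value it sees.
    probe = [sys.executable, "-c",
             "import os; print(os.environ.get('MIOPEN_FIND_ENFORCE', 'unset'))"]
    print(run_app(probe, first_run=True).stdout.strip())
    print(run_app(probe, first_run=False).stdout.strip())
```

Because the variable is set on a copy of the environment, the parent shell and any later runs are unaffected.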
07/19/2018 [ 1.4.1 ]#
Notes:
This release includes a bug fix for 3x3 convolutions
Updated README file configuration instructions
Known Issues:
RNNs do not support fp16
Training with CNNs does not support fp16
Users may encounter a warning that their performance database is out of date. The performance database can be updated by setting the environment variable for just the initial run of an application:
MIOPEN_FIND_ENFORCE=search
For more information on the performance database, see: https://rocmsoftwareplatform.github.io/MIOpen/doc/html/perfdatabase.html#
07/06/2018 [ 1.4.0 ]#
Notes:
This release includes a number of performance improvements and bug fixes
New features have been added to convolutions for auto-tuning kernels
Activations now have new modes available
Documentation has been updated and corrected
Changes:
Fixed documentation errors
Fixed bug in activations with pass-through mode
Fixed performance database locking issues
Fixed Winograd kernel behavior for stride 2 backwards data
Fixed a bug in OpTensor layer
Fixed a timing issue with batch normalization inline assembly
Fixed issue with an unnecessary binary creation in assembly bug detection
Fixed issue with disk program cache directory not being created
Fixed a bug with convolution+bias
Added to performance database functionality
Added leaky-ReLU, clipped, and exponential-ReLU modes to activation
Added documentation for performance database usage
Added support for 1x1 convolutions with non-zero padding
Added API for printing status codes as strings
Added auto-tuning feature for convolutions
Improved LSTM and GRU backwards pass performance
Improved debug and error reporting information
Improved performance of batch normalization spatial mode
Improved find stage for convolutions
Improved readability for user database file
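For readers unfamiliar with the activation modes added in this release, the following plain-Python sketch shows their usual scalar definitions. The parameter names and default values are assumptions for illustration, and the functions are not MIOpen API calls.

```python
import math

def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: pass positives through, scale negatives by alpha."""
    return x if x > 0 else alpha * x

def clipped_relu(x: float, ceiling: float) -> float:
    """Clipped ReLU: ReLU whose output is capped at `ceiling`."""
    return min(ceiling, max(0.0, x))

def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential ReLU (ELU): smooth negative branch alpha * (exp(x) - 1)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

if __name__ == "__main__":
    for v in (-2.0, 0.5, 5.0):
        print(v, leaky_relu(v, 0.1), clipped_relu(v, 3.0), elu(v))
```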
Known Issues:
RNNs do not support fp16
Training with CNNs does not support fp16
03/30/2018 [ 1.3.0 ]#
Notes:
Performance improvements for RNNs
Performance improvements for convolutions using 1x1 filters
Performance improvement for Batch Normalization
This release adds preliminary fp16 support for Inference using CNNs
Bug fixes for various components of MIOpen
Changes:
Added two new APIs for RNNs: miopenGetRNNLayerParamOffset and miopenGetRNNLayerBiasOffset
Added support for uninitialized hidden states and nullptr outputs in RNNs
Added support for Set and Scale operations for strided tensors with dimensions 1 to 5
Added multi-thread and multi-process support for the performance database
Improved performance for OpTensor
Fixed bug in convolutions for backward bias
Fixed logic issues in get and set layer functions and related w_supertensor test
Fixed hang in batch norm with batch sizes greater than 256
Known Issues:
RNNs do not support fp16
Training with CNNs does not support fp16
03/08/2018 [ 1.2.1 ]#
Notes:
This release adds support for ROCm 1.7.1.
12/15/2017 [ 1.2.0 ]#
Notes:
This release adds support for recurrent neural networks (RNNs) in three flavors: Vanilla, LSTM, and GRU
Users can now update the perf-db file, which hosts the tuning parameters for convolutions, themselves by setting the appropriate environment variables
Changes:
Over 50% improvement in ResNet performance since the last release
Multiple padding modes like Same and Valid added
Winograd convolution kernels added for strided bwd-data convolutions
Tensor Ops allow for beta and alpha scaling values and support up to 5 dimensions with strides and offsets
Tensor Copy supports up to 5-dimensional copies with strides and offsets
Unit-tests for LRN are added
Several bug fixes for all the layers of the library
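The Same and Valid padding modes mentioned above are usually defined by the output size they produce. The sketch below assumes the widely used TensorFlow-style convention; the release notes do not state the exact formulas MIOpen uses, so treat these as illustrative. Both helper functions are hypothetical.

```python
from typing import Tuple

def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return -(-a // b)

def valid_output_size(in_size: int, kernel: int, stride: int) -> int:
    """Valid: no padding; only windows that fit entirely inside the input."""
    return max(_ceil_div(in_size - kernel + 1, stride), 0)

def same_output_and_padding(in_size: int, kernel: int,
                            stride: int) -> Tuple[int, int]:
    """Same: output size is ceil(in / stride); return it with the total padding needed."""
    out = _ceil_div(in_size, stride)
    total_pad = max((out - 1) * stride + kernel - in_size, 0)
    return out, total_pad

if __name__ == "__main__":
    print(valid_output_size(7, kernel=3, stride=2))
    print(same_output_and_padding(7, kernel=3, stride=2))
```

Under this convention the total padding is split between the two sides of each spatial dimension; how an odd amount is divided is implementation specific.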
Known issues:
RNNs may give incorrect results due to a known compiler bug; the issue may particularly arise in some RNN configs with GEMM sizes that are a power of 4
Potential issue where OpenCL resources will be exhausted for large RNN
09/08/2017 [ 1.1.0 ]#
Notes:
The scaling parameter alpha and shift parameter beta for layer kernels are only supported for alpha = 1 and beta = 0. The exceptions to this are miopenOpTensor, miopenConvolutionForwardBias, and miopenConvolutionBackwardBias.
Currently, only 32-bit floats are supported in MIOpen.
MIOpen only supports tensor layout NCHW.
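The alpha and beta restriction is easier to read with the blending convention written out. The sketch assumes the convention common to GPU deep learning libraries, where the new result is scaled by alpha and added to the previous output scaled by beta. The function `blend` is hypothetical and does not call MIOpen.

```python
from typing import List, Sequence

def blend(result: Sequence[float], prior: Sequence[float],
          alpha: float = 1.0, beta: float = 0.0) -> List[float]:
    """Return alpha * result + beta * prior, element by element."""
    if len(result) != len(prior):
        raise ValueError("result and prior must have the same length")
    return [alpha * r + beta * p for r, p in zip(result, prior)]

if __name__ == "__main__":
    fresh = [1.0, 2.0]       # values just computed by a layer
    previous = [10.0, 20.0]  # values already in the output buffer
    # alpha = 1, beta = 0: the output is simply overwritten with the result.
    print(blend(fresh, previous))
    # Any other combination mixes the old output into the new one.
    print(blend(fresh, previous, alpha=2.0, beta=0.5))
```

With alpha = 1 and beta = 0, the only combination supported for most layers in this release, the output is a plain overwrite.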
Changes:
Added persistent cache for compiled GPU kernels
Performance improvements for batch normalization kernels
Performance improvements for all types of convolutions for 1x1 filters
Performance improvements for all types of convolutions with non-unit strides
Performance improvements for backward-weights convolutions for 3x3 filters
Performance improvements for the AddTensor operation
Various bug fixes for Winograd convolutions
08/27/2017 [ 1.0.2 ]#
Fixed 1x1 forward and backward convolutions for large input
Fixed pooling in MIOpenDriver
Disabled 1x1 Winograd convolution for HIP
Disabled asm. backward-weights convolutions for input width == 175
07/26/2017 [ 1.0.1 ]#
Added dilation support for convolutions
Added unit-tests for Softmax
Added miopengemm as a required dependency for MIOpen build
Performance improvements for batch normalization via activation of data-parallel primitives (DPP) hardware instructions
Fixed documentation to remove GEMM API interface
Fixed Bwd-Weights Convolutions with 1x1 filters with stride=2
Fixed Softmax grid-size selection
Fixed debug prints of kernel launch parameters.
Removed GEMM interface from the MIOpen API
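The dilation support added in 1.0.1 changes how much of the input a filter spans. As a minimal sketch, the standard formulas for the effective filter size and the resulting output size along one dimension are shown below. The helper names are hypothetical and the code does not call MIOpen.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span of a dilated filter: gaps of (dilation - 1) between taps."""
    return (kernel - 1) * dilation + 1

def conv_output_size(in_size: int, kernel: int, stride: int = 1,
                     pad: int = 0, dilation: int = 1) -> int:
    """Output length of a convolution along one dimension."""
    span = in_size + 2 * pad - effective_kernel(kernel, dilation)
    if span < 0:
        raise ValueError("filter does not fit in the padded input")
    return span // stride + 1

if __name__ == "__main__":
    # A 3-tap filter with dilation 2 covers 5 input positions.
    print(effective_kernel(3, 2))
    print(conv_output_size(32, kernel=3, dilation=2))
```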