SGD
Functions

- miopenStatus_t miopenFusedAdam(...)
  Perform Fused Adam optimization for a single tensor (Adaptive Moment Estimation).
- miopenStatus_t miopenFusedAdamWithOutput(...)
  Execute single tensor Adam optimization and receive the result in a separate output tensor.
- miopenStatus_t miopenTransformersAdamW(...)
  Implements the Adam algorithm with the weight decay fix introduced in "Decoupled Weight Decay Regularization". This is the fused kernel version of the AdamW optimizer included in the Hugging Face Transformers module.
- miopenStatus_t miopenTransformersAdamWWithOutput(...)
  Execute single tensor AdamW optimization and receive the result in a separate output tensor.

Full signatures are listed in the function documentation below.
Function Documentation
◆ miopenFusedAdam()
miopenStatus_t miopenFusedAdam(
    miopenHandle_t handle,
    const miopenTensorDescriptor_t paramDesc,
    void *param,
    const miopenTensorDescriptor_t gradDesc,
    const void *grad,
    const miopenTensorDescriptor_t expAvgDesc,
    void *expAvg,
    const miopenTensorDescriptor_t expAvgSqDesc,
    void *expAvgSq,
    const miopenTensorDescriptor_t maxExpAvgSqDesc,
    void *maxExpAvgSq,
    const miopenTensorDescriptor_t stateStepDesc,
    void *stateStep,
    const unsigned int state_step,
    const float lr,
    const float beta1,
    const float beta2,
    const float weight_decay,
    const float eps,
    const bool amsgrad,
    const bool maximize,
    const bool adamw,
    const miopenTensorDescriptor_t gradScaleDesc,
    const void *gradScale,
    const miopenTensorDescriptor_t foundInfDesc,
    const void *foundInf)
Perform Fused Adam optimization for a single tensor (Adaptive Moment Estimation).
This function implements the Fused Adam optimization algorithm. Adam, short for Adaptive Moment Estimation, extends the RMSProp optimizer: it combines the advantages of AdaGrad and RMSProp by adaptively adjusting the learning rate for each parameter using the first and second moments of the gradients. The fused implementation combines the optimizer's multiple elementwise operations into a single kernel, reducing memory access overhead and improving performance.
Fused Adam can also be used in both AdamW and Automatic Mixed Precision (AMP) modes, enabling accelerated model training and reduced memory consumption. AMP supports FP16 computation, optimizing model calculations with a mixture of FP32 and FP16 precision to enhance training speed. When AMP is used, the foundInf, gradScale, and step tensors should be supplied; in AMP mode, whether the Adam update executes is determined by the foundInf value. The state step is accepted either as an int value or as an int tensor: if a step tensor is supplied, the step passed as an int is ignored, and when the Adam update executes, the step tensor is incremented by 1.
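For reference, the per-element update of the standard Adam formulation (Kingma & Ba, 2015) that this function fuses is sketched below; expAvg holds m, expAvgSq holds v, and with amsgrad a running maximum of the second moment (maxExpAvgSq) replaces it in the denominator. The exact in-kernel arithmetic and ordering are implementation details.

```latex
% Standard Adam update at step t, applied elementwise.
% g_t is the gradient (negated when maximize is set); with adamw,
% weight decay is applied to \theta directly rather than folded into g_t.
\begin{aligned}
m_t       &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t          && \text{(expAvg)}   \\
v_t       &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2        && \text{(expAvgSq)} \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \quad
\hat{v}_t  = v_t / (1 - \beta_2^t)                                               \\
\theta_t  &= \theta_{t-1} - \mathrm{lr}\,\hat{m}_t / \big(\sqrt{\hat{v}_t} + \epsilon\big)
\end{aligned}
```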
- Parameters
  - handle: MIOpen handle (input)
  - paramDesc: Tensor descriptor for the input parameter tensor (input)
  - param: Input parameter tensor (input)
  - gradDesc: Tensor descriptor for the input gradient tensor (input)
  - grad: Input gradient tensor (input)
  - expAvgDesc: Tensor descriptor for the input exponential moving average tensor (input)
  - expAvg: Input exponential moving average tensor (input)
  - expAvgSqDesc: Tensor descriptor for the input exponential moving average squared tensor (input)
  - expAvgSq: Input exponential moving average squared tensor (input)
  - maxExpAvgSqDesc: Tensor descriptor for the input maximum exponential moving average squared tensor; used when amsgrad is true (input, optional)
  - maxExpAvgSq: Input maximum exponential moving average squared tensor; used when amsgrad is true (input, optional)
  - stateStepDesc: Tensor descriptor for the input state step tensor (input)
  - stateStep: Input state step tensor (input)
  - state_step: Input state step; used when the step tensor is null (input)
  - lr: Learning rate (input)
  - beta1: Coefficient used for computing the running average of the gradient (first moment) (input)
  - beta2: Coefficient used for computing the running average of the squared gradient (second moment) (input)
  - weight_decay: Weight decay (input)
  - eps: Term added to the denominator to improve numerical stability (input)
  - amsgrad: Flag indicating whether to use the AMSGrad variant of Adam (input)
  - maximize: Flag indicating whether to maximize the objective with respect to the parameters (input)
  - adamw: If true, the operation becomes AdamW (input)
  - gradScaleDesc: Tensor descriptor for the input grad scale tensor (input, optional)
  - gradScale: Input grad scale tensor (input, optional)
  - foundInfDesc: Tensor descriptor for the input found-inf tensor (input, optional)
  - foundInf: Tensor indicating the presence of inf or NaN in the gradients; if true, the operation and step update are skipped (input, optional)
- Returns
- miopenStatus_t
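As a usage reference, here is a minimal sketch of one non-AMP optimizer step on a single contiguous fp32 tensor. The MIOpen calls (miopenCreateTensorDescriptor, miopenSetTensorDescriptor, miopenFusedAdam, miopenDestroyTensorDescriptor) are from the MIOpen C API; the helper function, buffer names, and hyperparameter values are illustrative placeholders, and error checking is elided.

```c
#include <stdbool.h>
#include <miopen/miopen.h>

/* Hypothetical helper: applies one Fused Adam step to n fp32 elements.
 * param/grad/expAvg/expAvgSq are device buffers allocated elsewhere. */
void fused_adam_step(miopenHandle_t handle, int n,
                     void *param, const void *grad,
                     void *expAvg, void *expAvgSq,
                     unsigned int step)
{
    /* One contiguous 1-D descriptor shared by all four tensors. */
    miopenTensorDescriptor_t desc;
    int dims[1]    = {n};
    int strides[1] = {1};
    miopenCreateTensorDescriptor(&desc);
    miopenSetTensorDescriptor(desc, miopenFloat, 1, dims, strides);

    /* No step tensor is supplied, so the integer state_step drives the
     * bias correction; AMP tensors (gradScale/foundInf) are omitted. */
    miopenFusedAdam(handle,
                    desc, param,
                    desc, grad,
                    desc, expAvg,
                    desc, expAvgSq,
                    NULL, NULL,     /* maxExpAvgSq: only used when amsgrad */
                    NULL, NULL,     /* stateStep tensor: the int below is used */
                    step,           /* state_step */
                    1e-3f,          /* lr */
                    0.9f, 0.999f,   /* beta1, beta2 */
                    0.0f,           /* weight_decay */
                    1e-8f,          /* eps */
                    false,          /* amsgrad */
                    false,          /* maximize */
                    false,          /* adamw */
                    NULL, NULL,     /* gradScale (AMP only) */
                    NULL, NULL);    /* foundInf (AMP only) */

    miopenDestroyTensorDescriptor(desc);
}
```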
◆ miopenFusedAdamWithOutput()
miopenStatus_t miopenFusedAdamWithOutput(
    miopenHandle_t handle,
    const miopenTensorDescriptor_t paramInDesc,
    void *paramIn,
    const miopenTensorDescriptor_t paramOutDesc,
    void *paramOut,
    const miopenTensorDescriptor_t paramOutFloat16Desc,
    void *paramOutFloat16,
    const miopenTensorDescriptor_t gradInDesc,
    const void *gradIn,
    const miopenTensorDescriptor_t expAvgInDesc,
    void *expAvgIn,
    const miopenTensorDescriptor_t expAvgOutDesc,
    void *expAvgOut,
    const miopenTensorDescriptor_t expAvgSqInDesc,
    void *expAvgSqIn,
    const miopenTensorDescriptor_t expAvgSqOutDesc,
    void *expAvgSqOut,
    const miopenTensorDescriptor_t maxExpAvgSqInDesc,
    void *maxExpAvgSqIn,
    const miopenTensorDescriptor_t maxExpAvgSqOutDesc,
    void *maxExpAvgSqOut,
    const miopenTensorDescriptor_t stateStepInDesc,
    void *stateStepIn,
    const miopenTensorDescriptor_t stateStepOutDesc,
    void *stateStepOut,
    const unsigned int state_step,
    const float lr,
    const float beta1,
    const float beta2,
    const float weight_decay,
    const float eps,
    const bool amsgrad,
    const bool maximize,
    const bool adamw,
    const miopenTensorDescriptor_t gradScaleDesc,
    const void *gradScale,
    const miopenTensorDescriptor_t foundInfDesc,
    const void *foundInf)
Execute single tensor Adam optimization and receive the result in a separate output tensor.
This function is equivalent to miopenFusedAdam, but the results are written to separate output tensors.
- See also
- miopenFusedAdam
- Parameters
  - handle: MIOpen handle (input)
  - paramInDesc: Tensor descriptor for the input parameter tensor (input)
  - paramIn: Input parameter tensor (input)
  - paramOutDesc: Tensor descriptor for the output parameter tensor (input)
  - paramOut: Output parameter tensor (output)
  - paramOutFloat16Desc: Tensor descriptor for the float16 output parameter tensor (input, optional)
  - paramOutFloat16: Float16 output parameter tensor (output, optional)
  - gradInDesc: Tensor descriptor for the input gradient tensor (input)
  - gradIn: Input gradient tensor (input)
  - expAvgInDesc: Tensor descriptor for the input exponential moving average tensor (input)
  - expAvgIn: Input exponential moving average tensor (input)
  - expAvgOutDesc: Tensor descriptor for the output exponential moving average tensor (input)
  - expAvgOut: Output exponential moving average tensor (output)
  - expAvgSqInDesc: Tensor descriptor for the input exponential moving average squared tensor (input)
  - expAvgSqIn: Input exponential moving average squared tensor (input)
  - expAvgSqOutDesc: Tensor descriptor for the output exponential moving average squared tensor (input)
  - expAvgSqOut: Output exponential moving average squared tensor (output)
  - maxExpAvgSqInDesc: Tensor descriptor for the input maximum exponential moving average squared tensor; used when amsgrad is true (input, optional)
  - maxExpAvgSqIn: Input maximum exponential moving average squared tensor; used when amsgrad is true (input, optional)
  - maxExpAvgSqOutDesc: Tensor descriptor for the output maximum exponential moving average squared tensor; used when amsgrad is true (input, optional)
  - maxExpAvgSqOut: Output maximum exponential moving average squared tensor; used when amsgrad is true (output, optional)
  - stateStepInDesc: Tensor descriptor for the input state step tensor (input, optional)
  - stateStepIn: Input state step tensor (input, optional)
  - stateStepOutDesc: Tensor descriptor for the output state step tensor (input, optional)
  - stateStepOut: Output state step tensor that stores the updated step value (output, optional)
  - state_step: Input state step; used when the step tensor is null (input)
  - lr: Learning rate (input)
  - beta1: Coefficient used for computing the running average of the gradient (first moment) (input)
  - beta2: Coefficient used for computing the running average of the squared gradient (second moment) (input)
  - weight_decay: Weight decay (input)
  - eps: Term added to the denominator to improve numerical stability (input)
  - amsgrad: Flag indicating whether to use the AMSGrad variant of Adam (input)
  - maximize: Flag indicating whether to maximize the objective with respect to the parameters (input)
  - adamw: If true, the operation becomes AdamW (input)
  - gradScaleDesc: Tensor descriptor for the input grad scale tensor (input, optional)
  - gradScale: Input grad scale tensor (input, optional)
  - foundInfDesc: Tensor descriptor for the input found-inf tensor (input, optional)
  - foundInf: Tensor indicating the presence of inf or NaN in the gradients; if true, the operation and step update are skipped (input, optional)
- Returns
- miopenStatus_t
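Building on the sketch above, the following fragment shows how the out-of-place variant might be wired into an AMP loss-scaling loop: fp32 master weights in and out, an optional fp16 shadow copy of the updated weights, and the gradScale/foundInf tensors produced by the scaler. All descriptor and buffer names (fp32Desc, fp16Desc, stepDesc, scaleDesc, infDesc, and the data pointers) are placeholders assumed to be created as in the previous example; treating the incoming gradients as fp16 is likewise an assumption of this sketch.

```c
miopenFusedAdamWithOutput(handle,
    fp32Desc, paramIn,            /* fp32 master weights (input) */
    fp32Desc, paramOut,           /* updated fp32 weights (output) */
    fp16Desc, paramOutFp16,       /* optional fp16 copy of the new weights */
    fp16Desc, gradIn,             /* scaled gradients (fp16 here by assumption) */
    fp32Desc, expAvgIn,   fp32Desc, expAvgOut,
    fp32Desc, expAvgSqIn, fp32Desc, expAvgSqOut,
    NULL, NULL, NULL, NULL,       /* maxExpAvgSq in/out: amsgrad only */
    stepDesc, stepIn,             /* step tensor supplied, so the int  */
    stepDesc, stepOut,            /* state_step below is ignored       */
    0,                            /* state_step (unused here) */
    1e-3f, 0.9f, 0.999f,          /* lr, beta1, beta2 */
    1e-2f, 1e-8f,                 /* weight_decay, eps */
    false, false,                 /* amsgrad, maximize */
    true,                         /* adamw */
    scaleDesc, gradScale,         /* loss scale used to unscale gradients */
    infDesc, foundInf);           /* nonzero skips the update and step bump */
```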
◆ miopenTransformersAdamW()
miopenStatus_t miopenTransformersAdamW(
    miopenHandle_t handle,
    const miopenTensorDescriptor_t paramDesc,
    void *param,
    const miopenTensorDescriptor_t gradDesc,
    const void *grad,
    const miopenTensorDescriptor_t expAvgDesc,
    void *expAvg,
    const miopenTensorDescriptor_t expAvgSqDesc,
    void *expAvgSq,
    const miopenTensorDescriptor_t stateStepDesc,
    void *stateStep,
    const unsigned int state_step,
    const float lr,
    const float beta1,
    const float beta2,
    const float weight_decay,
    const float eps,
    const bool correct_bias,
    const miopenTensorDescriptor_t gradScaleDesc,
    const void *gradScale,
    const miopenTensorDescriptor_t foundInfDesc,
    const void *foundInf)
Implements the Adam algorithm with the weight decay fix introduced in "Decoupled Weight Decay Regularization". This is the fused kernel version of the AdamW optimizer included in the Hugging Face Transformers module.
- See also
- miopenFusedAdam
- Parameters
  - handle: MIOpen handle (input)
  - paramDesc: Tensor descriptor for the input parameter tensor (input)
  - param: Input parameter tensor (input)
  - gradDesc: Tensor descriptor for the input gradient tensor (input)
  - grad: Input gradient tensor (input)
  - expAvgDesc: Tensor descriptor for the input exponential moving average tensor (input)
  - expAvg: Input exponential moving average tensor (input)
  - expAvgSqDesc: Tensor descriptor for the input exponential moving average squared tensor (input)
  - expAvgSq: Input exponential moving average squared tensor (input)
  - stateStepDesc: Tensor descriptor for the input state step tensor (input)
  - stateStep: Input state step tensor (input)
  - state_step: Input state step; used when the step tensor is null (input)
  - lr: Learning rate (input)
  - beta1: Coefficient used for computing the running average of the gradient (first moment) (input)
  - beta2: Coefficient used for computing the running average of the squared gradient (second moment) (input)
  - weight_decay: Weight decay (input)
  - eps: Term added to the denominator to improve numerical stability (input)
  - correct_bias: Flag indicating whether to correct the bias in Adam (for instance, the BERT TF repository uses False) (input)
  - gradScaleDesc: Tensor descriptor for the input grad scale tensor (input, optional)
  - gradScale: Input grad scale tensor (input, optional)
  - foundInfDesc: Tensor descriptor for the input found-inf tensor (input, optional)
  - foundInf: Tensor indicating the presence of inf or NaN in the gradients; if true, the operation and step update are skipped (input, optional)
- Returns
- miopenStatus_t
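As a sketch (not a verified sample), a call mirroring the default hyperparameters of the Transformers AdamW optimizer (lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0, correct_bias=true) could look like the following; desc, the buffer pointers, and step are placeholders set up as in the miopenFusedAdam example above.

```c
miopenTransformersAdamW(handle,
    desc, param,
    desc, grad,
    desc, expAvg,
    desc, expAvgSq,
    NULL, NULL,        /* no step tensor: the int below is used */
    step,              /* state_step */
    1e-3f,             /* lr */
    0.9f, 0.999f,      /* beta1, beta2 */
    0.0f,              /* weight_decay */
    1e-6f,             /* eps */
    true,              /* correct_bias (the BERT TF repository used false) */
    NULL, NULL,        /* gradScale (AMP only) */
    NULL, NULL);       /* foundInf (AMP only) */
```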
◆ miopenTransformersAdamWWithOutput()
miopenStatus_t miopenTransformersAdamWWithOutput(
    miopenHandle_t handle,
    const miopenTensorDescriptor_t paramInDesc,
    void *paramIn,
    const miopenTensorDescriptor_t paramOutDesc,
    void *paramOut,
    const miopenTensorDescriptor_t paramOutFloat16Desc,
    void *paramOutFloat16,
    const miopenTensorDescriptor_t gradInDesc,
    const void *gradIn,
    const miopenTensorDescriptor_t expAvgInDesc,
    void *expAvgIn,
    const miopenTensorDescriptor_t expAvgOutDesc,
    void *expAvgOut,
    const miopenTensorDescriptor_t expAvgSqInDesc,
    void *expAvgSqIn,
    const miopenTensorDescriptor_t expAvgSqOutDesc,
    void *expAvgSqOut,
    const miopenTensorDescriptor_t stateStepInDesc,
    void *stateStepIn,
    const miopenTensorDescriptor_t stateStepOutDesc,
    void *stateStepOut,
    const unsigned int state_step,
    const float lr,
    const float beta1,
    const float beta2,
    const float weight_decay,
    const float eps,
    const float step_size,
    const bool correct_bias,
    const miopenTensorDescriptor_t gradScaleDesc,
    const void *gradScale,
    const miopenTensorDescriptor_t foundInfDesc,
    const void *foundInf)
Execute single tensor Adam optimization and receive the result in a separate output tensor.
This function is equivalent to miopenTransformersAdamW, but the results are written to separate output tensors.
- Parameters
  - handle: MIOpen handle (input)
  - paramInDesc: Tensor descriptor for the input parameter tensor (input)
  - paramIn: Input parameter tensor (input)
  - paramOutDesc: Tensor descriptor for the output parameter tensor (input)
  - paramOut: Output parameter tensor (output)
  - paramOutFloat16Desc: Tensor descriptor for the float16 output parameter tensor (input, optional)
  - paramOutFloat16: Float16 output parameter tensor (output, optional)
  - gradInDesc: Tensor descriptor for the input gradient tensor (input)
  - gradIn: Input gradient tensor (input)
  - expAvgInDesc: Tensor descriptor for the input exponential moving average tensor (input)
  - expAvgIn: Input exponential moving average tensor (input)
  - expAvgOutDesc: Tensor descriptor for the output exponential moving average tensor (input)
  - expAvgOut: Output exponential moving average tensor (output)
  - expAvgSqInDesc: Tensor descriptor for the input exponential moving average squared tensor (input)
  - expAvgSqIn: Input exponential moving average squared tensor (input)
  - expAvgSqOutDesc: Tensor descriptor for the output exponential moving average squared tensor (input)
  - expAvgSqOut: Output exponential moving average squared tensor (output)
  - stateStepInDesc: Tensor descriptor for the input state step tensor (input, optional)
  - stateStepIn: Input state step tensor (input, optional)
  - stateStepOutDesc: Tensor descriptor for the output state step tensor (input, optional)
  - stateStepOut: Output state step tensor that stores the updated step value (output, optional)
  - state_step: Input state step; used when the step tensor is null (input)
  - lr: Learning rate (input)
  - beta1: Coefficient used for computing the running average of the gradient (first moment) (input)
  - beta2: Coefficient used for computing the running average of the squared gradient (second moment) (input)
  - weight_decay: Weight decay (input)
  - eps: Term added to the denominator to improve numerical stability (input)
  - step_size: Pre-calculated step size, used for performance enhancement (input)
  - correct_bias: Flag indicating whether to correct the bias in Adam (for instance, the BERT TF repository uses False) (input)
  - gradScaleDesc: Tensor descriptor for the input grad scale tensor (input, optional)
  - gradScale: Input grad scale tensor (input, optional)
  - foundInfDesc: Tensor descriptor for the input found-inf tensor (input, optional)
  - foundInf: Tensor indicating the presence of inf or NaN in the gradients; if true, the operation and step update are skipped (input, optional)
- Returns
- miopenStatus_t
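Finally, a sketch of the out-of-place Transformers variant. The one new argument is step_size; this page only describes it as a pre-calculated step size used for performance, so computing it host-side as the usual bias-corrected Adam step size is an assumption of this sketch, not documented behavior. The fragment requires <math.h> for sqrtf/powf; all other names are placeholders set up as in the earlier examples.

```c
/* Assumed (not documented here): step_size is the bias-corrected Adam
 * step size, precomputed on the host to spare the kernel the work. */
float t = (float)step;
float step_size = lr * sqrtf(1.0f - powf(beta2, t)) / (1.0f - powf(beta1, t));

miopenTransformersAdamWWithOutput(handle,
    desc, paramIn,   desc, paramOut,
    NULL, NULL,                           /* no fp16 copy of the weights */
    desc, gradIn,
    desc, expAvgIn,   desc, expAvgOut,
    desc, expAvgSqIn, desc, expAvgSqOut,
    stepDesc, stepIn, stepDesc, stepOut,  /* step tensor in/out */
    0,                                    /* state_step ignored: step tensor given */
    lr, beta1, beta2,                     /* hyperparameters as above */
    0.0f, 1e-6f,                          /* weight_decay, eps */
    step_size,
    true,                                 /* correct_bias */
    NULL, NULL,                           /* gradScale (AMP only) */
    NULL, NULL);                          /* foundInf (AMP only) */
```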