SGD

Functions

miopenStatus_t miopenFusedAdam (miopenHandle_t handle, const miopenTensorDescriptor_t paramDesc, void *param, const miopenTensorDescriptor_t gradDesc, const void *grad, const miopenTensorDescriptor_t expAvgDesc, void *expAvg, const miopenTensorDescriptor_t expAvgSqDesc, void *expAvgSq, const miopenTensorDescriptor_t maxExpAvgSqDesc, void *maxExpAvgSq, const miopenTensorDescriptor_t stateStepDesc, void *stateStep, const unsigned int state_step, const float lr, const float beta1, const float beta2, const float weight_decay, const float eps, const bool amsgrad, const bool maximize, const bool adamw, const miopenTensorDescriptor_t gradScaleDesc, const void *gradScale, const miopenTensorDescriptor_t foundInfDesc, const void *foundInf)
 Perform Fused Adam optimization for a single tensor (Adaptive Moment Estimation).
 
miopenStatus_t miopenFusedAdamWithOutput (miopenHandle_t handle, const miopenTensorDescriptor_t paramInDesc, void *paramIn, const miopenTensorDescriptor_t paramOutDesc, void *paramOut, const miopenTensorDescriptor_t paramOutFloat16Desc, void *paramOutFloat16, const miopenTensorDescriptor_t gradInDesc, const void *gradIn, const miopenTensorDescriptor_t expAvgInDesc, void *expAvgIn, const miopenTensorDescriptor_t expAvgOutDesc, void *expAvgOut, const miopenTensorDescriptor_t expAvgSqInDesc, void *expAvgSqIn, const miopenTensorDescriptor_t expAvgSqOutDesc, void *expAvgSqOut, const miopenTensorDescriptor_t maxExpAvgSqInDesc, void *maxExpAvgSqIn, const miopenTensorDescriptor_t maxExpAvgSqOutDesc, void *maxExpAvgSqOut, const miopenTensorDescriptor_t stateStepInDesc, void *stateStepIn, const miopenTensorDescriptor_t stateStepOutDesc, void *stateStepOut, const unsigned int state_step, const float lr, const float beta1, const float beta2, const float weight_decay, const float eps, const bool amsgrad, const bool maximize, const bool adamw, const miopenTensorDescriptor_t gradScaleDesc, const void *gradScale, const miopenTensorDescriptor_t foundInfDesc, const void *foundInf)
 Execute single tensor Adam optimization and receive the result in a separate output tensor.
 
miopenStatus_t miopenTransformersAdamW (miopenHandle_t handle, const miopenTensorDescriptor_t paramDesc, void *param, const miopenTensorDescriptor_t gradDesc, const void *grad, const miopenTensorDescriptor_t expAvgDesc, void *expAvg, const miopenTensorDescriptor_t expAvgSqDesc, void *expAvgSq, const miopenTensorDescriptor_t stateStepDesc, void *stateStep, const unsigned int state_step, const float lr, const float beta1, const float beta2, const float weight_decay, const float eps, const bool correct_bias, const miopenTensorDescriptor_t gradScaleDesc, const void *gradScale, const miopenTensorDescriptor_t foundInfDesc, const void *foundInf)
 Implements the Adam algorithm with the weight decay fix introduced in Decoupled Weight Decay Regularization. This is the fused kernel version of the AdamW optimizer included in the Hugging Face Transformers module.
 
miopenStatus_t miopenTransformersAdamWWithOutput (miopenHandle_t handle, const miopenTensorDescriptor_t paramInDesc, void *paramIn, const miopenTensorDescriptor_t paramOutDesc, void *paramOut, const miopenTensorDescriptor_t paramOutFloat16Desc, void *paramOutFloat16, const miopenTensorDescriptor_t gradInDesc, const void *gradIn, const miopenTensorDescriptor_t expAvgInDesc, void *expAvgIn, const miopenTensorDescriptor_t expAvgOutDesc, void *expAvgOut, const miopenTensorDescriptor_t expAvgSqInDesc, void *expAvgSqIn, const miopenTensorDescriptor_t expAvgSqOutDesc, void *expAvgSqOut, const miopenTensorDescriptor_t stateStepInDesc, void *stateStepIn, const miopenTensorDescriptor_t stateStepOutDesc, void *stateStepOut, const unsigned int state_step, const float lr, const float beta1, const float beta2, const float weight_decay, const float eps, const float step_size, const bool correct_bias, const miopenTensorDescriptor_t gradScaleDesc, const void *gradScale, const miopenTensorDescriptor_t foundInfDesc, const void *foundInf)
 Execute single tensor Adam optimization and receive the result in a separate output tensor.
 

Function Documentation

◆ miopenFusedAdam()

miopenStatus_t miopenFusedAdam ( miopenHandle_t  handle,
const miopenTensorDescriptor_t  paramDesc,
void *  param,
const miopenTensorDescriptor_t  gradDesc,
const void *  grad,
const miopenTensorDescriptor_t  expAvgDesc,
void *  expAvg,
const miopenTensorDescriptor_t  expAvgSqDesc,
void *  expAvgSq,
const miopenTensorDescriptor_t  maxExpAvgSqDesc,
void *  maxExpAvgSq,
const miopenTensorDescriptor_t  stateStepDesc,
void *  stateStep,
const unsigned int  state_step,
const float  lr,
const float  beta1,
const float  beta2,
const float  weight_decay,
const float  eps,
const bool  amsgrad,
const bool  maximize,
const bool  adamw,
const miopenTensorDescriptor_t  gradScaleDesc,
const void *  gradScale,
const miopenTensorDescriptor_t  foundInfDesc,
const void *  foundInf 
)

Perform Fused Adam optimization for a single tensor (Adaptive Moment Estimation).

This function implements the Fused Adam optimization algorithm. Adam, short for Adaptive Moment Estimation, extends the RMSProp optimizer. It combines the advantages of AdaGrad and RMSProp by adaptively adjusting learning rates for each parameter using the first and second moments of gradients. Fused Adam optimization efficiently combines multiple operations into a single kernel, reducing memory access overhead and improving performance.
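For reference, the per-parameter update described here (with expAvg holding the first moment m_t and expAvgSq the second moment v_t) is the standard Adam rule below; the fused kernel's exact arithmetic ordering is not spelled out in this documentation, so treat this as an outline:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
\theta_t = \theta_{t-1} - \mathrm{lr} \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)

When the adamw flag is set, weight decay is applied directly to the parameters (decoupled), as in the standard AdamW formulation, rather than being added to the gradient.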

Additionally, Fused Adam supports both AdamW and Automatic Mixed Precision (AMP), enabling accelerated model training and reduced memory consumption. AMP optimizes model calculations by mixing FP32 and FP16 precision to improve training speed. When using AMP, the foundInf, gradScale, and step tensors should be supplied. In AMP mode, whether the Adam update executes is determined by the foundInf value. The state step accepts either an int value or an int tensor; if a step tensor is supplied, the integer step argument is ignored, and when the Adam update executes, the step tensor is incremented by 1.

// Execute Adam
miopenFusedAdam(handle,
                paramDesc,
                param,
                gradDesc,
                grad,
                expAvgDesc,
                expAvg,
                expAvgSqDesc,
                expAvgSq,
                NULL, // Unused maxExpAvgSqDesc because amsgrad is false
                NULL,
                NULL, // Unused stateStep tensor because the step integer argument is used
                NULL,
                step,
                lr,
                beta1,
                beta2,
                weight_decay,
                eps,
                false, // amsgrad
                false, // maximize
                false, // adamw
                NULL,  // Unused gradScale tensor because not AMP
                NULL,
                NULL,  // Unused foundInf tensor because not AMP
                NULL);

// Execute AdamW
miopenFusedAdam(handle,
                paramDesc,
                param,
                gradDesc,
                grad,
                expAvgDesc,
                expAvg,
                expAvgSqDesc,
                expAvgSq,
                NULL, // Unused maxExpAvgSqDesc because amsgrad is false
                NULL,
                NULL, // Unused stateStep tensor because the step integer argument is used
                NULL,
                step,
                lr,
                beta1,
                beta2,
                weight_decay,
                eps,
                false, // amsgrad
                false, // maximize
                true,  // adamw
                NULL,  // Unused gradScale tensor because not AMP
                NULL,
                NULL,  // Unused foundInf tensor because not AMP
                NULL);

// Execute AMP Adam
miopenFusedAdam(handle,
                paramDesc,
                param,
                gradDesc,
                grad,
                expAvgDesc,
                expAvg,
                expAvgSqDesc,
                expAvgSq,
                NULL, // Unused maxExpAvgSqDesc because amsgrad is false
                NULL,
                stateStepDesc,
                stateStep,
                -1, // The step value is ignored because the stateStep tensor is used
                lr,
                beta1,
                beta2,
                weight_decay,
                eps,
                false, // amsgrad
                false, // maximize
                false, // adamw
                gradScaleDesc,
                gradScale,
                foundInfDesc,
                foundInf);
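The argument lists above assume the handle, tensor descriptors, device buffers, and hyperparameters already exist. A minimal, self-contained sketch of that setup for the non-AMP Adam case follows; the function name fused_adam_step_example, the flat fp32 layout, and the hyperparameter values are illustrative assumptions rather than part of the MIOpen API, and error checking is omitted.

#include <stdbool.h>
#include <hip/hip_runtime.h>
#include <miopen/miopen.h>

/* Hypothetical helper: one non-AMP Adam step over a flat fp32 parameter
 * vector of length n. The buffers are assumed to already live on the device. */
void fused_adam_step_example(void *param, void *grad, void *expAvg,
                             void *expAvgSq, int n, unsigned int step)
{
    miopenHandle_t handle;
    miopenCreate(&handle);

    int dims[1]    = {n};
    int strides[1] = {1};

    // All four tensors share the same 1-D fp32 shape in this sketch.
    miopenTensorDescriptor_t paramDesc, gradDesc, expAvgDesc, expAvgSqDesc;
    miopenCreateTensorDescriptor(&paramDesc);
    miopenCreateTensorDescriptor(&gradDesc);
    miopenCreateTensorDescriptor(&expAvgDesc);
    miopenCreateTensorDescriptor(&expAvgSqDesc);
    miopenSetTensorDescriptor(paramDesc,    miopenFloat, 1, dims, strides);
    miopenSetTensorDescriptor(gradDesc,     miopenFloat, 1, dims, strides);
    miopenSetTensorDescriptor(expAvgDesc,   miopenFloat, 1, dims, strides);
    miopenSetTensorDescriptor(expAvgSqDesc, miopenFloat, 1, dims, strides);

    miopenFusedAdam(handle,
                    paramDesc, param,
                    gradDesc, grad,
                    expAvgDesc, expAvg,
                    expAvgSqDesc, expAvgSq,
                    NULL, NULL,   /* maxExpAvgSq unused: amsgrad is false */
                    NULL, NULL,   /* stateStep tensor unused: int step is used */
                    step,
                    0.001f,       /* lr (hypothetical) */
                    0.9f,         /* beta1 */
                    0.999f,       /* beta2 */
                    0.0f,         /* weight_decay */
                    1e-8f,        /* eps */
                    false,        /* amsgrad */
                    false,        /* maximize */
                    false,        /* adamw */
                    NULL, NULL,   /* gradScale unused: not AMP */
                    NULL, NULL);  /* foundInf unused: not AMP */

    miopenDestroyTensorDescriptor(paramDesc);
    miopenDestroyTensorDescriptor(gradDesc);
    miopenDestroyTensorDescriptor(expAvgDesc);
    miopenDestroyTensorDescriptor(expAvgSqDesc);
    miopenDestroy(handle);
}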
Parameters
handle - MIOpen handle (input)
paramDesc - Tensor descriptor for the input parameter tensor (input)
param - Input parameter tensor (input)
gradDesc - Tensor descriptor for the input gradient tensor (input)
grad - Input gradient tensor (input)
expAvgDesc - Tensor descriptor for the input exponential moving average tensor (input)
expAvg - Input exponential moving average tensor (input)
expAvgSqDesc - Tensor descriptor for the input exponential moving average squared tensor (input)
expAvgSq - Input exponential moving average squared tensor (input)
maxExpAvgSqDesc - Tensor descriptor for the input maximum exponential moving average squared tensor. Used when amsgrad is true (input, optional)
maxExpAvgSq - Input maximum exponential moving average squared tensor. Used when amsgrad is true (input, optional)
stateStepDesc - Tensor descriptor for the input state step tensor (input)
stateStep - Input state step tensor (input)
state_step - Input state step. Used when the step tensor is null (input)
lr - Learning rate (input)
beta1 - Coefficient used for computing the first moment running average of gradient (input)
beta2 - Coefficient used for computing the second moment running average of gradient (input)
weight_decay - Weight decay (input)
eps - Term added to the denominator to improve numerical stability (input)
amsgrad - Flag indicating whether to use the AMSGrad variant of Adam (input)
maximize - Flag indicating whether to maximize the objective with respect to the parameters (input)
adamw - If true, the operation becomes AdamW (input)
gradScaleDesc - Tensor descriptor for the input grad scale tensor (input, optional)
gradScale - Input grad scale tensor (input, optional)
foundInfDesc - Tensor descriptor for the input found inf tensor (input, optional)
foundInf - Tensor indicating the presence of inf or NaN in gradients. If true, skips operation and step update (input, optional)
Returns
miopenStatus_t
Examples
include/miopen/miopen.h

◆ miopenFusedAdamWithOutput()

miopenStatus_t miopenFusedAdamWithOutput ( miopenHandle_t  handle,
const miopenTensorDescriptor_t  paramInDesc,
void *  paramIn,
const miopenTensorDescriptor_t  paramOutDesc,
void *  paramOut,
const miopenTensorDescriptor_t  paramOutFloat16Desc,
void *  paramOutFloat16,
const miopenTensorDescriptor_t  gradInDesc,
const void *  gradIn,
const miopenTensorDescriptor_t  expAvgInDesc,
void *  expAvgIn,
const miopenTensorDescriptor_t  expAvgOutDesc,
void *  expAvgOut,
const miopenTensorDescriptor_t  expAvgSqInDesc,
void *  expAvgSqIn,
const miopenTensorDescriptor_t  expAvgSqOutDesc,
void *  expAvgSqOut,
const miopenTensorDescriptor_t  maxExpAvgSqInDesc,
void *  maxExpAvgSqIn,
const miopenTensorDescriptor_t  maxExpAvgSqOutDesc,
void *  maxExpAvgSqOut,
const miopenTensorDescriptor_t  stateStepInDesc,
void *  stateStepIn,
const miopenTensorDescriptor_t  stateStepOutDesc,
void *  stateStepOut,
const unsigned int  state_step,
const float  lr,
const float  beta1,
const float  beta2,
const float  weight_decay,
const float  eps,
const bool  amsgrad,
const bool  maximize,
const bool  adamw,
const miopenTensorDescriptor_t  gradScaleDesc,
const void *  gradScale,
const miopenTensorDescriptor_t  foundInfDesc,
const void *  foundInf 
)

Execute single tensor Adam optimization and receive the result in a separate output tensor.

This function is equivalent to miopenFusedAdam but receives the result in a separate output tensor.

See also
miopenFusedAdam
// Execute Adam
miopenFusedAdamWithOutput(handle,
                          paramInDesc,
                          paramIn,
                          paramOutDesc,
                          paramOut,
                          NULL, // Unused paramOutFloat16 tensor because not AMP
                          NULL,
                          gradInDesc,
                          gradIn,
                          expAvgInDesc,
                          expAvgIn,
                          expAvgOutDesc,
                          expAvgOut,
                          expAvgSqInDesc,
                          expAvgSqIn,
                          expAvgSqOutDesc,
                          expAvgSqOut,
                          NULL, // Unused maxExpAvgSqIn tensor because amsgrad is false
                          NULL,
                          NULL, // Unused maxExpAvgSqOut tensor because amsgrad is false
                          NULL,
                          NULL, // Unused stateStepIn tensor because the step integer argument is used
                          NULL,
                          NULL, // Unused stateStepOut tensor because the step integer argument is used
                          NULL,
                          step,
                          lr,
                          beta1,
                          beta2,
                          weight_decay,
                          eps,
                          false, // amsgrad
                          false, // maximize
                          false, // adamw
                          NULL,  // Unused gradScale tensor because not AMP
                          NULL,
                          NULL,  // Unused foundInf tensor because not AMP
                          NULL);
// Execute AMP Adam
miopenFusedAdamWithOutput(handle,
                          paramInDesc,
                          paramIn,
                          paramOutDesc,
                          paramOut,
                          paramOutFloat16Desc, // paramOutFloat16 tensor is optional in AMP
                          paramOutFloat16,
                          gradInDesc,
                          gradIn,
                          expAvgInDesc,
                          expAvgIn,
                          expAvgOutDesc,
                          expAvgOut,
                          expAvgSqInDesc,
                          expAvgSqIn,
                          expAvgSqOutDesc,
                          expAvgSqOut,
                          NULL, // Unused maxExpAvgSqIn tensor because amsgrad is false
                          NULL,
                          NULL, // Unused maxExpAvgSqOut tensor because amsgrad is false
                          NULL,
                          stateStepInDesc,
                          stateStepIn,
                          stateStepOutDesc,
                          stateStepOut,
                          -1, // The step value is ignored because the stateStep tensor is used
                          lr, beta1, beta2, weight_decay, eps,
                          false, // amsgrad
                          false, // maximize
                          false, // adamw
                          gradScaleDesc,
                          gradScale,
                          foundInfDesc,
                          foundInf);
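Because this variant is out of place, the caller also allocates the output tensors that receive the updated parameters and optimizer state. A brief allocation sketch under the same assumptions as the earlier setup example (flat fp32 tensors of hypothetical length n):

void *paramOut, *expAvgOut, *expAvgSqOut;
hipMalloc(&paramOut,    n * sizeof(float));   // receives the updated parameters
hipMalloc(&expAvgOut,   n * sizeof(float));   // receives the updated first moment
hipMalloc(&expAvgSqOut, n * sizeof(float));   // receives the updated second moment
// The *OutDesc descriptors would typically be created with the same shape and
// data type as their *InDesc counterparts; that mirroring is an assumption of
// this sketch, not something stated by the documentation above.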
Parameters
handle - MIOpen handle (input)
paramInDesc - Tensor descriptor for the input parameter tensor (input)
paramIn - Input parameter tensor (input)
paramOutDesc - Tensor descriptor for the output parameter tensor (input)
paramOut - Output parameter tensor (output)
paramOutFloat16Desc - Tensor descriptor for the float16 output parameter tensor (input, optional)
paramOutFloat16 - Output parameter tensor in float16 (output, optional)
gradInDesc - Tensor descriptor for the input gradient tensor (input)
gradIn - Input gradient tensor (input)
expAvgInDesc - Tensor descriptor for the input exponential moving average tensor (input)
expAvgIn - Input exponential moving average tensor (input)
expAvgOutDesc - Tensor descriptor for the output exponential moving average tensor (input)
expAvgOut - Output exponential moving average tensor (output)
expAvgSqInDesc - Tensor descriptor for the input exponential moving average squared tensor (input)
expAvgSqIn - Input exponential moving average squared tensor (input)
expAvgSqOutDesc - Tensor descriptor for the output exponential moving average squared tensor (input)
expAvgSqOut - Output exponential moving average squared tensor (output)
maxExpAvgSqInDesc - Tensor descriptor for the input maximum exponential moving average squared tensor. Used when amsgrad is true (input, optional)
maxExpAvgSqIn - Input maximum exponential moving average squared tensor. Used when amsgrad is true (input, optional)
maxExpAvgSqOutDesc - Tensor descriptor for the output maximum exponential moving average squared tensor. Used when amsgrad is true (input, optional)
maxExpAvgSqOut - Output maximum exponential moving average squared tensor. Used when amsgrad is true (output, optional)
stateStepInDesc - Tensor descriptor for the input state step tensor (input, optional)
stateStepIn - Input state step tensor (input, optional)
stateStepOutDesc - Tensor descriptor for the output state step tensor (input, optional)
stateStepOut - Output state step tensor that stores the updated step value (output, optional)
state_step - Input state step. Used when the step tensor is null (input)
lr - Learning rate (input)
beta1 - Coefficient used for computing the first moment running average of gradient (input)
beta2 - Coefficient used for computing the second moment running average of gradient (input)
weight_decay - Weight decay (input)
eps - Term added to the denominator to improve numerical stability (input)
amsgrad - Flag indicating whether to use the AMSGrad variant of Adam (input)
maximize - Flag indicating whether to maximize the objective with respect to the parameters (input)
adamw - If true, the operation becomes AdamW (input)
gradScaleDesc - Tensor descriptor for the input grad scale tensor (input, optional)
gradScale - Input grad scale tensor (input, optional)
foundInfDesc - Tensor descriptor for the input found inf tensor (input, optional)
foundInf - Tensor indicating the presence of inf or NaN in gradients. If true, skips operation and step update (input, optional)
Returns
miopenStatus_t
Examples
include/miopen/miopen.h

◆ miopenTransformersAdamW()

miopenStatus_t miopenTransformersAdamW ( miopenHandle_t  handle,
const miopenTensorDescriptor_t  paramDesc,
void *  param,
const miopenTensorDescriptor_t  gradDesc,
const void *  grad,
const miopenTensorDescriptor_t  expAvgDesc,
void *  expAvg,
const miopenTensorDescriptor_t  expAvgSqDesc,
void *  expAvgSq,
const miopenTensorDescriptor_t  stateStepDesc,
void *  stateStep,
const unsigned int  state_step,
const float  lr,
const float  beta1,
const float  beta2,
const float  weight_decay,
const float  eps,
const bool  correct_bias,
const miopenTensorDescriptor_t  gradScaleDesc,
const void *  gradScale,
const miopenTensorDescriptor_t  foundInfDesc,
const void *  foundInf 
)

Implements the Adam algorithm with the weight decay fix introduced in Decoupled Weight Decay Regularization. This is the fused kernel version of the AdamW optimizer included in the Hugging Face Transformers module.

See also
miopenFusedAdam
// Execute AdamW
miopenTransformersAdamW(handle,
                        paramDesc,
                        param,
                        gradDesc,
                        grad,
                        expAvgDesc,
                        expAvg,
                        expAvgSqDesc,
                        expAvgSq,
                        NULL, // Unused stateStep tensor because the step integer argument is used
                        NULL,
                        step,
                        lr,
                        beta1,
                        beta2,
                        weight_decay,
                        eps,
                        true, // correct_bias
                        NULL, // Unused gradScale tensor because not AMP
                        NULL,
                        NULL, // Unused foundInf tensor because not AMP
                        NULL);

// Execute AMP AdamW
miopenTransformersAdamW(handle,
                        paramDesc,
                        param,
                        gradDesc,
                        grad,
                        expAvgDesc,
                        expAvg,
                        expAvgSqDesc,
                        expAvgSq,
                        stateStepDesc,
                        stateStep,
                        -1, // The step value is ignored because the stateStep tensor is used
                        lr,
                        beta1,
                        beta2,
                        weight_decay,
                        eps,
                        true, // correct_bias
                        gradScaleDesc,
                        gradScale,
                        foundInfDesc,
                        foundInf);
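For context, with correct_bias enabled the AdamW reference implementation in Hugging Face Transformers applies the update below at each step t; this is a summary of that published algorithm rather than a statement about the fused kernel's exact arithmetic:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\mathrm{step\_size} = \mathrm{lr} \cdot \sqrt{1 - \beta_2^t} / (1 - \beta_1^t) \quad (\mathrm{step\_size} = \mathrm{lr}\ \text{when correct\_bias is false})
\theta \leftarrow \theta - \mathrm{step\_size} \cdot m_t / (\sqrt{v_t} + \epsilon)
\theta \leftarrow \theta - \mathrm{lr} \cdot \mathrm{weight\_decay} \cdot \theta \quad \text{(decoupled weight decay)}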
Parameters
handle - MIOpen handle (input)
paramDesc - Tensor descriptor for the input parameter tensor (input)
param - Input parameter tensor (input)
gradDesc - Tensor descriptor for the input gradient tensor (input)
grad - Input gradient tensor (input)
expAvgDesc - Tensor descriptor for the input exponential moving average tensor (input)
expAvg - Input exponential moving average tensor (input)
expAvgSqDesc - Tensor descriptor for the input exponential moving average squared tensor (input)
expAvgSq - Input exponential moving average squared tensor (input)
stateStepDesc - Tensor descriptor for the input state step tensor (input)
stateStep - Input state step tensor (input)
state_step - Input state step. Used when the step tensor is null (input)
lr - Learning rate (input)
beta1 - Coefficient used for computing the first moment running average of gradient (input)
beta2 - Coefficient used for computing the second moment running average of gradient (input)
weight_decay - Weight decay (input)
eps - Term added to the denominator to improve numerical stability (input)
correct_bias - Whether or not to correct bias in Adam (for instance, in the BERT TF repository they use False) (input)
gradScaleDesc - Tensor descriptor for the input grad scale tensor (input, optional)
gradScale - Input grad scale tensor (input, optional)
foundInfDesc - Tensor descriptor for the input found inf tensor (input, optional)
foundInf - Tensor indicating the presence of inf or NaN in gradients. If true, skips operation and step update (input, optional)
Returns
miopenStatus_t
Examples
include/miopen/miopen.h

◆ miopenTransformersAdamWWithOutput()

miopenStatus_t miopenTransformersAdamWWithOutput ( miopenHandle_t  handle,
const miopenTensorDescriptor_t  paramInDesc,
void *  paramIn,
const miopenTensorDescriptor_t  paramOutDesc,
void *  paramOut,
const miopenTensorDescriptor_t  paramOutFloat16Desc,
void *  paramOutFloat16,
const miopenTensorDescriptor_t  gradInDesc,
const void *  gradIn,
const miopenTensorDescriptor_t  expAvgInDesc,
void *  expAvgIn,
const miopenTensorDescriptor_t  expAvgOutDesc,
void *  expAvgOut,
const miopenTensorDescriptor_t  expAvgSqInDesc,
void *  expAvgSqIn,
const miopenTensorDescriptor_t  expAvgSqOutDesc,
void *  expAvgSqOut,
const miopenTensorDescriptor_t  stateStepInDesc,
void *  stateStepIn,
const miopenTensorDescriptor_t  stateStepOutDesc,
void *  stateStepOut,
const unsigned int  state_step,
const float  lr,
const float  beta1,
const float  beta2,
const float  weight_decay,
const float  eps,
const float  step_size,
const bool  correct_bias,
const miopenTensorDescriptor_t  gradScaleDesc,
const void *  gradScale,
const miopenTensorDescriptor_t  foundInfDesc,
const void *  foundInf 
)

Execute single tensor Adam optimization and receive the result in a separate output tensor.

This function is equivalent to miopenTransformersAdamW but receives the result in a separate output tensor.

See also
miopenTransformersAdamW
miopenFusedAdamWithOutput
// Execute AdamW
miopenTransformersAdamWWithOutput(handle,
                                  paramInDesc,
                                  paramIn,
                                  paramOutDesc,
                                  paramOut,
                                  NULL, // Unused paramOutFloat16 tensor because not AMP
                                  NULL,
                                  gradInDesc,
                                  gradIn,
                                  expAvgInDesc,
                                  expAvgIn,
                                  expAvgOutDesc,
                                  expAvgOut,
                                  expAvgSqInDesc,
                                  expAvgSqIn,
                                  expAvgSqOutDesc,
                                  expAvgSqOut,
                                  NULL, // Unused stateStepIn tensor because the step integer argument is used
                                  NULL,
                                  NULL, // Unused stateStepOut tensor because the step integer argument is used
                                  NULL,
                                  step,
                                  lr,
                                  beta1,
                                  beta2,
                                  weight_decay,
                                  eps,
                                  -1,   // step_size
                                  true, // correct_bias
                                  NULL, // Unused gradScale tensor because not AMP
                                  NULL,
                                  NULL, // Unused foundInf tensor because not AMP
                                  NULL);
// Execute AMP AdamW
miopenTransformersAdamWWithOutput(handle,
                                  paramInDesc,
                                  paramIn,
                                  paramOutDesc,
                                  paramOut,
                                  paramOutFloat16Desc, // paramOutFloat16 tensor is optional in AMP
                                  paramOutFloat16,
                                  gradInDesc,
                                  gradIn,
                                  expAvgInDesc,
                                  expAvgIn,
                                  expAvgOutDesc,
                                  expAvgOut,
                                  expAvgSqInDesc,
                                  expAvgSqIn,
                                  expAvgSqOutDesc,
                                  expAvgSqOut,
                                  stateStepInDesc,
                                  stateStepIn,
                                  stateStepOutDesc,
                                  stateStepOut,
                                  -1,   // The step value is ignored because the stateStep tensor is used
                                  lr,
                                  beta1,
                                  beta2,
                                  weight_decay,
                                  eps,
                                  -1,   // step_size
                                  true, // correct_bias
                                  gradScaleDesc,
                                  gradScale,
                                  foundInfDesc,
                                  foundInf);
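Relative to miopenTransformersAdamW, this variant adds a step_size argument, documented below as a pre-calculated step size used for performance. Under the bias-corrected formulation shown earlier, that value would presumably be lr * sqrt(1 - beta2^t) / (1 - beta1^t); note that the examples above simply pass -1 for step_size, so supplying a precomputed value appears to be an optional optimization rather than a requirement.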
Parameters
handle - MIOpen handle (input)
paramInDesc - Tensor descriptor for the input parameter tensor (input)
paramIn - Input parameter tensor (input)
paramOutDesc - Tensor descriptor for the output parameter tensor (input)
paramOut - Output parameter tensor (output)
paramOutFloat16Desc - Tensor descriptor for the float16 output parameter tensor (input, optional)
paramOutFloat16 - Output parameter tensor in float16 (output, optional)
gradInDesc - Tensor descriptor for the input gradient tensor (input)
gradIn - Input gradient tensor (input)
expAvgInDesc - Tensor descriptor for the input exponential moving average tensor (input)
expAvgIn - Input exponential moving average tensor (input)
expAvgOutDesc - Tensor descriptor for the output exponential moving average tensor (input)
expAvgOut - Output exponential moving average tensor (output)
expAvgSqInDesc - Tensor descriptor for the input exponential moving average squared tensor (input)
expAvgSqIn - Input exponential moving average squared tensor (input)
expAvgSqOutDesc - Tensor descriptor for the output exponential moving average squared tensor (input)
expAvgSqOut - Output exponential moving average squared tensor (output)
stateStepInDesc - Tensor descriptor for the input state step tensor (input, optional)
stateStepIn - Input state step tensor (input, optional)
stateStepOutDesc - Tensor descriptor for the output state step tensor (input, optional)
stateStepOut - Output state step tensor that stores the updated step value (output, optional)
state_step - Input state step. Used when the step tensor is null (input)
lr - Learning rate (input)
beta1 - Coefficient used for computing the first moment running average of gradient (input)
beta2 - Coefficient used for computing the second moment running average of gradient (input)
weight_decay - Weight decay (input)
eps - Term added to the denominator to improve numerical stability (input)
step_size - Pre-calculated step size, used for performance enhancement (input)
correct_bias - Whether or not to correct bias in Adam (for instance, in the BERT TF repository they use False) (input)
gradScaleDesc - Tensor descriptor for the input grad scale tensor (input, optional)
gradScale - Input grad scale tensor (input, optional)
foundInfDesc - Tensor descriptor for the input found inf tensor (input, optional)
foundInf - Tensor indicating the presence of inf or NaN in gradients. If true, skips operation and step update (input, optional)
Returns
miopenStatus_t
Examples
include/miopen/miopen.h