I've just discovered some TensorFlow code that uses this operation for training. How does it help the training of the variables?
Maintains moving averages of variables by employing an exponential decay.
When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.
doc: https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage?version=stable
You might want to look into the docs of the class tf.train.ExponentialMovingAverage:
Some training algorithms, such as GradientDescent and Momentum, often benefit from maintaining a moving average of variables during optimization. Using the moving averages for evaluations often improves results significantly.
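For context, here is a minimal sketch of how this class is typically wired into a TF1-style training loop. The variable names, the loss, and the decay value below are made up for illustration; only tf.train.ExponentialMovingAverage, apply(), and average() come from the documented API:

import tensorflow as tf  # assumes TensorFlow 1.x APIs

# Hypothetical trainable variables; names and values are illustrative only.
var0 = tf.Variable(1.0, name="var0")
var1 = tf.Variable(2.0, name="var1")

loss = tf.square(var0) + tf.square(var1)
opt_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Track exponentially decayed moving averages of the variables.
ema = tf.train.ExponentialMovingAverage(decay=0.999)

# Run the EMA update after every optimizer step.
with tf.control_dependencies([opt_op]):
    train_op = ema.apply([var0, var1])

# At evaluation time, read the averaged ("shadow") values instead of the raw variables.
avg_var0 = ema.average(var0)
avg_var1 = ema.average(var1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run([avg_var0, avg_var1]))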
I have an issue here. According to the official TensorFlow documentation for tf.train.ExponentialMovingAverage, the formula for updating the shadow variable is as follows:
shadow_variable = decay * shadow_variable + (1 - decay) * variable
But looking at the Wikipedia article on moving averages, the update operation appears to be:
shadow_variable = (1 - decay) * shadow_variable + decay * variable
Which one is correct? I suspect the Wikipedia article on moving averages is the more comprehensive reference, but I am not sure about that argument.
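To illustrate how the two notations relate (the numbers below are made up, not taken from TensorFlow or Wikipedia): with the same symbol names, the TensorFlow form weights the old shadow value by decay, while the Wikipedia form weights it by 1 - decay, so the two expressions are mirror images of each other:

decay = 0.9        # illustrative value only
shadow = 1.0       # hypothetical previous shadow value
variable = 0.0     # hypothetical current variable value

tf_form = decay * shadow + (1 - decay) * variable      # keeps 90% of the old shadow -> 0.9
wiki_form = (1 - decay) * shadow + decay * variable    # keeps only 10% of it -> 0.1

print(tf_form, wiki_form)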
In my opinion,
shadow_variable = decay * shadow_variable + (1 - decay) * variable
is the correct one. I computed the ExponentialMovingAverage with the given data:
In [31]: sess.run([a,b])
Out[31]:
[array([[-2.0687273 , -0.43363234],
[ 0.40200853, 0.02875281]], dtype=float32),
array([[-0.31468132, -0.69469845],
[ 2.0624537 , -0.25533926]], dtype=float32)]
with the shadow variables:
In [28]: sess.run([ema_a,ema_b])
Out[28]:
[array([[-1.0375735, -1.0736414],
[ 0.0657308, -0.668182 ]], dtype=float32),
array([[-0.31468132, -0.69469845],
[ 2.0624537 , -0.25533926]], dtype=float32)]
Then I ran ema_apply_op to compute the new shadow variables. The result is as follows:
In [29]: sess.run(ema_apply_op)
In [30]: sess.run([ema_a,ema_b])
Out[30]:
[array([[-1.1406889 , -1.0096405 ],
[ 0.09935857, -0.5984885 ]], dtype=float32),
array([[-0.31468132, -0.69469845],
[ 2.0624537 , -0.25533926]], dtype=float32)]
Note: if you recompute these values by hand, you will see that
shadow_variable = decay * shadow_variable + (1 - decay) * variable
is the update being used (see the sketch below).
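One quick way to check this is to plug the printed arrays into both formulas with NumPy. The decay value is not shown in the post, so decay = 0.9 below is an assumption, but with it the TensorFlow-style formula reproduces the printed ema_a values:

import numpy as np

decay = 0.9  # assumed; the decay actually used is not shown above

a = np.array([[-2.0687273, -0.43363234],
              [0.40200853, 0.02875281]], dtype=np.float32)
ema_a_old = np.array([[-1.0375735, -1.0736414],
                      [0.0657308, -0.668182]], dtype=np.float32)

# TensorFlow-documented form: keep most of the old shadow value.
print(decay * ema_a_old + (1 - decay) * a)
# Matches the printed ema_a above, up to float32 rounding:
# [[-1.1406889  -1.0096405 ]
#  [ 0.09935857 -0.5984885 ]]

# Wikipedia-style form with the same symbol names gives different numbers.
print((1 - decay) * ema_a_old + decay * a)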