I've just discovered some TensorFlow code that uses this operation for training. How does it help the training of the variables?
Maintains moving averages of variables by employing an exponential decay.
When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.
doc: https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage?version=stable
You might want to look into the docs of the class tf.train.ExponentialMovingAverage:
Some training algorithms, such as GradientDescent and Momentum, often benefit from maintaining a moving average of variables during optimization. Using the moving averages for evaluations often improves results significantly.
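For context, here is a minimal sketch of how this class is typically wired into a TF1-style training loop. The variable names, the loss, and the decay value below are made up for illustration; only tf.train.ExponentialMovingAverage, apply(), and average() come from the documented API:

import tensorflow as tf  # assumes TensorFlow 1.x APIs

# Hypothetical trainable variables; names and values are illustrative only.
var0 = tf.Variable(1.0, name="var0")
var1 = tf.Variable(2.0, name="var1")

loss = tf.square(var0) + tf.square(var1)
opt_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Track exponentially decayed moving averages of the variables.
ema = tf.train.ExponentialMovingAverage(decay=0.999)

# Run the EMA update after every optimizer step.
with tf.control_dependencies([opt_op]):
    train_op = ema.apply([var0, var1])

# At evaluation time, read the averaged ("shadow") values instead of the raw variables.
avg_var0 = ema.average(var0)
avg_var1 = ema.average(var1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run([avg_var0, avg_var1]))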
I have an issue here. According to the official TensorFlow documentation for tf.train.ExponentialMovingAverage, the formula for updating the shadow variable is as follows:
shadow_variable = decay * shadow_variable + (1 - decay) * variable
But looking at the Wikipedia article on moving averages, the update operation appears to be:
shadow_variable = (1 - decay) * shadow_variable + decay * variable
Which one is correct? I suspect the Wikipedia article on moving averages is the more comprehensive reference, but I am not sure about that argument.
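To illustrate how the two notations relate (the numbers below are made up, not taken from TensorFlow or Wikipedia): with the same symbol names, the TensorFlow form weights the old shadow value by decay, while the Wikipedia form weights it by 1 - decay, so the two expressions are mirror images of each other:

decay = 0.9        # illustrative value only
shadow = 1.0       # hypothetical previous shadow value
variable = 0.0     # hypothetical current variable value

tf_form = decay * shadow + (1 - decay) * variable      # keeps 90% of the old shadow -> 0.9
wiki_form = (1 - decay) * shadow + decay * variable    # keeps only 10% of it -> 0.1

print(tf_form, wiki_form)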
In my opinion,
shadow_variable = decay * shadow_variable + (1 - decay) * variable
is the correct one. I computed the ExponentialMovingAverage with the given data:
In [31]: sess.run([a,b])
Out[31]:
[array([[-2.0687273 , -0.43363234],
[ 0.40200853, 0.02875281]], dtype=float32),
array([[-0.31468132, -0.69469845],
[ 2.0624537 , -0.25533926]], dtype=float32)]
with the shadow variables:
In [28]: sess.run([ema_a,ema_b])
Out[28]:
[array([[-1.0375735, -1.0736414],
[ 0.0657308, -0.668182 ]], dtype=float32),
array([[-0.31468132, -0.69469845],
[ 2.0624537 , -0.25533926]], dtype=float32)]
Then I ran ema_apply_op to compute the new shadow variables. The result is as follows:
In [29]: sess.run(ema_apply_op)
In [30]: sess.run([ema_a,ema_b])
Out[30]:
[array([[-1.1406889 , -1.0096405 ],
[ 0.09935857, -0.5984885 ]], dtype=float32),
array([[-0.31468132, -0.69469845],
[ 2.0624537 , -0.25533926]], dtype=float32)]
Note: if you recompute these values by hand, you will see that
shadow_variable = decay * shadow_variable + (1 - decay) * variable
is the update being used (see the sketch below).
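One quick way to check this is to plug the printed arrays into both formulas with NumPy. The decay value is not shown in the post, so decay = 0.9 below is an assumption, but with it the TensorFlow-style formula reproduces the printed ema_a values:

import numpy as np

decay = 0.9  # assumed; the decay actually used is not shown above

a = np.array([[-2.0687273, -0.43363234],
              [0.40200853, 0.02875281]], dtype=np.float32)
ema_a_old = np.array([[-1.0375735, -1.0736414],
                      [0.0657308, -0.668182]], dtype=np.float32)

# TensorFlow-documented form: keep most of the old shadow value.
print(decay * ema_a_old + (1 - decay) * a)
# Matches the printed ema_a above, up to float32 rounding:
# [[-1.1406889  -1.0096405 ]
#  [ 0.09935857 -0.5984885 ]]

# Wikipedia-style form with the same symbol names gives different numbers.
print((1 - decay) * ema_a_old + decay * a)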