 

What does tf.train.ExponentialMovingAverage do?

I've just discovered some TensorFlow code that uses this operation for training. How does it help the variable training process?

asked Jul 22 '16 by luongminh97


4 Answers

Maintains moving averages of variables by employing an exponential decay.

When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.

doc: https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage?version=stable
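
For context, here is a minimal TF1-style sketch of the usual pattern (the variable, decay value, and one-step "training" op are illustrative, not taken from the question's code): apply() maintains the shadow copy after each training step, and average() reads the smoothed value at evaluation time.

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()  # sketch assumes TF1-style graph mode

# A trained variable and a stand-in for one optimizer step (illustrative).
var = tf.Variable(0.0, name="weight")
step = var.assign_add(1.0)

# Shadow copy that tracks `var` with exponential decay.
ema = tf.train.ExponentialMovingAverage(decay=0.999)

# Update the average right after every training step.
with tf.control_dependencies([step]):
    train_op = ema.apply([var])

# At evaluation time, read the smoothed value instead of the raw one.
averaged_var = ema.average(var)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)
    print(sess.run([var, averaged_var]))  # roughly [1.0, 0.001]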

answered Oct 03 '22 by Ishant Mrinal


You might want to look at the docs of the tf.train.ExponentialMovingAverage class:

Some training algorithms, such as GradientDescent and Momentum often benefit from maintaining a moving average of variables during optimization. Using the moving averages for evaluations often improve results significantly.

Maintains moving averages of variables by employing an exponential decay.

Explanations:

  • Moving average: a running average that smooths a sequence of values over time
  • Exponential decay: the same kind of decay schedule also used for learning-rate scheduling; here it controls how quickly old values are forgotten (see the short sketch below)
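
As a rough illustration (plain Python, numbers made up), the shadow value is a weighted blend in which each older observation is discounted by one more power of the decay factor:

# Minimal sketch: an exponential moving average keeps a running "shadow"
# value; each update keeps `decay` of the old shadow and (1 - decay) of
# the newest observation, so old observations fade out geometrically.
def exponential_moving_average(values, decay=0.9):
    shadow = values[0]                      # initialize with the first value
    for value in values[1:]:
        shadow = decay * shadow + (1 - decay) * value
    return shadow

print(exponential_moving_average([1.0, 2.0, 3.0, 4.0]))  # ≈ 1.561
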
answered Oct 03 '22 by Martin Thoma


I have an issue here. According to the official TensorFlow documentation for tf.train.ExponentialMovingAverage, the formula for updating the shadow variable is as follows:

shadow_variable = decay * shadow_variable + (1 - decay) * variable

But looking at the Wikipedia article on the moving average concept, the update operation appears to be as follows:

shadow_variable = (1 - decay) * shadow_variable + decay * variable

Which one is correct? I think the Wikipedia documentation for the moving average is more comprehensive. However, I am not so sure about this argument.
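
For what it is worth, the two formulas describe the same update and only name the coefficient differently: TensorFlow's decay is the weight kept on the old shadow value, while Wikipedia's alpha is the weight put on the new observation, i.e. alpha = 1 - decay. A quick plain-Python check (values chosen arbitrarily):

decay = 0.99
alpha = 1 - decay

shadow, value = 5.0, 7.0
tf_style   = decay * shadow + (1 - decay) * value    # TensorFlow convention
wiki_style = (1 - alpha) * shadow + alpha * value    # Wikipedia convention, alpha = 1 - decay
assert abs(tf_style - wiki_style) < 1e-12            # identical result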

answered Oct 03 '22 by amirsina torfi


In my opinion,

shadow_variable = decay * shadow_variable + (1 - decay) * variable

is the correct one. I have computed the ExponentialMovingAverage with the following data:

In [31]: sess.run([a,b])
Out[31]:
[array([[-2.0687273 , -0.43363234],
        [ 0.40200853,  0.02875281]], dtype=float32),
 array([[-0.31468132, -0.69469845],
        [ 2.0624537 , -0.25533926]], dtype=float32)]

with the shadow variables:

In [28]: sess.run([ema_a,ema_b])
Out[28]:
[array([[-1.0375735, -1.0736414],
        [ 0.0657308, -0.668182 ]], dtype=float32),
 array([[-0.31468132, -0.69469845],
        [ 2.0624537 , -0.25533926]], dtype=float32)]

then I ran the EMA apply op to compute the new shadow variables. The result is as follows:

In [29]: sess.run(ema_apply_op)

In [30]: sess.run([ema_a,ema_b])
Out[30]:
[array([[-1.1406889 , -1.0096405 ],
        [ 0.09935857, -0.5984885 ]], dtype=float32),
 array([[-0.31468132, -0.69469845],
        [ 2.0624537 , -0.25533926]], dtype=float32)]

Note: If you compute these values yourself, you will see that

shadow_variable = decay * shadow_variable + (1 - decay) * variable

is used.
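
Plugging the printed numbers into that formula indeed reproduces the new shadow values. The decay is not stated in the answer, but decay = 0.9 is consistent with the numbers shown:

import numpy as np

# Values copied from the session output above; decay = 0.9 is an assumption
# inferred from the printed result, not stated in the answer.
decay = 0.9
a         = np.array([[-2.0687273, -0.43363234], [0.40200853,  0.02875281]])
ema_a_old = np.array([[-1.0375735, -1.0736414 ], [0.0657308,  -0.668182  ]])

ema_a_new = decay * ema_a_old + (1 - decay) * a
print(ema_a_new)
# ≈ [[-1.1406889 -1.0096405]
#    [ 0.0993586 -0.5984885]]

(ema_b stays unchanged because b already equals its shadow value, so decay * b + (1 - decay) * b = b.)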

answered Oct 03 '22 by Noorul Hasan