
Is there a momentum option for Adam optimizer in Keras? [closed]

The question says it all. Since Adam performs well on most of my datasets, I want to try tuning the momentum for the Adam optimizer. So far I have only found a momentum option for SGD in Keras.
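For reference, this is the only momentum argument I can see (a minimal sketch; the values are just placeholders):

    from tensorflow import keras

    # SGD exposes an explicit momentum argument...
    sgd = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)

    # ...but Adam only exposes its beta_1/beta_2 decay rates.
    adam = keras.optimizers.Adam(learning_rate=0.001)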

asked Nov 07 '17 by Tuan Do

1 Answer

Short answer: no, neither in Keras nor in TensorFlow [EDIT: see UPDATE at the end].

Long answer: as mentioned in the comments, Adam already incorporates something like momentum. Here is some relevant corroboration:

From the highly recommended An overview of gradient descent optimization algorithms (available also as a paper):

In addition to storing an exponentially decaying average of past squared gradients v[t] like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m[t], similar to momentum.
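For concreteness, these are the standard Adam update rules from the Kingma & Ba paper (g_t is the gradient, eta the learning rate; a plain restatement, nothing Keras-specific):

    \[
    \begin{aligned}
    m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t        && \text{momentum-like first moment}\\
    v_t &= \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2      && \text{RMSprop-like second moment}\\
    \hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad
    \hat{v}_t = \frac{v_t}{1-\beta_2^t}                 && \text{bias correction}\\
    \theta_{t+1} &= \theta_t - \frac{\eta\,\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon} && \text{parameter update}
    \end{aligned}
    \]

The first line is exactly an exponential moving average of the gradient, i.e. the momentum-style term, and it is controlled by beta_1.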

From Stanford CS231n: CNNs for Visual Recognition:

Adam is a recently proposed update that looks a bit like RMSProp with momentum

Notice that some frameworks do include a momentum parameter for Adam, but it is really just the beta1 parameter; here is CNTK:

momentum (float, list, output of momentum_schedule()) – momentum schedule. Note that this is the beta1 parameter in the Adam paper. For additional information, please refer to this CNTK Wiki article.
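The same holds for Keras: the momentum-like behaviour is controlled through the beta_1 argument of the Adam constructor, so that is the knob to sweep if you want to "tune momentum" for Adam. A minimal sketch (the learning rate and the candidate beta_1 values are arbitrary, purely for illustration):

    from tensorflow import keras

    # beta_1 controls the decay of the first-moment (momentum-like) average;
    # the Keras default is 0.9.
    for beta_1 in (0.8, 0.9, 0.99):  # illustrative sweep values
        optimizer = keras.optimizers.Adam(learning_rate=0.001, beta_1=beta_1)
        # model.compile(optimizer=optimizer, loss='categorical_crossentropy')
        # model.fit(x_train, y_train, ...)  # train and compare runs as usual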

That said, there is an ICLR 2016 paper titled Incorporating Nesterov Momentum into Adam, along with a TensorFlow implementation skeleton by the author; I cannot offer any opinion on it, though.

UPDATE: Keras does now include an optimizer called Nadam, based on the ICLR 2016 paper mentioned above; from the docs:

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

It is also included in TensorFlow as a contributed module, NadamOptimizer.
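So, if what you are after is Nesterov-style momentum on top of Adam, you can simply swap in Nadam; a minimal sketch (the hyperparameter values shown are only illustrative):

    from tensorflow import keras

    # Nadam = Adam with Nesterov momentum; beta_1 again plays the momentum role.
    optimizer = keras.optimizers.Nadam(learning_rate=0.002, beta_1=0.9, beta_2=0.999)
    # model.compile(optimizer=optimizer, loss='categorical_crossentropy')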

answered Sep 18 '22 by desertnaut