The question says it all. Since Adam performs well on most datasets, I want to try momentum tuning for the Adam optimizer. So far I have only found a momentum option for SGD in Keras.
Adam, by concept, already has something like momentum. Adding an outer layer of momentum, I would not call it Adam anymore (and it's unclear whether it's a good idea; probably not).
Short answer: no, neither in Keras nor in Tensorflow [EDIT: see UPDATE at the end]
Long answer: as already mentioned in the comments, Adam already incorporates something like momentum. Here is some relevant corroboration:
From the highly recommended An overview of gradient descent optimization algorithms (available also as a paper):
In addition to storing an exponentially decaying average of past squared gradients v_t like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum
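To make the momentum analogy concrete, here is a rough single-step sketch of the Adam update rule in plain NumPy (illustrative only, not any library's actual implementation):

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # m: exponentially decaying average of past gradients -> the momentum-like part
        m = beta1 * m + (1 - beta1) * grad
        # v: exponentially decaying average of past squared gradients (RMSprop-like part)
        v = beta2 * v + (1 - beta2) * grad ** 2
        # bias correction for the zero-initialized averages (t starts at 1)
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v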
From Stanford CS231n: CNNs for Visual Recognition:
Adam is a recently proposed update that looks a bit like RMSProp with momentum
Notice that some frameworks actually include a momentum parameter for Adam, but this is actually the beta1 parameter; here is CNTK:

momentum (float, list, output of momentum_schedule()) – momentum schedule. Note that this is the beta1 parameter in the Adam paper. For additional information, please refer to this CNTK Wiki article.
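So in Keras the closest thing to "momentum tuning" for Adam is adjusting beta_1 directly when constructing the optimizer; a minimal sketch, assuming tensorflow.keras (the 0.85 value is just an arbitrary example):

    from tensorflow import keras
    from tensorflow.keras import layers

    # beta_1 is the decay rate of Adam's running average of past gradients,
    # i.e. the momentum-like knob; the default is 0.9
    opt = keras.optimizers.Adam(learning_rate=0.01, beta_1=0.85, beta_2=0.999)

    model = keras.Sequential([
        keras.Input(shape=(10,)),
        layers.Dense(64, activation='relu'),
        layers.Dense(3, activation='softmax'),
    ])
    model.compile(optimizer=opt, loss='categorical_crossentropy')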
That said, there is an ICLR 2016 paper titled Incorporating Nesterov momentum into Adam, along with an implementation skeleton in Tensorflow by the author; I cannot offer any opinion on this, though.
UPDATE: Keras indeed now includes an optimizer called Nadam, based on the ICLR 2016 paper mentioned above; from the docs:

Much like Adam is essentially RMSprop with momentum, Nadam is Adam with Nesterov momentum.

It is also included in Tensorflow as a contributed module, NadamOptimizer.
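A minimal usage sketch of the Keras version, assuming the tensorflow.keras interface (parameter values are just illustrative):

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(10,)),
        layers.Dense(3, activation='softmax'),
    ])
    # Nadam is Adam with Nesterov momentum; beta_1 again plays the momentum-like role
    opt = keras.optimizers.Nadam(learning_rate=0.002, beta_1=0.9, beta_2=0.999)
    model.compile(optimizer=opt, loss='categorical_crossentropy')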