For example, the implementation of Keras' Adagrad has been:
class Adagrad(Optimizer):
"""Adagrad optimizer.
It is recommended to leave the parameters of this optimizer
at their default values.
# Arguments
lr: float >= 0. Learning rate.
epsilon: float >= 0.
decay: float >= 0. Learning rate decay over each update.
# References
- [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
"""
def __init__(self, lr=0.01, epsilon=1e-8, decay=0., **kwargs):
super(Adagrad, self).__init__(**kwargs)
self.lr = K.variable(lr)
self.epsilon = epsilon
self.decay = K.variable(decay)
self.initial_decay = decay
self.iterations = K.variable(0.)
def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
shapes = [K.get_variable_shape(p) for p in params]
accumulators = [K.zeros(shape) for shape in shapes]
self.weights = accumulators
self.updates = []
lr = self.lr
if self.initial_decay > 0:
lr *= (1. / (1. + self.decay * self.iterations))
self.updates.append(K.update_add(self.iterations, 1))
for p, g, a in zip(params, grads, accumulators):
new_a = a + K.square(g) # update accumulator
self.updates.append(K.update(a, new_a))
new_p = p - lr * g / (K.sqrt(new_a) + self.epsilon)
# apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p))
return self.updates
And the Function 'get_update()' seems one step update. However should the accumulators be stored the history information? Why it has been initialized to zeros at each step? How it can be an accumulator through the whole training process?
What does this line do?
self.weights = accumulators
It seems self.weights is never been called anymore.
You are correct.. for all optimizers in Keras get_updates()
implements the tensor logic for one step of updates. This function is called once for each model.fit()
from _make_train_function()
here, which is used to create the tensor function by passing the update rule as update=
here. This update rule is used iteration to iteration to update the model parameters and other parameters.
self.weights
of an optimizer class is its internal parameters. This is not used for training. It just functions to keep the state of the optimizer (list of pointers to the param/accumulators tensor) and when model.save
is called they are also saved by calling get_weights()
here and is loaded back when model.load
is called by set_weights()
here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With