I am using TensorFlow to train a neural network. This is how I am initializing the GradientDescentOptimizer
:
init = tf.initialize_all_variables() sess = tf.Session() sess.run(init) mse = tf.reduce_mean(tf.square(out - out_)) train_step = tf.train.GradientDescentOptimizer(0.3).minimize(mse)
The thing here is that I don't know how to set an update rule for the learning rate or a decay value for that.
How can I use an adaptive learning rate here?
Adaptive Learning Rates This is called an adaptive learning rate. Perhaps the simplest implementation is to make the learning rate smaller once the performance of the model plateaus, such as by decreasing the learning rate by a factor of two or an order of magnitude.
The challenge of using learning rate schedules is that their hyperparameters have to be defined in advance and they depend heavily on the type of model and problem. Another problem is that the same learning rate is applied to all parameter updates.
First of all, tf.train.GradientDescentOptimizer
is designed to use a constant learning rate for all variables in all steps. TensorFlow also provides out-of-the-box adaptive optimizers including the tf.train.AdagradOptimizer
and the tf.train.AdamOptimizer
, and these can be used as drop-in replacements.
However, if you want to control the learning rate with otherwise-vanilla gradient descent, you can take advantage of the fact that the learning_rate
argument to the tf.train.GradientDescentOptimizer
constructor can be a Tensor
object. This allows you to compute a different value for the learning rate in each step, for example:
learning_rate = tf.placeholder(tf.float32, shape=[]) # ... train_step = tf.train.GradientDescentOptimizer( learning_rate=learning_rate).minimize(mse) sess = tf.Session() # Feed different values for learning rate to each training step. sess.run(train_step, feed_dict={learning_rate: 0.1}) sess.run(train_step, feed_dict={learning_rate: 0.1}) sess.run(train_step, feed_dict={learning_rate: 0.01}) sess.run(train_step, feed_dict={learning_rate: 0.01})
Alternatively, you could create a scalar tf.Variable
that holds the learning rate, and assign it each time you want to change the learning rate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With