
How to set layer-wise learning rate in Tensorflow?

I am wondering if there is a way to use different learning rates for different layers, as in Caffe. I am trying to modify a pre-trained model and use it for other tasks. I want to speed up training for the newly added layers while keeping the trained layers at a low learning rate so they are not distorted. For example, I have a 5-conv-layer pre-trained model. Now I add a new conv layer and fine-tune it. The first 5 layers would have a learning rate of 0.00001 and the last one 0.001. Any idea how to achieve this?

asked Jan 22 '16 by Tong Shen

People also ask

How does TensorFlow choose learning rate?

The learning rate controls how much the weights are updated in response to the estimated error. Choose a value that is too small and your model will train forever and likely get stuck; choose one that is too large and your model might skip over the optimal set of weights during training.

How do I set learning rate in keras?

A constant learning rate is the default schedule in all Keras optimizers. For example, in the SGD optimizer, the learning rate defaults to 0.01. To use a custom learning rate, simply instantiate an SGD optimizer and pass the argument learning_rate=0.01.
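The effect of learning-rate size is easy to see on a toy problem. A minimal pure-Python sketch (not TensorFlow-specific; the loss f(w) = w² is assumed purely for illustration):

```python
# Gradient descent on f(w) = w**2, whose gradient is 2*w.
# A too-small step converges very slowly; a too-large step overshoots and diverges.
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w  # w <- w - lr * f'(w)
    return w

small = descend(0.01)  # creeps toward 0 slowly
good = descend(0.1)    # converges quickly
huge = descend(1.5)    # |w| grows each step: the update jumps past the minimum
```

With `lr=1.5` each update multiplies `w` by (1 - 2*1.5) = -2, so the iterate oscillates with growing magnitude instead of settling at the minimum.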


2 Answers

This can be achieved quite easily with two optimizers:

```python
var_list1 = [variables from first 5 layers]
var_list2 = [the rest of variables]
train_op1 = tf.train.GradientDescentOptimizer(0.00001).minimize(loss, var_list=var_list1)
train_op2 = tf.train.GradientDescentOptimizer(0.0001).minimize(loss, var_list=var_list2)
train_op = tf.group(train_op1, train_op2)
```

One disadvantage of this implementation is that it computes tf.gradients(.) twice inside the optimizers, so it might not be optimal in terms of execution speed. This can be mitigated by calling tf.gradients(.) explicitly, splitting the resulting list in two, and passing the corresponding gradients to each optimizer.

Related question: Holding variables constant during optimizer

EDIT: Added a more efficient but longer implementation:

```python
var_list1 = [variables from first 5 layers]
var_list2 = [the rest of variables]
opt1 = tf.train.GradientDescentOptimizer(0.00001)
opt2 = tf.train.GradientDescentOptimizer(0.0001)
grads = tf.gradients(loss, var_list1 + var_list2)
grads1 = grads[:len(var_list1)]
grads2 = grads[len(var_list1):]
train_op1 = opt1.apply_gradients(zip(grads1, var_list1))
train_op2 = opt2.apply_gradients(zip(grads2, var_list2))
train_op = tf.group(train_op1, train_op2)
```

You can use tf.trainable_variables() to get all trainable variables and select from them. The difference is that in the first implementation tf.gradients(.) is called twice inside the optimizers, which may execute some redundant operations (e.g. the gradients on the first layers can reuse some of the computations done for the gradients of the following layers, and that reuse is lost across two separate calls).
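The variable split itself is just list filtering. A framework-agnostic sketch, using hypothetical variable names (mimicking the asker's 5-conv-layer model plus one new layer) to show how one might partition the result of tf.trainable_variables() by name prefix:

```python
# Hypothetical variable names; in TensorFlow these would come from
# [v.name for v in tf.trainable_variables()] or variable scopes.
all_vars = ["conv1/w", "conv1/b", "conv2/w", "conv2/b", "conv3/w",
            "conv4/w", "conv5/w", "new_conv/w", "new_conv/b"]

pretrained_prefixes = tuple("conv%d" % i for i in range(1, 6))

# Pre-trained layers get the low learning rate (0.00001), the rest the higher one.
var_list1 = [v for v in all_vars if v.startswith(pretrained_prefixes)]
var_list2 = [v for v in all_vars if not v.startswith(pretrained_prefixes)]
```

In real code you would filter the variable objects themselves (e.g. by `v.name` or by the variable scope they were created under) rather than plain strings.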

answered Sep 22 '22 by Rafał Józefowicz


TensorFlow 1.7 introduced tf.custom_gradient, which greatly simplifies setting learning-rate multipliers, in a way that is compatible with any optimizer, including those that accumulate gradient statistics. For example:

```python
import tensorflow as tf

def lr_mult(alpha):
    @tf.custom_gradient
    def _lr_mult(x):
        def grad(dy):
            return dy * alpha * tf.ones_like(x)
        return x, grad
    return _lr_mult

x0 = tf.Variable(1.)
x1 = tf.Variable(1.)
loss = tf.square(x0) + tf.square(lr_mult(0.1)(x1))

step = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
tf.local_variables_initializer().run()

for _ in range(5):
    sess.run([step])
    print(sess.run([x0, x1, loss]))
```
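The multiplier works because, for plain SGD, scaling the gradient by alpha is identical to scaling the learning rate by alpha. A quick pure-Python check with toy numbers (no TensorFlow required):

```python
# One SGD step: w <- w - lr * g.
# Multiplying the gradient by alpha gives the same update as using lr * alpha:
#   w - lr * (alpha * g) == w - (lr * alpha) * g
def sgd_step(w, g, lr):
    return w - lr * g

w, g, lr, alpha = 1.0, 0.4, 0.1, 0.1
assert sgd_step(w, alpha * g, lr) == sgd_step(w, g, lr * alpha)
```

For stateful optimizers (momentum, Adam) the two are not exactly equivalent, which is why the gradient-multiplier formulation above is the more general one.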
answered Sep 19 '22 by P-Gn