I am wondering if there is a way to use different learning rates for different layers, like what is in Caffe. I am trying to modify a pre-trained model and use it for other tasks. What I want is to speed up the training of the newly added layers and keep the trained layers at a low learning rate in order to prevent them from being distorted. For example, I have a 5-conv-layer pre-trained model. Now I add a new conv layer and fine-tune it. The first 5 layers would have a learning rate of 0.00001 and the last one would have 0.001. Any idea how to achieve this?
The learning rate controls how much the weights are updated in response to the estimated error. Choose too small a value and your model will train forever and likely get stuck; choose too large a value and your model might skip over the optimal set of weights during training.
A constant learning rate is the default schedule in all Keras optimizers. For example, in the SGD optimizer the learning rate defaults to 0.01. To use a custom learning rate, simply instantiate an SGD optimizer and pass the desired value via the learning_rate argument.
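For instance, a minimal sketch (the one-layer model is a toy placeholder and 0.001 is just an example value; TF 2.x Keras is assumed, where the argument is named learning_rate rather than the older lr):

from tensorflow import keras

# Toy one-layer model; it exists only to show passing a custom learning rate.
model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001), loss='mse')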
It can be achieved quite easily with 2 optimizers:
var_list1 = [variables from first 5 layers]  # placeholder: the pretrained variables
var_list2 = [the rest of variables]          # placeholder: the newly added variables
train_op1 = tf.train.GradientDescentOptimizer(0.00001).minimize(loss, var_list=var_list1)
train_op2 = tf.train.GradientDescentOptimizer(0.0001).minimize(loss, var_list=var_list2)
train_op = tf.group(train_op1, train_op2)
One disadvantage of this implementation is that tf.gradients(.) is computed twice inside the optimizers, so it may not be optimal in terms of execution speed. This can be mitigated by calling tf.gradients(.) explicitly, splitting the resulting list in two, and passing the corresponding gradients to each optimizer.
Related question: Holding variables constant during optimizer
EDIT: Added a more efficient but longer implementation:
var_list1 = [variables from first 5 layers]  # placeholder: the pretrained variables
var_list2 = [the rest of variables]          # placeholder: the newly added variables
opt1 = tf.train.GradientDescentOptimizer(0.00001)
opt2 = tf.train.GradientDescentOptimizer(0.0001)
# Compute all gradients in a single backward pass, then split the list.
grads = tf.gradients(loss, var_list1 + var_list2)
grads1 = grads[:len(var_list1)]
grads2 = grads[len(var_list1):]
train_op1 = opt1.apply_gradients(zip(grads1, var_list1))
train_op2 = opt2.apply_gradients(zip(grads2, var_list2))
train_op = tf.group(train_op1, train_op2)
You can use tf.trainable_variables() to get all trainable variables and select from them. The difference is that in the first implementation tf.gradients(.) is called twice inside the optimizers, which may cause some redundant operations to be executed (e.g. the gradients on the first layer can reuse some of the computations for the gradients of the following layers).
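For example, a sketch of splitting the variables by name (the 'conv' scope prefix is a hypothetical naming convention; adapt it to however your pretrained layers are scoped):

all_vars = tf.trainable_variables()
# Hypothetical: the 5 pretrained conv layers were built under scopes conv1..conv5.
var_list1 = [v for v in all_vars if v.name.startswith('conv')]
var_list2 = [v for v in all_vars if not v.name.startswith('conv')]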
TensorFlow 1.7 introduced tf.custom_gradient, which greatly simplifies setting learning-rate multipliers in a way that is compatible with any optimizer, including those that accumulate gradient statistics. For example:
import tensorflow as tf

def lr_mult(alpha):
    # Identity in the forward pass; scales the incoming gradient by alpha
    # in the backward pass, rescaling the effective learning rate of
    # everything upstream of the wrapped tensor.
    @tf.custom_gradient
    def _lr_mult(x):
        def grad(dy):
            return dy * alpha * tf.ones_like(x)
        return x, grad
    return _lr_mult

x0 = tf.Variable(1.)
x1 = tf.Variable(1.)
# x1 effectively trains with a 10x smaller learning rate than x0.
loss = tf.square(x0) + tf.square(lr_mult(0.1)(x1))
step = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
tf.local_variables_initializer().run()
for _ in range(5):
    sess.run([step])
    print(sess.run([x0, x1, loss]))
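Applied to the fine-tuning scenario from the question, a possible usage sketch reusing the lr_mult defined above (the input shape, layer names, and the 0.01 multiplier are illustrative assumptions, not part of the original answer):

# Hypothetical wiring: a stand-in for the pretrained stack, followed by a
# new conv layer added for fine-tuning.
images = tf.placeholder(tf.float32, [None, 32, 32, 3])
pretrained = tf.layers.conv2d(images, 8, 3, name='conv1')  # stand-in for the 5 pretrained layers
# Gradients flowing into the pretrained stack are scaled by 0.01, so its
# variables see an effective learning rate 100x smaller than the new layer's.
new_layer = tf.layers.conv2d(lr_mult(0.01)(pretrained), 8, 3, name='new_conv')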