 

How to reuse an existing neural network to train a new one using TensorFlow?

I want to train a new neural network using TensorFlow by reusing the lower layers of an existing neural network (which is already trained). I want to drop the top layers of the existing network and replace them with new layers, and I also want to lock the lowest layers to prevent backpropagation from modifying them. Here's a little ASCII art to summarize this:

*Original model*           *New model*

Output Layer                Output Layer (new)
     |                          |
Hidden Layer 3             Hidden Layer 3 (copied)
     |             ==>          |
Hidden Layer 2             Hidden Layer 2 (copied+locked)
     |                          |
Hidden Layer 1             Hidden Layer 1 (copied+locked)
     |                          |
   Inputs                     Inputs

What's a good way to do this?

Edit

My original network was created like this:

import tensorflow as tf
from tensorflow.contrib.layers import fully_connected

X = tf.placeholder(tf.float32, shape=(None, 500), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

hidden1 = fully_connected(X, 300, scope="hidden1")
hidden2 = fully_connected(hidden1, 100, scope="hidden2")
hidden3 = fully_connected(hidden2, 50, scope="hidden3")
output = fully_connected(hidden3, 5, activation_fn=None, scope="output")

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output, labels=y)
loss = tf.reduce_mean(xentropy, name="loss")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
training_op = optimizer.minimize(loss)

init = tf.initialize_all_variables()
saver = tf.train.Saver()

# ... Train then save the network using the saver

What's the code that would load this network, lock the 2 lower hidden layers, and replace the output layer? If possible, it would be great to be able to cache the output of the top locked layer (hidden2) for each input, to speed up training.

Extra details

I looked at retrain.py and the corresponding How-To (a very interesting read). The code basically loads the original model, then computes the output of the bottleneck layer (i.e., the last hidden layer before the output layer) for each input. Then it creates a brand new model and trains it using the bottleneck outputs as inputs. This basically answers my question for the copied+locked layers: I just need to run the original model on the whole training set and store the output of the top-most locked layer. But I don't know how to handle the copied but unlocked (i.e., trainable) layers (e.g., Hidden Layer 3 in my diagram).
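
For the caching part, I think something like this would work (a minimal sketch, not tested: the checkpoint path "./original_model.ckpt", the X_train array, and the "hidden2/Relu:0" tensor name are assumptions; the exact tensor name depends on the TensorFlow version and on fully_connected's default ReLU activation):

import tensorflow as tf

# Rebuild the original graph from its saved meta file and grab the tensors we need.
saver = tf.train.import_meta_graph("./original_model.ckpt.meta")
graph = tf.get_default_graph()
X = graph.get_tensor_by_name("X:0")
hidden2_out = graph.get_tensor_by_name("hidden2/Relu:0")  # assumed tensor name

with tf.Session() as sess:
    saver.restore(sess, "./original_model.ckpt")
    # Single forward pass over the whole training set; these cached
    # activations become the inputs when training hidden3 + the new output.
    h2_cache = sess.run(hidden2_out, feed_dict={X: X_train})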

Thanks!

asked Oct 19 '22 by MiniQuark


1 Answer

TensorFlow gives you fine-grained control over the set of parameters (Variables) that are updated in each training step. For instance, in your model, suppose the layers are all fully connected. Then you would have a weights parameter and a biases parameter for each layer; let's say the corresponding Variable objects are W1, b1, W2, b2, W3, b3, Woutput, and boutput. Assuming you are using the Optimizer interface and that loss is the value you want to minimize, you can train only the third hidden layer and the output layer by doing the following:

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# Compute gradients only for the top hidden layer and the output layer.
grads_and_vars = opt.compute_gradients(loss, var_list=[W3, b3, Woutput, boutput])
train_op = opt.apply_gradients(grads_and_vars)

NOTE: opt.minimize(loss, var_list=[...]) is equivalent to the two calls above; I split it in two to illustrate the details.

opt.compute_gradients computes the gradients with respect to a specific set of your model parameters, and you have full control over what you consider your model parameters. Note that you have to initialize the Hidden Layer 3 parameters from the older model and the Output Layer parameters randomly. You can do so by restoring your new model from the original model, which copies all the parameters from the original model, and then adding extra tf.assign operations to initialize the output layer parameters randomly.
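
Putting that together, a sketch might look like this (assumptions: TF 1.x-style API, the original checkpoint lives at "./original_model.ckpt", and the variable scopes match the question's "hidden1"/"hidden2"/"hidden3"; instead of explicit tf.assign ops, this version initializes everything randomly and then restores only the copied layers on top):

import tensorflow as tf
from tensorflow.contrib.layers import fully_connected

X = tf.placeholder(tf.float32, shape=(None, 500), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

hidden1 = fully_connected(X, 300, scope="hidden1")        # copied + locked
hidden2 = fully_connected(hidden1, 100, scope="hidden2")  # copied + locked
hidden3 = fully_connected(hidden2, 50, scope="hidden3")   # copied, trainable
new_output = fully_connected(hidden3, 5, activation_fn=None,
                             scope="new_output")          # new, trainable

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=new_output, labels=y)
loss = tf.reduce_mean(xentropy, name="loss")

# Restore only the copied layers from the original checkpoint.
reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,
                               scope="hidden[123]")  # scope accepts a regex
restore_saver = tf.train.Saver(reuse_vars)

# Backpropagation touches only hidden3 and the new output layer.
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                               scope="hidden3|new_output")
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
training_op = opt.minimize(loss, var_list=train_vars)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)  # random init for everything, including the new output layer
    restore_saver.restore(sess, "./original_model.ckpt")  # overwrite copied layers
    # ... train ...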

answered Oct 21 '22 by keveman