I want to train a new neural network using TensorFlow by reusing the lower layers of an existing neural network (which is already trained). I want to drop the top layers of the existing network and replace them with new layers, and I also want to lock the lowest layers to prevent backpropagation from modifying them. Here's a little ascii art to summarize this:
*Original model*          *New model*

Output Layer              Output Layer (new)
      |                         |
Hidden Layer 3            Hidden Layer 3 (copied)
      |          ==>            |
Hidden Layer 2            Hidden Layer 2 (copied+locked)
      |                         |
Hidden Layer 1            Hidden Layer 1 (copied+locked)
      |                         |
Inputs                    Inputs
What's a good way to do this?
Edit
My original network was created like this:
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected

X = tf.placeholder(tf.float32, shape=(None, 500), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")
hidden1 = fully_connected(X, 300, scope="hidden1")
hidden2 = fully_connected(hidden1, 100, scope="hidden2")
hidden3 = fully_connected(hidden2, 50, scope="hidden3")
output = fully_connected(hidden3, 5, activation_fn=None, scope="output")
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output, labels=y)
loss = tf.reduce_mean(xentropy, name="loss")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
training_op = optimizer.minimize(loss)
init = tf.initialize_all_variables()
saver = tf.train.Saver()
# ... Train then save the network using the saver
What's the code that would load this network, lock the 2 lower hidden layers, and replace the output layer? If possible, it would be great to be able to cache the output of the top locked layer (hidden2) for each input, to speed up training.
Extra details
I looked at retrain.py and the corresponding How-To (a very interesting read). The code basically loads the original model, then computes the output of the bottleneck layer (i.e. the last hidden layer before the output layer) for each input. It then creates a brand new model and trains it using the bottleneck outputs as inputs. This basically answers my question for the copied+locked layers: I just need to run the original model on the whole training set and store the output of the top-most locked layer, as sketched below. But I don't know how to handle the copied but unlocked (i.e. trainable) layers (e.g. Hidden Layer 3 in my diagram).
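For concreteness, here is a minimal sketch of the caching I have in mind. It assumes the graph-construction code above has been re-run (so X, hidden2 and saver exist in the current graph), that X_train holds the full training set in memory, and that the checkpoint was saved to the hypothetical path "my_model.ckpt":

import numpy as np

with tf.Session() as sess:
    saver.restore(sess, "my_model.ckpt")   # hypothetical checkpoint path
    # One forward pass over the full training set; the cached activations
    # can then be fed to the upper layers at every epoch instead of
    # recomputing hidden1 and hidden2 each time.
    h2_cache = sess.run(hidden2, feed_dict={X: X_train})

np.save("hidden2_cache.npy", h2_cache)     # optionally persist the cache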
Thanks!
TensorFlow gives you fine-grained control over the set of parameters (Variables) you update in every training step. For instance, in your model, suppose the layers are all fully connected layers. Then you would have a weights parameter and a biases parameter for each layer. Let's say you have the corresponding Variable objects in W1, b1, W2, b2, W3, b3, Woutput and boutput. Assuming you are using the Optimizer interface, and assuming that loss is the value you want to minimize, you can train only the top hidden layer and the output layer by doing the following:
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss, var_list=[W3, b3, Woutput, boutput])
train_op = opt.apply_gradients(grads_and_vars)
NOTE: opt.minimize(loss, var_list) does the equivalent of the above, but I split it into two steps to illustrate the details.
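The equivalent one-liner would be:

train_op = opt.minimize(loss, var_list=[W3, b3, Woutput, boutput])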
opt.compute_gradients computes the gradients with respect to a specific set of your model parameters, and you have full control over what you consider your model parameters. Note that you have to initialize the Hidden Layer 3 parameters from the older model, and the Output Layer parameters randomly. You can do so by restoring your new model from the original model, which would copy all the parameters over from the original model, and then adding extra tf.assign operations to re-initialize the output layer parameters randomly.
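To make that concrete, here is a minimal end-to-end sketch. It rebuilds the reused layers under their original scope names, and gives the new output layer a fresh scope so that a second Saver restricted to the reused variables can restore hidden1-3 while leaving the new layer at its random initialization (an alternative to the tf.assign approach described above). It assumes the original checkpoint is at the hypothetical path "my_model.ckpt":

import tensorflow as tf
from tensorflow.contrib.layers import fully_connected

# Rebuild the reused layers with the same scope names as the original model.
X = tf.placeholder(tf.float32, shape=(None, 500), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")
hidden1 = fully_connected(X, 300, scope="hidden1")                # copied + locked
hidden2 = fully_connected(hidden1, 100, scope="hidden2")          # copied + locked
hidden3 = fully_connected(hidden2, 50, scope="hidden3")           # copied, trainable
new_output = fully_connected(hidden3, 5, activation_fn=None,
                             scope="new_output")                  # brand new layer

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=new_output,
                                                          labels=y)
loss = tf.reduce_mean(xentropy, name="loss")

# Lock hidden1 and hidden2 by listing only the upper layers in var_list.
train_vars = [v for v in tf.trainable_variables()
              if v.name.startswith("hidden3") or v.name.startswith("new_output")]
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
training_op = optimizer.minimize(loss, var_list=train_vars)

# A second Saver restricted to the copied layers: restoring through it
# overwrites hidden1-3 with the original weights and leaves new_output alone.
reuse_vars = [v for v in tf.trainable_variables()
              if v.name.startswith("hidden")]
restore_saver = tf.train.Saver(reuse_vars)

init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)                                  # random init for everything
    restore_saver.restore(sess, "my_model.ckpt")    # then restore copied layers
    # ... train using training_op ...

Locking is handled entirely by var_list here: gradients are simply never applied to hidden1 and hidden2, so they keep their restored values throughout training.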