I don't understand why I have to call fit()/fit_generator() twice in order to fine-tune InceptionV3 (or any other pretrained model) in Keras (version 2.0.0).
The documentation suggests the following:
Fine-tune InceptionV3 on a new set of classes
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)
# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)
# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)
# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False
# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# train the model on the new data for a few epochs
model.fit_generator(...)
# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.
# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 172 layers and unfreeze the rest:
for layer in model.layers[:172]:
    layer.trainable = False
for layer in model.layers[172:]:
    layer.trainable = True
# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)
Why don't we call fit()/fit_generator() only once?
As always, thanks for your help!
EDIT:
The answers given below by Nassim Ben and David de la Iglesia are both very good. I'd highly recommend the link given by David de la Iglesia: Transfer Learning
Transfer Learning and Fine-tuning are used interchangeably and are defined as the process of training a neural network on new data but initialising it with pre-trained weights obtained from training it on a different, mostly much larger dataset, for a new task which is somewhat related to the data and task the network ...
Fine-tuning is a way of applying or utilizing transfer learning. Specifically, fine-tuning is a process that takes a model that has already been trained for one given task and then tunes or tweaks the model to make it perform a second similar task.
Fine-tune VGG16. VGG16 is a 16-layer ConvNet used by the Visual Geometry Group (VGG) at Oxford University in the 2014 ILSVRC (ImageNet) competition. The model achieves a 7.5% top-5 error rate on the validation set, a result that earned them a second-place finish in the competition.
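To make that concrete, here is a minimal sketch of the same freeze-then-train setup applied to VGG16 in Keras (this is not code from the linked article; the input shape, head size and class count are placeholders):
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model
# load VGG16 without its original classifier, keeping the ImageNet weights
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# add a new, randomly initialized classification head for the new task
x = Flatten()(base_model.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)  # 10 classes is just an example
model = Model(inputs=base_model.input, outputs=predictions)
# freeze the convolutional base so only the new head is trained at first
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')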
InceptionV3 is a very deep and complex network. It has been trained to recognize certain things, but you are using it for a different classification task, so out of the box it is not perfectly adapted to what you are doing.
So the aim here is to reuse some of the features already learnt by the trained network and to modify only the top of the network a bit (the highest-level features, the ones closest to your task).
So they removed the very top layer and added some new, untrained ones. They want to train that big model for their task, using the feature extraction done by the first 172 layers and teaching the last layers to adapt to your task.
In the part they want to train, one subpart already has learned parameters and the other has new, randomly initialized parameters. The thing is, you only want to fine-tune the already-learned layers, not relearn them from scratch, and the model has no way to distinguish layers it should merely fine-tune from layers that should be learned completely. If you do just one fit with the [172:] layers of the model unfrozen, you will lose the interesting features learnt on the huge ImageNet dataset. You don't want that, so what you do is: first train the new layers while the pretrained base stays frozen, then unfreeze the top of the base and fine-tune everything together with a small learning rate.
So to summarize, when you want to train a mix of "already learnt" layers and new layers, you first bring the new ones up to date and then train everything together to fine-tune it all.
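A condensed, self-contained sketch of those two steps (using fit() with dummy numpy data standing in for your real generator; the epoch counts are placeholders):
import numpy as np
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from keras.optimizers import SGD
# build the same model as in the question
base_model = InceptionV3(weights='imagenet', include_top=False)
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(200, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# dummy data standing in for your real dataset/generator
x_train = np.random.rand(8, 299, 299, 3)
y_train = np.random.rand(8, 200)
# step 1: freeze the whole pretrained base and bring the new head up to date
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(x_train, y_train, epochs=1)
# step 2: unfreeze the top two inception blocks, recompile with a small
# learning rate and train everything together
for layer in model.layers[:172]:
    layer.trainable = False
for layer in model.layers[172:]:
    layer.trainable = True
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
model.fit(x_train, y_train, epochs=1)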
If you append 2 randomly initialized layers on top of an already trained convnet and try to fine-tune some convolutional layers without "warming up" the new layers first, the high gradients coming from these new layers will blow away the (useful) things those convolutional layers have learned.
That's why your first fit only trains these 2 new layers, using the pre-trained convnet as a sort of "fixed" feature extractor.
After that, your 2 Dense layers no longer produce high gradients and you are able to fine-tune some of the pre-trained convolutional layers. That's what you are doing in your second fit.
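One way to see what each fit actually updates (just an illustrative check, assuming model and base_model are built as in the question's snippet) is to count the weight tensors currently marked trainable; flipping the trainable flags changes that set, and recompiling is what makes the next fit honour it:
from keras.optimizers import SGD
# before the first fit: the whole base is frozen, so only the kernels and
# biases of the new Dense layers are trainable
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
print(len(model.trainable_weights))
# before the second fit: the top inception blocks are unfrozen as well,
# so the count grows accordingly
for layer in model.layers[172:]:
    layer.trainable = True
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')
print(len(model.trainable_weights))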