Why do I have to do two train steps for fine-tuning InceptionV3 in Keras?

I don't understand why I have to call the fit()/fit_generator() function twice in order to fine-tune InceptionV3 (or any other pretrained model) in Keras (version 2.0.0). The documentation suggests the following:

Fine-tune InceptionV3 on a new set of classes

from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K

# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# train the model on the new data for a few epochs
model.fit_generator(...)

# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.

# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)

# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 172 layers and unfreeze the rest:
for layer in model.layers[:172]:
    layer.trainable = False
for layer in model.layers[172:]:
    layer.trainable = True

# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')

# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)

Why don't we call fit()/fit_generator() only once? As always, thanks for your help!

EDIT:

The answers below by Nassim Ben and David de la Iglesia are both very good. I'd especially like to recommend the link given by David de la Iglesia: Transfer Learning

Asked Mar 17 '17 by D.Laupheimer


2 Answers

InceptionV3 is a very deep and complex network. It has been trained to recognize certain things (the ImageNet classes), but you are using it for a different classification task, so out of the box it is not perfectly adapted to what you are doing.

The aim here is to reuse features that the trained network has already learned and to modify only the top of the network (the highest-level features, the ones closest to your task).

So the very top layer is removed and some new, untrained layers are added. The resulting model is then trained for your task, using the first 172 layers purely for feature extraction and learning the last layers so they adapt to your task.

In the part they want to train, there is one subpart with already-learned parameters and another with new, randomly initialized parameters. The layers that are already learned should only be fine-tuned, not relearned from scratch, but the model has no way to distinguish layers that should merely be fine-tuned from layers that should be learned completely. If you do a single fit on the [172:] layers of the model, you will lose the useful features learned on the huge ImageNet dataset. You don't want that, so what you do is:

  1. Learn "good enough" last layers by setting the whole InceptionV3 base to non-trainable; this already produces a decent result.
  2. The newly trained layers are now good, so if you "unfreeze" some of the top layers they won't be perturbed too much; they will only be fine-tuned, which is exactly what you want.

To summarize: when you want to train a mix of already-learned layers and new layers, you first bring the new ones up to speed, and only then train everything together to fine-tune it.
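
As a quick sketch of that two-step recipe (this only condenses the question's own code, reusing its model, base_model and the 172 cut-off; the print calls are just there to show which weights each compile step hands to the optimizer):

# Phase 1: freeze the whole pre-trained base, so the first fit only
# updates the new, randomly initialized head.
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
print(len(model.trainable_weights))  # only the weights of the new Dense layers
# ... first fit_generator(...) call: warm up the head ...

# Phase 2: unfreeze the top two inception blocks and recompile, because
# changes to layer.trainable only take effect at compile time.
from keras.optimizers import SGD
for layer in model.layers[172:]:
    layer.trainable = True
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9),
              loss='categorical_crossentropy')
print(len(model.trainable_weights))  # now also includes the unfrozen conv weights
# ... second fit_generator(...) call: fine-tune head + top blocks together ...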

Answered by Nassim Ben


If you append 2 randomly initialized layers on top of an already-tuned convnet and try to fine-tune some convolutional layers without "warming up" the new layers, the high gradients of these new layers will destroy the (useful) features those convolutional layers have already learned.

That's why your first fit only trains these 2 new layers, using the pre-trained convnet as a kind of "fixed" feature extractor.

After that, your 2 Dense layers no longer produce high gradients and you can safely fine-tune some of the pre-trained convolutional layers. That's what you are doing in your second fit.
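
If you want to see this effect for yourself, one possible check (a sketch, not part of the original answer: it assumes a compiled Keras 2.x model and a hypothetical mini-batch x_batch / y_batch of preprocessed images and one-hot labels) is to compare the gradient norms flowing into the trainable weights before and after the warm-up fit:

import numpy as np
from keras import backend as K

def gradient_norms(model, x_batch, y_batch):
    # L2 norm of d(loss)/d(weight) for every currently trainable weight.
    grads = K.gradients(model.total_loss, model.trainable_weights)
    get_grads = K.function(model.inputs + model.targets +
                           model.sample_weights + [K.learning_phase()],
                           grads)
    values = get_grads([x_batch, y_batch,
                        np.ones(len(x_batch)),  # uniform sample weights
                        0])                     # learning phase 0 = test
    return [np.linalg.norm(v) for v in values]

# Before the warm-up fit, the norms driven by the random Dense head are large;
# after it they shrink, so unfreezing convolutional layers becomes much safer.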

Answered by David de la Iglesia