In `keras.applications`, there is a VGG16 model pre-trained on ImageNet.
```python
from keras.applications import VGG16

model = VGG16(weights='imagenet')
```
This model has the following structure.
```
Layer (type)                     Output Shape           Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 3, 224, 224)    0
block1_conv1 (Convolution2D)     (None, 64, 224, 224)   1792        input_1[0][0]
block1_conv2 (Convolution2D)     (None, 64, 224, 224)   36928       block1_conv1[0][0]
block1_pool (MaxPooling2D)       (None, 64, 112, 112)   0           block1_conv2[0][0]
block2_conv1 (Convolution2D)     (None, 128, 112, 112)  73856       block1_pool[0][0]
block2_conv2 (Convolution2D)     (None, 128, 112, 112)  147584      block2_conv1[0][0]
block2_pool (MaxPooling2D)       (None, 128, 56, 56)    0           block2_conv2[0][0]
block3_conv1 (Convolution2D)     (None, 256, 56, 56)    295168      block2_pool[0][0]
block3_conv2 (Convolution2D)     (None, 256, 56, 56)    590080      block3_conv1[0][0]
block3_conv3 (Convolution2D)     (None, 256, 56, 56)    590080      block3_conv2[0][0]
block3_pool (MaxPooling2D)       (None, 256, 28, 28)    0           block3_conv3[0][0]
block4_conv1 (Convolution2D)     (None, 512, 28, 28)    1180160     block3_pool[0][0]
block4_conv2 (Convolution2D)     (None, 512, 28, 28)    2359808     block4_conv1[0][0]
block4_conv3 (Convolution2D)     (None, 512, 28, 28)    2359808     block4_conv2[0][0]
block4_pool (MaxPooling2D)       (None, 512, 14, 14)    0           block4_conv3[0][0]
block5_conv1 (Convolution2D)     (None, 512, 14, 14)    2359808     block4_pool[0][0]
block5_conv2 (Convolution2D)     (None, 512, 14, 14)    2359808     block5_conv1[0][0]
block5_conv3 (Convolution2D)     (None, 512, 14, 14)    2359808     block5_conv2[0][0]
block5_pool (MaxPooling2D)       (None, 512, 7, 7)      0           block5_conv3[0][0]
flatten (Flatten)                (None, 25088)          0           block5_pool[0][0]
fc1 (Dense)                      (None, 4096)           102764544   flatten[0][0]
fc2 (Dense)                      (None, 4096)           16781312    fc1[0][0]
predictions (Dense)              (None, 1000)           4097000     fc2[0][0]
====================================================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
____________________________________________________________________________________________________
```
I would like to fine-tune this model with dropout layers between the dense layers (fc1, fc2 and predictions), while keeping all of the model's pre-trained weights intact. I know it's possible to access each layer individually with `model.layers`, but I haven't found any explanation of how to insert new layers between the existing ones.
What's the best practice of doing this?
Usually, dropout is placed only on the fully connected layers, because they are the ones with the largest number of parameters and are therefore the most likely to co-adapt excessively, which causes overfitting. However, since dropout is a stochastic regularization technique, you can really place it anywhere.
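To make the parameter-count argument concrete, here is a plain-Python check that recomputes the numbers in the VGG16 summary from the layer shapes, using the standard formulas for a 3x3 convolution and a dense layer (biases included). It is only arithmetic over the shapes shown in the question, not a call into Keras.

```python
# Recompute VGG16 parameter counts from layer shapes (biases included).

def conv3x3_params(c_in, c_out):
    # 3x3 kernel per (input channel, output channel) pair, plus one bias per output channel
    return 3 * 3 * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output unit
    return n_in * n_out + n_out

# (input channels, output channels) for the 13 convolutional layers
conv_shapes = [(3, 64), (64, 64),
               (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]

conv_total = sum(conv3x3_params(a, b) for a, b in conv_shapes)

# flatten turns the 512 x 7 x 7 feature maps into 25088 features
dense_total = (dense_params(512 * 7 * 7, 4096)   # fc1
               + dense_params(4096, 4096)        # fc2
               + dense_params(4096, 1000))       # predictions

total = conv_total + dense_total
print(f"conv layers:  {conv_total:,}")
print(f"dense layers: {dense_total:,}")
print(f"total:        {total:,}")
print(f"dense share:  {dense_total / total:.1%}")
```

The total matches the 138,357,544 reported by `model.summary()`, and the three dense layers account for roughly 89% of it, with fc1 alone holding over 100 million weights.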
Dropout layers are important when training CNNs because they help prevent overfitting on the training data. Without them, the first batches of training samples can influence learning disproportionately.
Dropout is easy to implement: on each weight-update cycle, nodes are randomly selected to be dropped out with a given probability (e.g. 20%). This is how dropout is implemented in Keras. Dropout is only used while training a model; it is disabled when evaluating the model's performance.
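A minimal NumPy sketch of the mechanism described above. It implements "inverted" dropout, which is the variant the Keras `Dropout` layer documents: during training the surviving units are scaled by 1 / (1 - rate) so the expected activation is unchanged, and at inference the layer is an identity. The function name and arguments here are illustrative, not Keras API.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: during training, zero each unit with probability `rate`
    and scale the kept units by 1 / (1 - rate). At inference, return x unchanged."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate   # True for units that are kept
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((4, 10))

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
eval_out = dropout(activations, rate=0.2, training=False, rng=rng)

print(train_out)                              # every entry is either 0.0 or 1.25
print(np.array_equal(eval_out, activations))  # True
```

Scaling at training time (rather than at inference) is what lets the trained network be used for prediction without any change.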
Use dropout. Dropout is a regularization technique that helps prevent neural networks from overfitting. Regularization methods like L1 and L2 reduce overfitting by modifying the cost function; dropout, on the other hand, modifies the network itself.
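The contrast between the two kinds of regularization can be sketched in NumPy on a toy single-layer model (all data and shapes below are made up for illustration): the L2 version adds a penalty term to the loss and leaves the forward pass alone, while the dropout version leaves the loss alone and randomly masks units in the forward pass.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(8, 5))   # toy inputs
w = rng.normal(size=(5, 3))   # toy weight matrix
y = rng.normal(size=(8, 3))   # toy targets

def l2_loss(y_true, y_pred, weight, lam):
    # L2 regularization: the cost function gets an extra term lam * sum(w^2);
    # the forward pass that produced y_pred is unchanged.
    return np.mean((y_true - y_pred) ** 2) + lam * np.sum(weight ** 2)

def forward_with_dropout(inputs, weight, rate, rng):
    # Dropout: the network itself changes. A random subset of input units is
    # zeroed (survivors scaled by 1 / (1 - rate)) before the matrix multiply.
    mask = rng.random(inputs.shape) >= rate
    return (inputs * mask / (1.0 - rate)) @ weight

plain = np.mean((y - x @ w) ** 2)
with_l2 = l2_loss(y, x @ w, w, lam=0.01)
with_dropout = np.mean((y - forward_with_dropout(x, w, 0.5, rng)) ** 2)

print(plain, with_l2, with_dropout)
```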
I found an answer myself using the Keras functional API:
```python
from keras.applications import VGG16
from keras.layers import Dropout
from keras.models import Model

model = VGG16(weights='imagenet')

# Store the fully connected layers
fc1 = model.layers[-3]
fc2 = model.layers[-2]
predictions = model.layers[-1]

# Create the dropout layers (the argument is the fraction of units dropped during training)
dropout1 = Dropout(0.85)
dropout2 = Dropout(0.85)

# Reconnect the layers; fc2 and predictions are reused, so their pre-trained weights are kept
x = dropout1(fc1.output)
x = fc2(x)
x = dropout2(x)
predictors = predictions(x)

# Create a new model (Keras 2+ uses inputs/outputs; Keras 1 used input/output)
model2 = Model(inputs=model.input, outputs=predictors)
```
`model2` has the dropout layers as I wanted:
```
____________________________________________________________________________________________________
Layer (type)                     Output Shape           Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 3, 224, 224)    0
block1_conv1 (Convolution2D)     (None, 64, 224, 224)   1792        input_1[0][0]
block1_conv2 (Convolution2D)     (None, 64, 224, 224)   36928       block1_conv1[0][0]
block1_pool (MaxPooling2D)       (None, 64, 112, 112)   0           block1_conv2[0][0]
block2_conv1 (Convolution2D)     (None, 128, 112, 112)  73856       block1_pool[0][0]
block2_conv2 (Convolution2D)     (None, 128, 112, 112)  147584      block2_conv1[0][0]
block2_pool (MaxPooling2D)       (None, 128, 56, 56)    0           block2_conv2[0][0]
block3_conv1 (Convolution2D)     (None, 256, 56, 56)    295168      block2_pool[0][0]
block3_conv2 (Convolution2D)     (None, 256, 56, 56)    590080      block3_conv1[0][0]
block3_conv3 (Convolution2D)     (None, 256, 56, 56)    590080      block3_conv2[0][0]
block3_pool (MaxPooling2D)       (None, 256, 28, 28)    0           block3_conv3[0][0]
block4_conv1 (Convolution2D)     (None, 512, 28, 28)    1180160     block3_pool[0][0]
block4_conv2 (Convolution2D)     (None, 512, 28, 28)    2359808     block4_conv1[0][0]
block4_conv3 (Convolution2D)     (None, 512, 28, 28)    2359808     block4_conv2[0][0]
block4_pool (MaxPooling2D)       (None, 512, 14, 14)    0           block4_conv3[0][0]
block5_conv1 (Convolution2D)     (None, 512, 14, 14)    2359808     block4_pool[0][0]
block5_conv2 (Convolution2D)     (None, 512, 14, 14)    2359808     block5_conv1[0][0]
block5_conv3 (Convolution2D)     (None, 512, 14, 14)    2359808     block5_conv2[0][0]
block5_pool (MaxPooling2D)       (None, 512, 7, 7)      0           block5_conv3[0][0]
flatten (Flatten)                (None, 25088)          0           block5_pool[0][0]
fc1 (Dense)                      (None, 4096)           102764544   flatten[0][0]
dropout_1 (Dropout)              (None, 4096)           0           fc1[0][0]
fc2 (Dense)                      (None, 4096)           16781312    dropout_1[0][0]
dropout_2 (Dropout)              (None, 4096)           0           fc2[1][0]
predictions (Dense)              (None, 1000)           4097000     dropout_2[0][0]
====================================================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
____________________________________________________________________________________________________
```
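The `fc2[1][0]` entry shows that `fc2` is being called a second time (node index 1): the new model shares the same layer objects, and therefore the same pre-trained weights, as the original. Because dropout is an identity at inference, the rewired head should produce exactly the same outputs as the original one when not training. The NumPy sketch below illustrates that property on tiny hypothetical weight matrices standing in for fc1, fc2 and predictions (softmax omitted); it is not Keras code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the pre-trained fc1, fc2 and predictions kernels
# (tiny sizes instead of 25088 -> 4096 -> 4096 -> 1000).
W1, W2, W3 = (rng.normal(size=s) for s in [(16, 8), (8, 8), (8, 4)])

def relu(z):
    return np.maximum(z, 0.0)

def dropout(h, rate, training):
    if not training:
        return h                              # identity at inference
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

def original_head(x):
    # fc1 -> fc2 -> predictions
    return relu(relu(x @ W1) @ W2) @ W3

def head_with_dropout(x, training):
    # Same weight arrays, new wiring: fc1 -> dropout -> fc2 -> dropout -> predictions
    h = dropout(relu(x @ W1), 0.85, training)
    h = dropout(relu(h @ W2), 0.85, training)
    return h @ W3

x = rng.normal(size=(2, 16))
print(np.allclose(original_head(x), head_with_dropout(x, training=False)))  # True
```

In other words, inserting dropout only changes how the head behaves during fine-tuning; until the weights are updated, predictions are identical to the original model's.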
Here is a solution that stays within the Keras "Sequential API".
You can loop through the layers and add them one by one to a new Sequential model, adding a Dropout layer after the layers of your choice with an if clause.
```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dropout
from tensorflow.keras.models import Sequential

model = VGG16(weights='imagenet')

# check structure and layer names before looping
model.summary()

# loop through layers, add Dropout after layers 'fc1' and 'fc2'
updated_model = Sequential()
for layer in model.layers:
    updated_model.add(layer)
    if layer.name in ['fc1', 'fc2']:
        updated_model.add(Dropout(.2))

model = updated_model

# check structure
model.summary()
```
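The loop in this answer can be factored into a small reusable helper. The sketch below uses a hypothetical `Layer` dataclass in place of real Keras layers so it runs without TensorFlow; the helper takes a factory (`make_layer`) so that every insertion gets a fresh layer instance instead of the same object twice. With real Keras layers, the factory could be `lambda: Dropout(.2)` and the result fed to `Sequential.add` one layer at a time.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Layer:
    """Hypothetical stand-in for a Keras layer: only the name matters here."""
    name: str

def insert_after(layers, target_names, make_layer):
    """Return a new list with make_layer() inserted after each layer whose
    name is in target_names. Existing layer objects are reused, not copied."""
    updated = []
    for layer in layers:
        updated.append(layer)
        if layer.name in target_names:
            updated.append(make_layer())
    return updated

vgg_head = [Layer("flatten"), Layer("fc1"), Layer("fc2"), Layer("predictions")]
ids = count(1)
new_head = insert_after(vgg_head, {"fc1", "fc2"}, lambda: Layer(f"dropout_{next(ids)}"))
print([layer.name for layer in new_head])
# ['flatten', 'fc1', 'dropout_1', 'fc2', 'dropout_2', 'predictions']
```

Reusing the original objects is what keeps the pre-trained weights: nothing is re-created except the inserted layers.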