As a "sanity check" I tried two ways to use transfer learning that I expected to behave the same, if not in running time than at least in the results.
The first method uses bottleneck features (as explained at https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html): run the existing network to generate the features just before the last dense layer, save them, and then train a new dense layer with these features as input.
The second method replaces the last dense layer of the model with a new one and freezes all other layers in the model.
I expected the second method to be as effective as the first one, but it was not.
The output of the first method was:
Epoch 1/50
16/16 [==============================] - 0s - loss: 1.3095 - acc: 0.4375 - val_loss: 0.4533 - val_acc: 0.7500
Epoch 2/50
16/16 [==============================] - 0s - loss: 0.3555 - acc: 0.8125 - val_loss: 0.2305 - val_acc: 1.0000
Epoch 3/50
16/16 [==============================] - 0s - loss: 0.1365 - acc: 1.0000 - val_loss: 0.1603 - val_acc: 1.0000
Epoch 4/50
16/16 [==============================] - 0s - loss: 0.0600 - acc: 1.0000 - val_loss: 0.1012 - val_acc: 1.0000
Epoch 5/50
16/16 [==============================] - 0s - loss: 0.0296 - acc: 1.0000 - val_loss: 0.0681 - val_acc: 1.0000
Epoch 6/50
16/16 [==============================] - 0s - loss: 0.0165 - acc: 1.0000 - val_loss: 0.0521 - val_acc: 1.0000
Epoch 7/50
16/16 [==============================] - 0s - loss: 0.0082 - acc: 1.0000 - val_loss: 0.0321 - val_acc: 1.0000
Epoch 8/50
16/16 [==============================] - 0s - loss: 0.0036 - acc: 1.0000 - val_loss: 0.0222 - val_acc: 1.0000
Epoch 9/50
16/16 [==============================] - 0s - loss: 0.0023 - acc: 1.0000 - val_loss: 0.0185 - val_acc: 1.0000
Epoch 10/50
16/16 [==============================] - 0s - loss: 0.0011 - acc: 1.0000 - val_loss: 0.0108 - val_acc: 1.0000
Epoch 11/50
16/16 [==============================] - 0s - loss: 5.6636e-04 - acc: 1.0000 - val_loss: 0.0087 - val_acc: 1.0000
Epoch 12/50
16/16 [==============================] - 0s - loss: 2.9463e-04 - acc: 1.0000 - val_loss: 0.0094 - val_acc: 1.0000
Epoch 13/50
16/16 [==============================] - 0s - loss: 1.5169e-04 - acc: 1.0000 - val_loss: 0.0072 - val_acc: 1.0000
Epoch 14/50
16/16 [==============================] - 0s - loss: 7.4001e-05 - acc: 1.0000 - val_loss: 0.0039 - val_acc: 1.0000
Epoch 15/50
16/16 [==============================] - 0s - loss: 3.9956e-05 - acc: 1.0000 - val_loss: 0.0034 - val_acc: 1.0000
Epoch 16/50
16/16 [==============================] - 0s - loss: 2.0384e-05 - acc: 1.0000 - val_loss: 0.0024 - val_acc: 1.0000
Epoch 17/50
16/16 [==============================] - 0s - loss: 1.0036e-05 - acc: 1.0000 - val_loss: 0.0026 - val_acc: 1.0000
Epoch 18/50
16/16 [==============================] - 0s - loss: 5.0962e-06 - acc: 1.0000 - val_loss: 0.0010 - val_acc: 1.0000
Epoch 19/50
16/16 [==============================] - 0s - loss: 2.7791e-06 - acc: 1.0000 - val_loss: 0.0011 - val_acc: 1.0000
Epoch 20/50
16/16 [==============================] - 0s - loss: 1.5646e-06 - acc: 1.0000 - val_loss: 0.0015 - val_acc: 1.0000
Epoch 21/50
16/16 [==============================] - 0s - loss: 8.6427e-07 - acc: 1.0000 - val_loss: 9.0825e-04 - val_acc: 1.0000
Epoch 22/50
16/16 [==============================] - 0s - loss: 4.3958e-07 - acc: 1.0000 - val_loss: 5.6370e-04 - val_acc: 1.0000
Epoch 23/50
16/16 [==============================] - 0s - loss: 2.5332e-07 - acc: 1.0000 - val_loss: 5.1226e-04 - val_acc: 1.0000
Epoch 24/50
16/16 [==============================] - 0s - loss: 1.6391e-07 - acc: 1.0000 - val_loss: 6.6560e-04 - val_acc: 1.0000
Epoch 25/50
16/16 [==============================] - 0s - loss: 1.3411e-07 - acc: 1.0000 - val_loss: 6.5456e-04 - val_acc: 1.0000
Epoch 26/50
16/16 [==============================] - 0s - loss: 1.1921e-07 - acc: 1.0000 - val_loss: 3.4316e-04 - val_acc: 1.0000
Epoch 27/50
16/16 [==============================] - 0s - loss: 1.1921e-07 - acc: 1.0000 - val_loss: 3.4316e-04 - val_acc: 1.0000
Epoch 28/50
16/16 [==============================] - 0s - loss: 1.1921e-07 - acc: 1.0000 - val_loss: 3.4316e-04 - val_acc: 1.0000
Epoch 29/50
16/16 [==============================] - 0s - loss: 1.1921e-07 - acc: 1.0000 - val_loss: 3.4316e-04 - val_acc: 1.0000
Epoch 30/50
16/16 [==============================] - 0s - loss: 1.1921e-07 - acc: 1.0000 - val_loss: 3.4316e-04 - val_acc: 1.0000
It converges quickly and yields good results.
The second method, on the other hand, gives this:
Epoch 1/50
24/24 [==============================] - 63s - loss: 0.7375 - acc: 0.7500 - val_loss: 0.7575 - val_acc: 0.6667
Epoch 2/50
24/24 [==============================] - 61s - loss: 0.6763 - acc: 0.7500 - val_loss: 1.5228 - val_acc: 0.5000
Epoch 3/50
24/24 [==============================] - 61s - loss: 0.7149 - acc: 0.7500 - val_loss: 3.5805 - val_acc: 0.3333
Epoch 4/50
24/24 [==============================] - 61s - loss: 0.6363 - acc: 0.7500 - val_loss: 1.5066 - val_acc: 0.5000
Epoch 5/50
24/24 [==============================] - 61s - loss: 0.6542 - acc: 0.7500 - val_loss: 1.8745 - val_acc: 0.6667
Epoch 6/50
24/24 [==============================] - 61s - loss: 0.7007 - acc: 0.7500 - val_loss: 1.5328 - val_acc: 0.5000
Epoch 7/50
24/24 [==============================] - 61s - loss: 0.6900 - acc: 0.7500 - val_loss: 3.6004 - val_acc: 0.3333
Epoch 8/50
24/24 [==============================] - 61s - loss: 0.6615 - acc: 0.7500 - val_loss: 1.5734 - val_acc: 0.5000
Epoch 9/50
24/24 [==============================] - 61s - loss: 0.6571 - acc: 0.7500 - val_loss: 3.0078 - val_acc: 0.6667
Epoch 10/50
24/24 [==============================] - 61s - loss: 0.5762 - acc: 0.7083 - val_loss: 3.6029 - val_acc: 0.5000
Epoch 11/50
24/24 [==============================] - 61s - loss: 0.6515 - acc: 0.7500 - val_loss: 5.8610 - val_acc: 0.3333
Epoch 12/50
24/24 [==============================] - 61s - loss: 0.6541 - acc: 0.7083 - val_loss: 2.4551 - val_acc: 0.5000
Epoch 13/50
24/24 [==============================] - 61s - loss: 0.6700 - acc: 0.7500 - val_loss: 2.9983 - val_acc: 0.6667
Epoch 14/50
24/24 [==============================] - 61s - loss: 0.6486 - acc: 0.7500 - val_loss: 3.6179 - val_acc: 0.5000
Epoch 15/50
24/24 [==============================] - 61s - loss: 0.6985 - acc: 0.6667 - val_loss: 5.8419 - val_acc: 0.3333
Epoch 16/50
24/24 [==============================] - 62s - loss: 0.6465 - acc: 0.7083 - val_loss: 2.5201 - val_acc: 0.5000
Epoch 17/50
24/24 [==============================] - 62s - loss: 0.6246 - acc: 0.7500 - val_loss: 2.9912 - val_acc: 0.6667
Epoch 18/50
24/24 [==============================] - 62s - loss: 0.6768 - acc: 0.7500 - val_loss: 3.6320 - val_acc: 0.5000
Epoch 19/50
24/24 [==============================] - 62s - loss: 0.5774 - acc: 0.7083 - val_loss: 5.8575 - val_acc: 0.3333
Epoch 20/50
24/24 [==============================] - 62s - loss: 0.6642 - acc: 0.7500 - val_loss: 2.5865 - val_acc: 0.5000
Epoch 21/50
24/24 [==============================] - 63s - loss: 0.6553 - acc: 0.7083 - val_loss: 2.9967 - val_acc: 0.6667
Epoch 22/50
24/24 [==============================] - 62s - loss: 0.6469 - acc: 0.7083 - val_loss: 3.6233 - val_acc: 0.5000
Epoch 23/50
24/24 [==============================] - 64s - loss: 0.6029 - acc: 0.7500 - val_loss: 5.8225 - val_acc: 0.3333
Epoch 24/50
24/24 [==============================] - 63s - loss: 0.6183 - acc: 0.7083 - val_loss: 2.5325 - val_acc: 0.5000
Epoch 25/50
24/24 [==============================] - 62s - loss: 0.6631 - acc: 0.7500 - val_loss: 2.9879 - val_acc: 0.6667
Epoch 26/50
24/24 [==============================] - 63s - loss: 0.6082 - acc: 0.7500 - val_loss: 3.6206 - val_acc: 0.5000
Epoch 27/50
24/24 [==============================] - 62s - loss: 0.6536 - acc: 0.7500 - val_loss: 5.7937 - val_acc: 0.3333
Epoch 28/50
24/24 [==============================] - 63s - loss: 0.5853 - acc: 0.7500 - val_loss: 2.6138 - val_acc: 0.5000
Epoch 29/50
24/24 [==============================] - 62s - loss: 0.5523 - acc: 0.7500 - val_loss: 3.0126 - val_acc: 0.6667
Epoch 30/50
24/24 [==============================] - 62s - loss: 0.7112 - acc: 0.7500 - val_loss: 3.7054 - val_acc: 0.5000
The same model (Inception V4) was used for both methods. My code is as follows:
First method (Bottleneck Features):
from keras import backend as K
import inception_v4
import numpy as np
import cv2
import os
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.layers import Activation, Dropout, Flatten, Dense, Input
from keras.models import Model
os.environ['CUDA_VISIBLE_DEVICES'] = ''
v4 = inception_v4.create_model(weights='imagenet')
#v4.summary()
my_batch_size = 1
train_data_dir = '//shared_directory/projects/try_CDxx/data/train/'
validation_data_dir = '//shared_directory/projects/try_CDxx/data/validation/'
top_model_weights_path = 'bottleneck_fc_model.h5'
class_num = 2
img_width, img_height = 299, 299
nb_epoch = 50

# Expose both the final predictions and the flattened pre-dense features.
main_input = v4.layers[1].input
main_output = v4.layers[-1].output
flatten_output = v4.layers[-2].output
model = Model(input=[main_input], output=[main_output, flatten_output])
def save_BN(model):
    # Run the network over both data sets and save the bottleneck features.
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        class_mode='categorical',
        shuffle=False)
    nb_train_samples = generator.classes.size
    bottleneck_features_train = model.predict_generator(generator, nb_train_samples)
    # The model has two outputs; index 1 is the flattened pre-dense features.
    np.save(open('bottleneck_flat_features_train.npy', 'wb'), bottleneck_features_train[1])
    np.save(open('bottleneck_train_labels.npy', 'wb'), generator.classes)
    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        class_mode='categorical',
        shuffle=False)
    nb_validation_samples = generator.classes.size
    bottleneck_features_validation = model.predict_generator(generator, nb_validation_samples)
    np.save(open('bottleneck_flat_features_validation.npy', 'wb'), bottleneck_features_validation[1])
    np.save(open('bottleneck_validation_labels.npy', 'wb'), generator.classes)

def train_top_model():
    # Train a fresh softmax layer on the saved bottleneck features.
    train_data = np.load(open('bottleneck_flat_features_train.npy', 'rb'))
    train_labels = np.load(open('bottleneck_train_labels.npy', 'rb'))
    validation_data = np.load(open('bottleneck_flat_features_validation.npy', 'rb'))
    validation_labels = np.load(open('bottleneck_validation_labels.npy', 'rb'))
    top_m = Sequential()
    top_m.add(Dense(class_num, input_shape=train_data.shape[1:], activation='softmax', name='top_dense1'))
    top_m.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    top_m.fit(train_data, train_labels,
              nb_epoch=nb_epoch, batch_size=my_batch_size,
              validation_data=(validation_data, validation_labels))
    dense_layer = top_m.layers[-1]
    np.save(open('retrained_top_layer_weight.npy', 'wb'), dense_layer.get_weights())

save_BN(model)
train_top_model()
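As a side note, a minimal sketch (not part of the original script) of how the saved top-layer weights could be loaded back into a fresh classifier later, with the feature dimension read from the saved features rather than hard-coded:

train_data = np.load(open('bottleneck_flat_features_train.npy', 'rb'))
top_m = Sequential()
top_m.add(Dense(class_num, input_shape=train_data.shape[1:], activation='softmax', name='top_dense1'))
# get_weights() was saved as a pickled object array; unpack it back to [kernel, bias].
top_m.layers[-1].set_weights(list(np.load(open('retrained_top_layer_weight.npy', 'rb'))))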
Second method (freezing all but the last layer):
from keras import backend as K
import inception_v4
import numpy as np
import cv2
import os
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.layers import Activation, Dropout, Flatten, Dense, Input
from keras.models import Model
os.environ['CUDA_VISIBLE_DEVICES'] = ''
my_batch_size = 1
train_data_dir = '//shared_directory/projects/try_CDxx/data/train/'
validation_data_dir = '//shared_directory/projects/try_CDxx/data/validation/'
top_model_path = 'tm_trained_model.h5'
img_width, img_height = 299, 299
num_classes = 2
nb_epoch = 50
nbr_train_samples = 24
nbr_validation_samples = 12
def train_top_model(num_classes):
    v4 = inception_v4.create_model(weights='imagenet')
    # Replace the 1001-way dense layer with a new one for my own classes.
    predictions = Dense(output_dim=num_classes, activation='softmax', name="newDense")(v4.layers[-2].output)
    main_input = v4.layers[1].input
    t_model = Model(input=[main_input], output=[predictions])
    val_datagen = ImageDataGenerator(rescale=1./255)
    train_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        shuffle=False,
        class_mode='categorical')
    validation_generator = val_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=my_batch_size,
        shuffle=False,
        class_mode='categorical')
    # Freeze everything except the new dense layer.
    for layer in t_model.layers:
        layer.trainable = False
    t_model.layers[-1].trainable = True
    t_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    t_model.fit_generator(
        train_generator,
        samples_per_epoch=nbr_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,
        nb_val_samples=nbr_validation_samples)
    t_model.save(top_model_path)
    # print(t_model.trainable_weights)

train_top_model(num_classes)
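One quick sanity check on the freezing (this is what the commented-out print above is for): after compiling, t_model.trainable_weights should contain only the newDense kernel and bias.

print([w.name for w in t_model.trainable_weights])  # expect only the newDense weights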
I think that freezing everything in the net but the top layer and training only the top should be identical to using everything but the top to generate the features just before the top, and then training a new dense layer on those features.
So either my code or my reasoning about the problem is incorrect (or both...).
What am I doing wrong?
Thank you for your time.
This was a really neat problem. It's because of the Dropout layers in your second approach. Even though a layer is set to be non-trainable, Dropout still works during training and keeps perturbing your network's activations by randomly zeroing parts of its input.
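A minimal sketch (toy shapes, hypothetical; Keras 1.x-era API to match the question) of why this happens: Dropout ignores the trainable flag and still zeroes activations whenever the learning phase is "training".

import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dropout

m = Sequential()
m.add(Dropout(0.5, input_shape=(8,)))
m.layers[0].trainable = False  # has no effect on Dropout's behaviour

# Evaluate the layer in training phase (1) and in test phase (0).
f = K.function([m.input, K.learning_phase()], [m.output])
x = np.ones((1, 8))
print(f([x, 1])[0])  # training: about half the entries are zeroed, the rest rescaled
print(f([x, 0])[0])  # test: the input passes through unchanged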
Try to change your code to:
v4 = inception_v4.create_model(weights='imagenet')
predictions = Flatten()(v4.layers[-4].output)
predictions = Dense(output_dim=num_classes, activation='softmax', name="newDense")(predictions)
Also, because of the BatchNormalization layers, change the batch_size to 24: with batch_size=1, the per-batch statistics that BatchNormalization computes during training come from a single image, which makes them essentially meaningless.
This should work.
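Putting both changes together, a sketch of the modified model construction (reusing the variables from the second script; the -4 layer index is from the fix above):

v4 = inception_v4.create_model(weights='imagenet')
# Branch off before the Dropout layer and flatten there instead.
predictions = Flatten()(v4.layers[-4].output)
predictions = Dense(output_dim=num_classes, activation='softmax', name="newDense")(predictions)
t_model = Model(input=[v4.layers[1].input], output=[predictions])

# Freeze everything except the new dense layer, as before.
for layer in t_model.layers:
    layer.trainable = False
t_model.layers[-1].trainable = True

# Use a larger batch so BatchNormalization sees sensible batch statistics.
my_batch_size = 24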