
How to reuse VGG19 for image classification in Keras?

I am currently trying to understand how to reuse VGG19 (or other architectures) in order to improve my small image classification model. I am classifying images (in this case paintings) into 3 classes (let's say, paintings from 15th, 16th and 17th centuries). I have quite a small dataset, 1800 training examples per class with 250 per class in the validation set.

I have the following implementation:

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
from keras.callbacks import ModelCheckpoint
from keras.regularizers import l2, l1
from keras.models import load_model

# use Theano-style (channels-first) image ordering: (channels, height, width)
K.set_image_dim_ordering('th')

batch_size = 32

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in
# subfolders of the training directory, and indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
        'C://keras//train_set_paintings//',  # this is the target directory
        target_size=(150, 150),  # all images will be resized to 150x150
        batch_size=batch_size,
        class_mode='categorical')

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
        'C://keras//validation_set_paintings//',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='categorical')

model = Sequential()

model.add(Conv2D(16, (3, 3), input_shape=(3, 150, 150)))
model.add(Activation('relu'))  # also tried LeakyReLU, no improvements
model.add(MaxPooling2D(pool_size=(2, 3), data_format="channels_first"))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 3), data_format="channels_first"))

model.add(Flatten())
model.add(Dense(64, kernel_regularizer=l2(.01))) 
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(3))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',  # also tried SGD, it doesn't perform as well as adam
              metrics=['accuracy'])

fBestModel = 'best_model_final_paintings.h5'
best_model = ModelCheckpoint(fBestModel, verbose=0, save_best_only=True)

hist = model.fit_generator(
    train_generator,
    steps_per_epoch=2000 // batch_size,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=200 // batch_size,
    callbacks=[best_model],
    workers=8  # cpu generation is run in parallel to the gpu training
)

print("Maximum train accuracy:", max(hist.history["acc"]))
print("Maximum train accuracy on epoch:", hist.history["acc"].index(max(hist.history["acc"]))+1)

print("Maximum validation accuracy:", max(hist.history["val_acc"]))
print("Maximum validation accuracy on epoch:", hist.history["val_acc"].index(max(hist.history["val_acc"]))+1)

I have managed to keep it rather balanced in terms of overfitting: [two training plots]

If I make the architecture deeper, it either overfits a lot or, if I regularize it more strictly, the accuracy fluctuates wildly, even reaching 100% at one point: [training plot]

I have also tried using BatchNormalization, but then the model barely learns at all: it doesn't get above 50% accuracy on the training set. I tried it with and without dropout.

I am looking for ways to improve the model without changing the architecture too much. One of the options I see is reusing an existing architecture with its weights and plugging it into my model, but I can't find any real examples of how to do it. I am mostly following this blog post: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

It talks about reusing VGG19 to improve accuracy but it doesn't really explain how it is done. Are there any other examples I could follow? How would I adapt it to my current implementation? I found a full model architecture, but running it is not possible on my hardware, so I am looking into a way of reusing an already trained model with weights and then adapting it to my problem.

Also, I don't understand the concept behind "bottleneck features", which the blog talks about in the VGG part. Would be glad if someone could explain it.

Ivan Bilan asked Dec 23 '22 14:12


2 Answers

You should definitely try out transfer learning (searching for "transfer learning Keras" turns up plenty of tutorials on the subject). Essentially, transfer learning (TL) means fine-tuning a network that was pre-trained on some big dataset (most commonly ImageNet) with new classification layers. The idea is that you want to keep all the good features learned in the lower levels of the network (because there's a high probability your images will also have those features) and just learn a new classifier on top of those features. This tends to work well, especially if you have a small dataset that doesn't allow for full training of the network from scratch (it's also much faster than full training).

Please note that there are several ways to do TL (and I do encourage you to research the topic to find what suits you best). In my applications, I simply initialize the network with the weights taken from a public ImageNet checkpoint, remove the last layers and train everything from there (with a low-enough learning rate, or you'll mess up the low-level features that you actually want to keep). This approach allows for data augmentation.
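A minimal sketch of this approach, assuming `tensorflow.keras` is available (the question uses standalone `keras`, where the import paths differ slightly) and channels-last images. The function name `build_finetune_model`, the pooling layer, the 256-unit dense layer and the learning rate of 1e-4 are illustrative assumptions, not values taken from the answer or the blog post.

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD


def build_finetune_model(num_classes=3, input_shape=(150, 150, 3),
                         weights="imagenet", learning_rate=1e-4):
    """VGG19 convolutional base plus a new, randomly initialized classifier."""
    # include_top=False drops the ImageNet-specific dense layers.
    base = VGG19(weights=weights, include_top=False, input_shape=input_shape)

    # New classification layers (layer sizes are illustrative assumptions).
    x = GlobalAveragePooling2D()(base.output)
    x = Dense(256, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation="softmax")(x)

    model = Model(inputs=base.input, outputs=outputs)
    # Every layer stays trainable; the small learning rate is meant to keep
    # the pre-trained filters from being overwritten too quickly.
    model.compile(optimizer=SGD(learning_rate=learning_rate, momentum=0.9),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

A model built this way could be trained with augmenting generators like the ones in the question, provided they yield channels-last batches (that is, without the `K.set_image_dim_ordering('th')` call). The ImageNet weights were trained with the preprocessing in `tensorflow.keras.applications.vgg19.preprocess_input` rather than a plain 1/255 rescale, so that may be worth trying as well.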

Another approach uses bottlenecks. In this context, a bottleneck (called an embedding in other contexts) is the internal representation of one of your input samples at a certain depth in the network. In other words, you can see a bottleneck at level N as the output of the network stopped after N layers. Why is this useful? Because you can precompute the bottlenecks for all your samples using a pre-trained network and then train only the last layers of the network, without having to recompute the (expensive) part of the network up to the bottleneck point on every pass.

A simplified example

Let's say you have a network with the following structure:

in -> A -> B -> C -> D -> E -> out

where in and out are the input and output layers and the others are any type of layer you might have in a network. Let's also say that you found, published somewhere, a checkpoint of the network pre-trained on ImageNet. Let E be the classifier layer in our example. ImageNet has 1000 classes, none of which you need, so you'll throw away E, the final layer (classifier) of the network. The other layers, however, contain features you want to keep.

Taking samples from your dataset, you feed them to in and collect the matching bottleneck value as the output of layer D. You do this once for all samples in your dataset. The collection of bottlenecks is the new dataset you'll use to train the new classifier.

You build a dummy network with the following structure:

bottleneck_in -> E' -> out

You now train this network as you normally would, but instead of feeding samples from your dataset, you feed the matching bottleneck from the bottleneck dataset. Note that doing this saves you the computation of all layers from A to D, but you can't apply any data augmentation during training (of course you can still apply it when building the bottlenecks, but you'll have lots of data to store).

Finally, to build your final classifier, your network architecture will be

in -> A -> B -> C -> D -> E' -> out

with weights A to D taken from the public checkpoint and weights E' resulting from your training.
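The bottleneck procedure above could be sketched in Keras roughly as follows. This assumes `tensorflow.keras`, channels-last images, and that the VGG19 convolutional base plays the role of layers A to D; the helper names `compute_bottlenecks` and `build_top_classifier` and the 256-unit dense layer are hypothetical choices made for illustration.

```python
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Sequential


def compute_bottlenecks(images, weights="imagenet"):
    """Run the images through layers A..D (the VGG19 conv base) once."""
    images = np.asarray(images, dtype="float32")
    base = VGG19(weights=weights, include_top=False,
                 input_shape=images.shape[1:])
    # The returned array is the "bottleneck dataset" for these images.
    return base.predict(images, batch_size=32, verbose=0)


def build_top_classifier(bottleneck_shape, num_classes=3):
    """The small network bottleneck_in -> E' -> out."""
    model = Sequential([
        Input(shape=bottleneck_shape),
        Flatten(),
        Dense(256, activation="relu"),   # size is an illustrative assumption
        Dropout(0.5),
        Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The array returned by `compute_bottlenecks` can be stored (for example with `np.save`) and then passed to `fit` on the small classifier together with one-hot labels, so the expensive convolutional layers run only once per image.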

GPhilo answered Jan 12 '23 10:01


The very short version:

  1. Load VGG
  2. Throw away the output layer and the second-to-last layer
  3. Put a new, randomly initialized output layer at the end
  4. Fine tune with your data

I'm almost certain Keras contains at least one code example for this.
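The four steps above might look like this in Keras. This is a sketch under assumptions: it uses `tensorflow.keras`, relies on the dense layers of the bundled VGG19 being named `fc1`, `fc2` and `predictions`, and introduces the hypothetical helpers `replace_head` and `build_vgg19_classifier`; the optimizer settings are illustrative only.

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD


def replace_head(base_model, last_kept_layer, num_classes):
    """Cut base_model after `last_kept_layer` and attach a new softmax layer."""
    # Step 2: everything after this layer is discarded.
    features = base_model.get_layer(last_kept_layer).output
    # Step 3: new, randomly initialized output layer.
    outputs = Dense(num_classes, activation="softmax",
                    name="new_predictions")(features)
    return Model(inputs=base_model.input, outputs=outputs)


def build_vgg19_classifier(num_classes=3, weights="imagenet"):
    # Step 1: load the full VGG19 (with its top it expects 224x224 RGB input).
    base = VGG19(weights=weights)
    # Keeping up to "fc1" drops "fc2" and "predictions".
    model = replace_head(base, "fc1", num_classes)
    # Step 4: compile with a small learning rate, then call model.fit(...)
    # on your own data to fine-tune.
    model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Because the full VGG19 with its dense layers is large, the images would need to be resized to 224x224 for this variant, and on limited hardware it may be more practical to load the model with `include_top=False` and add a smaller head.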

Martin Thoma answered Jan 12 '23 11:01