Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keras: Training loss decrases (accuracy increase) while validation loss increases (accuracy decrease)

I am working on a very sparse dataset with the point of predicting 6 classes. I have tried working with a lot of models and architectures, but the problem remains the same.

When I start training, the acc for training will slowly start to increase and loss will decrease where as the validation will do the exact opposite.

I have really tried to deal with overfitting, and I simply cannot still believe that this is what is coursing this issue.

What have I tried

Transfer learning on VGG16:

  • exclude top layer and add dense layer with 256 units and 6 units softmax output layer
  • finetune the top CNN block
  • finetune the top 3-4 CNN blocks

To deal with overfitting I use heavy augmentation in Keras and dropout after the 256 dense layer with p=0.5.

Creating own CNN with VGG16-ish architecture:

  • including batch normalization wherever possible
  • L2 regularization on each CNN+dense layer
  • Dropout from anywhere between 0.5-0.8 after each CNN+dense+pooling layer
  • Heavy data augmentation in "on the fly" in Keras

Realising that perhaps I have too many free parameters:

  • decreasing the network to only contain 2 CNN blocks + dense + output.
  • dealing with overfitting in the same manner as above.

Without exception all training sessions are looking like this: Training & Validation loss+accuracy

The last mentioned architecture looks like this:

    reg = 0.0001

    model = Sequential()

    model.add(Conv2D(8, (3, 3), input_shape=input_shape, padding='same',
            kernel_regularizer=regularizers.l2(reg)))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(0.7))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))

    model.add(Conv2D(16, (3, 3), input_shape=input_shape, padding='same',
            kernel_regularizer=regularizers.l2(reg)))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(0.7))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))

    model.add(Flatten())
    model.add(Dense(16, kernel_regularizer=regularizers.l2(reg)))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(0.5))

    model.add(Dense(6))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='SGD',metrics=['accuracy'])

And the data is augmented by the generator in Keras and is loaded with flow_from_directory:

    train_datagen = ImageDataGenerator(rotation_range=10,
                                width_shift_range=0.05,
                                height_shift_range=0.05,
                                shear_range=0.05,
                                zoom_range=0.05,
                                rescale=1/255.,
                                fill_mode='nearest',
                                channel_shift_range=0.2*255)
    train_generator = train_datagen.flow_from_directory(
                train_data_dir,
                target_size=(img_width, img_height),
                batch_size=batch_size,
                shuffle = True,
                class_mode='categorical')

    validation_datagen = ImageDataGenerator(rescale=1/255.)
    validation_generator = validation_datagen.flow_from_directory(
                                            validation_data_dir,
                                            target_size=(img_width, img_height),
                                            batch_size=1,
                                            shuffle = True,
                                            class_mode='categorical')
like image 422
Jesper Avatar asked Nov 13 '17 19:11

Jesper


3 Answers

What I can think of by analyzing your metric outputs (from the link you provided):

enter image description here

Seems to me that approximately near epoch 30 your model is starting to overfit. Therefore you can try stopping your training in that iteration, or well just train it for ~30 epochs (or the exact number). The Keras Callbacks may be useful here, specially the ModelCheckpoint to enable you to stop your training when desired (Ctrl +C) or when certain criteria is met. Here is an example of basic ModelCheckpoint use:

#save best True saves only if the metric improves
chk = ModelCheckpoint("myModel.h5", monitor='val_loss', save_best_only=False) 
callbacks_list = [chk]
#pass callback on fit
history = model.fit(X, Y, ... , callbacks=callbacks_list)

(Edit:) As suggested in comments, another option you have available is to use the EarlyStopping callback, where you can specify the minimum change tolerated and the 'patience' or epochs without such improvement before stopping the training. If using this, you have to pass it to the callbacks argument as explained before.

At the current setup you model has (and with the modifications you have tried) that point in your training seems to be the optimal training time for your case; training it further will bring no benefits to your model (in fact, will make it generalize worse).

Given you have tried several modifications, one thing you can do is to try to increase your Network Depth, to give it more capacity. Try adding more layers, one at a time, and check for improvements. Also, you usually you want to start with simpler models first, before attempting a multi-layer solution.

If a simple model doesn't work, add one layer and test again, repeating until satisfied or possible. And by simple I mean really simple, have you tried a non-convolutional approach? Although CNN are great for images, maybe you are overkilling it here.

If nothing seems to work, maybe it is time to get more data, or to generate more data from the one you have by sampling or other techniques. For that last suggestion, try checking this keras blog I have found really useful. Deep learning algorithms usually require substantial amount of training data, specially for complex models, like images, so be aware this may not be an easy task. Hope this helps.

like image 88
DarkCygnus Avatar answered Nov 19 '22 05:11

DarkCygnus


IMHO, this is just normal situation for DL. In Keras you can setup a callback that will save the best model (depending on evaluation metric that you provide), and callback that will stop training if model isn't improving.

See ModelCheckpoint & EarlyStopping callbacks respectively.

P.S. Sorry, maybe I misunderstood question - do you have validation loss decreasing form first step?

like image 30
Alex Ott Avatar answered Nov 19 '22 07:11

Alex Ott


Validation loss is increasing. This means you need more data, or more regularization. Standard situation here, and nothing to be worried about. By the way, more parameters (bigger model) is just going to worsen this problem unless you fix it.

So you can now investigate profitably by introducing more examples, L2, L1, or dropout.

like image 2
Geoffrey Anderson Avatar answered Nov 19 '22 05:11

Geoffrey Anderson