 

Prevention of overfitting in convolutional layers of a CNN

I'm using TensorFlow to train a Convolutional Neural Network (CNN) for a sign language application. The CNN has to classify 27 different labels, so unsurprisingly, a major problem has been addressing overfitting. I've taken several steps to accomplish this:

  1. I've collected a large amount of high-quality training data (over 5000 samples per label).
  2. I've built a reasonably sophisticated pre-processing stage to help maximize invariance to things like lighting conditions.
  3. I'm using dropout on the fully-connected layers.
  4. I'm applying L2 regularization to the fully-connected parameters.
  5. I've done extensive hyper-parameter optimization (to the extent possible given HW and time limitations) to identify the simplest model that can achieve close to 0% loss on training data.

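Steps 3 and 4 above (dropout plus L2 on the fully-connected parameters) can be sketched framework-agnostically. This is a minimal NumPy illustration of the two mechanisms, not the asker's actual TensorFlow code; the function names are my own:

```python
import numpy as np

def dropout(x, rate, rng=None, training=True):
    """Inverted dropout: zero each unit with probability `rate` during
    training, then rescale so the expected activation matches test time
    (no rescaling is needed at inference)."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = (rng.random(x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)

def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum of squared
    weights, summed over all regularized parameter arrays."""
    return lam * sum((w ** 2).sum() for w in weights)
```

In TensorFlow the same effect comes from the built-in dropout op and a weight-decay term added to the loss; the sketch just makes the math explicit.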
Unfortunately, even after all these steps, I'm finding that I can't achieve much better than about 3% test error. (It's not terrible, but for the application to be viable, I'll need to improve that substantially.)

I suspect that the source of the overfitting lies in the convolutional layers since I'm not taking any explicit steps there to regularize (besides keeping the layers as small as possible). But based on examples provided with TensorFlow, it doesn't appear that regularization or dropout is typically applied to convolutional layers.

The only approach I've found online that explicitly deals with prevention of overfitting in convolutional layers is a fairly new approach called Stochastic Pooling. Unfortunately, it appears that there is no implementation for this in TensorFlow, at least not yet.

So in short, is there a recommended approach to prevent overfitting in convolutional layers that can be achieved in TensorFlow? Or will it be necessary to create a custom pooling operator to support the Stochastic Pooling approach?
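For reference, the stochastic pooling operation itself is straightforward to sketch. This is a NumPy illustration of the scheme (sample one activation per pooling window with probability proportional to its value, assuming non-negative, e.g. post-ReLU, inputs), not a TensorFlow op; the function name is mine:

```python
import numpy as np

def stochastic_pool_2x2(x, rng=None):
    """Stochastic pooling over non-overlapping 2x2 windows of a 2D
    feature map. Each output is an activation sampled from its window
    with probability proportional to the activation's value."""
    rng = rng or np.random.default_rng()
    h, w = x.shape
    out = np.empty((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            window = x[i:i + 2, j:j + 2].ravel()
            total = window.sum()
            if total == 0:
                out[i // 2, j // 2] = 0.0  # all-zero window: nothing to sample
            else:
                out[i // 2, j // 2] = rng.choice(window, p=window / total)
    return out
```

At test time the original paper replaces sampling with a probability-weighted average of each window, so a full implementation needs both modes.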

Thanks for any guidance!

asked Mar 21 '16 by Aenimated1

People also ask

How does CNN determine overfitting?

In terms of loss, overfitting reveals itself when your model has a low error on the training set and a substantially higher error on the test set. You can identify this visually by plotting your loss and accuracy metrics for both datasets over training and seeing where the two curves diverge.
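In code terms, the divergence point is simply the epoch where validation loss bottoms out while training loss keeps falling. A tiny illustrative helper (my own naming, not from any library):

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss. Training past
    this point while training loss keeps decreasing is the classic sign
    of overfitting; it is also where early stopping would restore
    weights from."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)
```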


1 Answer

How can I fight overfitting?

  • Get more data (or data augmentation)
  • Dropout (see paper, explanation, dropout for cnns)
  • DropConnect
  • Regularization (see my masters thesis, page 85 for examples)
  • Feature scale clipping
  • Global average pooling
  • Make network smaller
  • Early stopping
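Several of the items above apply directly to convolutional layers. In particular, dropout on conv layers usually works better channel-wise ("spatial dropout"), because neighbouring pixels within a channel are strongly correlated and per-pixel dropout barely regularizes them. A hedged NumPy sketch of the idea (the function name is mine; in TF/Keras this corresponds to the built-in spatial dropout layer):

```python
import numpy as np

def spatial_dropout(x, rate, rng=None, training=True):
    """Channel-wise dropout for a conv feature map of shape
    (height, width, channels): entire channels are zeroed with
    probability `rate`, and survivors are rescaled (inverted dropout)
    so expected activations are unchanged at test time."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng()
    keep = (rng.random(x.shape[-1]) >= rate).astype(x.dtype)
    return x * keep / (1.0 - rate)
```

L2 regularization can likewise be applied to conv kernels exactly as to fully-connected weights: add the summed squared kernel entries to the loss.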

How can I improve my CNN?

Thoma, Martin. "Analysis and Optimization of Convolutional Neural Network Architectures." arXiv preprint arXiv:1707.09725 (2017).

See chapter 2.5 for analysis techniques. As written in the beginning of that chapter, you can usually do the following:

  • (I1) Change the problem definition (e.g., the classes which are to be distinguished)
  • (I2) Get more training data
  • (I3) Clean the training data
  • (I4) Change the preprocessing (see Appendix B.1)
  • (I5) Augment the training data set (see Appendix B.2)
  • (I6) Change the training setup (see Appendices B.3 to B.5)
  • (I7) Change the model (see Appendices B.6 and B.7)
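Of these, (I5) is often the cheapest win. A minimal NumPy sketch of on-the-fly augmentation (my own example, not from the thesis); note that for sign-language data a horizontal flip may change the label, so only transforms that preserve class semantics should be used:

```python
import numpy as np

def augment(image, rng=None, max_shift=2, allow_flip=True):
    """Random horizontal flip plus a small random translation of an
    array shaped (height, width[, channels]). Real pipelines would add
    rotations, scaling, brightness jitter, etc."""
    rng = rng or np.random.default_rng()
    out = image
    if allow_flip and rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(out, (dy, dx), axis=(0, 1))  # shift (wrap-around)
```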

Misc

The CNN has to classify 27 different labels, so unsurprisingly, a major problem has been addressing overfitting.

I don't see how the number of classes is connected to overfitting. You can have hundreds of labels without any overfitting problem.

answered Oct 06 '22 by Martin Thoma