 

Prevention of overfitting in convolutional layers of a CNN

I'm using TensorFlow to train a Convolutional Neural Network (CNN) for a sign language application. The CNN has to classify 27 different labels, so unsurprisingly, a major problem has been addressing overfitting. I've taken several steps to accomplish this:

  1. I've collected a large amount of high-quality training data (over 5000 samples per label).
  2. I've built a reasonably sophisticated pre-processing stage to help maximize invariance to things like lighting conditions.
  3. I'm using dropout on the fully-connected layers.
  4. I'm applying L2 regularization to the fully-connected parameters.
  5. I've done extensive hyper-parameter optimization (to the extent possible given HW and time limitations) to identify the simplest model that can achieve close to 0% loss on training data.

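Steps 3 and 4 above (dropout plus L2 on the fully-connected parameters) can be sketched framework-agnostically. This is a minimal NumPy illustration of the two mechanisms, not the asker's actual TensorFlow code; the function names are my own:

```python
import numpy as np

def dropout(x, rate, rng=None, training=True):
    """Inverted dropout: zero each unit with probability `rate` during
    training, then rescale so the expected activation matches test time
    (no rescaling is needed at inference)."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = (rng.random(x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)

def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum of squared
    weights, summed over all regularized parameter arrays."""
    return lam * sum((w ** 2).sum() for w in weights)
```

In TensorFlow the same effect comes from the built-in dropout op and a weight-decay term added to the loss; the sketch just makes the math explicit.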
Unfortunately, even after all these steps, I'm finding that I can't achieve much better than about 3% test error. (It's not terrible, but for the application to be viable, I'll need to improve that substantially.)

I suspect that the source of the overfitting lies in the convolutional layers since I'm not taking any explicit steps there to regularize (besides keeping the layers as small as possible). But based on examples provided with TensorFlow, it doesn't appear that regularization or dropout is typically applied to convolutional layers.

The only approach I've found online that explicitly deals with prevention of overfitting in convolutional layers is a fairly new approach called Stochastic Pooling. Unfortunately, it appears that there is no implementation for this in TensorFlow, at least not yet.

So in short, is there a recommended approach to prevent overfitting in convolutional layers that can be achieved in TensorFlow? Or will it be necessary to create a custom pooling operator to support the Stochastic Pooling approach?
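For reference, the stochastic pooling operation itself is straightforward to sketch. This is a NumPy illustration of the scheme (sample one activation per pooling window with probability proportional to its value, assuming non-negative, e.g. post-ReLU, inputs), not a TensorFlow op; the function name is mine:

```python
import numpy as np

def stochastic_pool_2x2(x, rng=None):
    """Stochastic pooling over non-overlapping 2x2 windows of a 2D
    feature map. Each output is an activation sampled from its window
    with probability proportional to the activation's value."""
    rng = rng or np.random.default_rng()
    h, w = x.shape
    out = np.empty((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            window = x[i:i + 2, j:j + 2].ravel()
            total = window.sum()
            if total == 0:
                out[i // 2, j // 2] = 0.0  # all-zero window: nothing to sample
            else:
                out[i // 2, j // 2] = rng.choice(window, p=window / total)
    return out
```

At test time the original paper replaces sampling with a probability-weighted average of each window, so a full implementation needs both modes.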

Thanks for any guidance!

asked Mar 21 '16 by Aenimated1

People also ask

How does CNN determine overfitting?

In terms of loss, overfitting reveals itself when your model has a low error on the training set and a substantially higher error on the test set. You can identify this visually by plotting your loss and accuracy metrics for both datasets over training and seeing where the two curves diverge.
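In code terms, the divergence point is simply the epoch where validation loss bottoms out while training loss keeps falling. A tiny illustrative helper (my own naming, not from any library):

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss. Training past
    this point while training loss keeps decreasing is the classic sign
    of overfitting; it is also where early stopping would restore
    weights from."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)
```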


1 Answer

How can I fight overfitting?

  • Get more data (or data augmentation)
  • Dropout (see paper, explanation, dropout for cnns)
  • DropConnect
  • Regularization (see my masters thesis, page 85 for examples)
  • Feature scale clipping
  • Global average pooling
  • Make network smaller
  • Early stopping
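Several of the items above apply directly to convolutional layers. In particular, dropout on conv layers usually works better channel-wise ("spatial dropout"), because neighbouring pixels within a channel are strongly correlated and per-pixel dropout barely regularizes them. A hedged NumPy sketch of the idea (the function name is mine; in TF/Keras this corresponds to the built-in spatial dropout layer):

```python
import numpy as np

def spatial_dropout(x, rate, rng=None, training=True):
    """Channel-wise dropout for a conv feature map of shape
    (height, width, channels): entire channels are zeroed with
    probability `rate`, and survivors are rescaled (inverted dropout)
    so expected activations are unchanged at test time."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng()
    keep = (rng.random(x.shape[-1]) >= rate).astype(x.dtype)
    return x * keep / (1.0 - rate)
```

L2 regularization can likewise be applied to conv kernels exactly as to fully-connected weights: add the summed squared kernel entries to the loss.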

How can I improve my CNN?

Thoma, Martin. "Analysis and Optimization of Convolutional Neural Network Architectures." arXiv preprint arXiv:1707.09725 (2017).

See chapter 2.5 for analysis techniques. As written in the beginning of that chapter, you can usually do the following:

  • (I1) Change the problem definition (e.g., the classes which are to be distinguished)
  • (I2) Get more training data
  • (I3) Clean the training data
  • (I4) Change the preprocessing (see Appendix B.1)
  • (I5) Augment the training data set (see Appendix B.2)
  • (I6) Change the training setup (see Appendices B.3 to B.5)
  • (I7) Change the model (see Appendices B.6 and B.7)
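Of these, (I5) is often the cheapest win. A minimal NumPy sketch of on-the-fly augmentation (my own example, not from the thesis); note that for sign-language data a horizontal flip may change the label, so only transforms that preserve class semantics should be used:

```python
import numpy as np

def augment(image, rng=None, max_shift=2, allow_flip=True):
    """Random horizontal flip plus a small random translation of an
    array shaped (height, width[, channels]). Real pipelines would add
    rotations, scaling, brightness jitter, etc."""
    rng = rng or np.random.default_rng()
    out = image
    if allow_flip and rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(out, (dy, dx), axis=(0, 1))  # shift (wrap-around)
```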

Misc

The CNN has to classify 27 different labels, so unsurprisingly, a major problem has been addressing overfitting.

I don't see how the number of classes is connected to overfitting. You can have hundreds of labels without any overfitting problem.

answered Oct 06 '22 by Martin Thoma