I am reading the book Deep Learning with Python. After reading chapter 4, Fighting Overfitting, I have two questions. <ol> <li>Why might increasing the number of epochs cause overfitting? I know that more epochs mean more gradient descent updates; is that what causes the overfitting?</li> <li>While fighting overfitting, will the accuracy be reduced?</li> </ol>


Why do too many epochs cause overfitting?

1 Answer

I'm not sure which book you are reading, so some background information may help before I answer the questions specifically.

Firstly, increasing the number of epochs won't necessarily cause overfitting, but it certainly can. If the learning rate is small and the model has few parameters, it may take many epochs before overfitting becomes measurable. That said, it is common for more training to cause it.
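To make "epoch" and "learning rate" concrete, here is a minimal sketch of gradient descent on a toy problem. The data, the one-weight model and the function names are all made up for illustration. Note that this particular model is far too simple to overfit; the sketch only shows that every extra epoch means more weight updates, and that a smaller learning rate makes each epoch move the weight less.

```python
# Toy illustration (hypothetical data): fit y = w * x with plain gradient descent.
# One epoch = one full pass over the training set.

def train(xs, ys, epochs, learning_rate):
    """Return the weight w after `epochs` passes of per-example gradient descent."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = w * x - y              # prediction error on one example
            gradient = 2 * error * x       # d/dw of the squared error (w*x - y)**2
            w -= learning_rate * gradient  # step against the gradient
    return w


def mean_squared_error(w, xs, ys):
    """Average squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, so the ideal weight is 2.0

for epochs in (1, 5, 50):
    w = train(xs, ys, epochs, learning_rate=0.01)
    print(epochs, round(w, 4), round(mean_squared_error(w, xs, ys), 6))
```

In a real network the same loop runs over many thousands of weights, which is what gives extra epochs the room to start fitting noise in the training set.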

To keep the question in perspective, it's important to remember that we most commonly use neural networks to build models we can use for prediction (e.g. predicting whether an image contains a particular object or what the value of a variable will be in the next time step).

We build the model by iteratively adjusting weights and biases so that the network acts as a function that maps input data to predicted outputs. We turn to such models for a number of reasons, often because we don't know what the function is or should be, or because it is too complex to derive analytically. For the network to model such complex functions, it must be capable of being highly complex itself. This complexity is powerful, but it is also dangerous! The model can become so complex that it effectively memorises the training data very precisely but then fails to act as an effective, general function that works for data outside the training set. In other words, it can overfit.

You can think of it as being a bit like someone (the model) who learns to bake by only baking fruit cake (training data) over and over again – soon they'll be able to bake an excellent fruit cake without using a recipe (training), but they probably won't be able to bake a sponge cake (unseen data) very well.
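The memorisation failure described above can be caricatured in a few lines. This is a deliberately extreme sketch, not how a neural network works internally: the "model" is just a lookup table of the training set, and the task, data and function names are hypothetical. It scores perfectly on what it has seen and falls back to a blind guess on everything else, whereas a general rule keeps working.

```python
# Caricature of overfitting: a "model" that is only a lookup table of its training set.
# Hypothetical task: label an integer as "even" or "odd".

def fit_memoriser(examples):
    """Memorise every (input, label) pair exactly."""
    return dict(examples)


def predict_memoriser(table, x, default="even"):
    # Perfect recall on inputs seen in training, a blind guess on anything else.
    return table.get(x, default)


def predict_rule(x):
    # A general rule that captures the underlying function.
    return "even" if x % 2 == 0 else "odd"


def accuracy(predict, examples):
    """Fraction of examples for which predict(x) matches the label."""
    correct = sum(1 for x, label in examples if predict(x) == label)
    return correct / len(examples)


train_set = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]
unseen_set = [(5, "odd"), (7, "odd"), (9, "odd"), (10, "even")]

table = fit_memoriser(train_set)
memoriser_train_acc = accuracy(lambda x: predict_memoriser(table, x), train_set)
memoriser_unseen_acc = accuracy(lambda x: predict_memoriser(table, x), unseen_set)
rule_unseen_acc = accuracy(predict_rule, unseen_set)

print(memoriser_train_acc, memoriser_unseen_acc, rule_unseen_acc)
```

The gap between the memoriser's training accuracy and its accuracy on unseen inputs is the signature of overfitting.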

Back to neural networks! Because the risk of overfitting is high with a neural network, there are many tools and tricks available to the deep learning engineer to prevent it, such as dropout. These tools and tricks are collectively known as 'regularisation'.
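As a rough idea of what dropout does, here is a minimal pure-Python sketch of the common "inverted dropout" formulation, assuming a layer's output is just a list of numbers. The function name and values are illustrative; deep learning libraries provide their own dropout layers, and this is not their implementation.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in the range [0, 1)")
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer_output = [0.5, 1.0, 1.5, 2.0]  # hypothetical activations

print(dropout(layer_output, rate=0.5, rng=rng))  # training: some units are zeroed
print(layer_output)                              # inference: dropout is switched off
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit to memorise a training example.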

This is why we use development and training strategies involving test datasets: we hold the test data out of training so that it stands in for unseen data, and we monitor the error on it during training. You can see an example of this in the plot below (image credit). After about 50 epochs the test error begins to increase as the model starts to 'memorise the training set', even though the training error stays at its minimum value (often the training error will continue to improve).

<img src="https://i.stack.imgur.com/pJU0X.png" alt="Example of Overfitting">
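Holding data out is simple to do. The sketch below is one possible way, using only the standard library; the function name `hold_out_split`, the 20% fraction and the seed are assumptions chosen for illustration.

```python
import random


def hold_out_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and hold out a fraction of it as the test set.

    Returns (train_examples, test_examples). The model must never be trained
    on test_examples, otherwise they no longer stand in for unseen data.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


examples = list(range(10))  # hypothetical dataset of 10 items
train_examples, test_examples = hold_out_split(examples, test_fraction=0.2)
print(len(train_examples), len(test_examples))
```

After each epoch you would compute the error on both parts; a curve like the one in the plot is just those two numbers recorded per epoch.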

So, to answer your questions:

1. Allowing the model to continue training (i.e. more epochs) increases the risk of the weights and biases being tuned to such an extent that the model performs poorly on unseen (or test/validation) data. At that point the model is just 'memorising the training set'.
2. Continued epochs may well increase training accuracy, but this doesn't necessarily mean the model's predictions on new data will be accurate; often they actually get worse. To guard against this, we use a test dataset and monitor the test accuracy during training. This allows us to make a more informed decision on whether the model is becoming more accurate for unseen data.

We can use a technique called early stopping, whereby we stop training the model once test accuracy has failed to improve for a small number of consecutive epochs. Early stopping can be thought of as another regularisation technique.
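The early stopping rule above can be sketched as a small function over a recorded list of per-epoch test losses. The function name, the `patience` and `min_delta` parameters and the loss values are assumptions for illustration; this is a sketch of the idea rather than any library's implementation.

```python
def early_stopping_epoch(test_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops at the first epoch where the test loss has failed to beat
    the best value so far by more than `min_delta` for `patience` consecutive
    epochs. If that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(test_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(test_losses) - 1, best_epoch


# Hypothetical test losses: they improve, then creep back up as overfitting sets in.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.56, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

In practice you would keep the weights from `best_epoch` rather than from the epoch where training stopped. In Keras, the `EarlyStopping` callback packages this pattern through its `monitor`, `patience` and `restore_best_weights` arguments.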

151

answered Sep 28 '22 09:09

Chris



Tags: machine-learning, gradient-descent

NingLee
