
What is the right way to measure if a machine learning model has overfit?

I understand the intuitive meaning of overfitting and underfitting. Now, given a particular machine learning model that is trained upon the training data, how can you tell if the training overfitted or underfitted the data? Is there a quantitative way to measure these factors?

Can we look at the error and say if it has overfit or underfit?

London guy asked Dec 11 '22 22:12


2 Answers

I believe the easiest approach is to keep two sets of data: training data and validation data. Train the model only on the training data, and keep training as long as its fitness on the training data stays close to its fitness on the validation data. When the model's fitness keeps improving on the training data but not on the validation data, you're overfitting.
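As a rough sketch of that check, assuming you record one training loss and one validation loss per epoch (lower is better), something like the following would flag the point where the two curves part ways. The function name `first_overfit_epoch`, the `patience` parameter, and the loss values are all made up for illustration; they do not come from any particular library.

```python
def first_overfit_epoch(train_loss, val_loss, patience=3):
    """Return the epoch with the best validation loss, provided validation
    loss then fails to improve for `patience` epochs while training loss
    is still lower than it was at that best epoch. Return None otherwise."""
    best_epoch = 0
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] < val_loss[best_epoch]:
            # Validation is still improving: remember this epoch.
            best_epoch = epoch
        elif (epoch - best_epoch >= patience
              and train_loss[epoch] < train_loss[best_epoch]):
            # Training keeps getting better, validation does not.
            return best_epoch
    return None


# Hypothetical per-epoch losses, invented for this example.
train = [1.00, 0.70, 0.50, 0.40, 0.32, 0.26, 0.21, 0.17]
val = [1.05, 0.80, 0.65, 0.60, 0.62, 0.66, 0.71, 0.77]

print(first_overfit_epoch(train, val))  # prints 3
```

Here the validation loss bottoms out at epoch 3 while the training loss keeps falling, so epoch 3 is roughly where you would stop. If both losses stay high and close together instead, that points toward underfitting rather than overfitting.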

Erik answered Dec 14 '22 10:12


The usual way, I think, is known as cross-validation. The idea is to split the training set into several pieces, known as folds, then pick one at a time for evaluation and train on the remaining ones.

It does not, of course, measure overfitting or underfitting directly, but if you can vary the complexity of the model, e.g. by changing the strength of the regularization term, you can find the point where the cross-validated error is lowest. This is as far as one can go with just training and testing, I think.
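To make this concrete, here is a minimal sketch in plain Python, under several assumptions of my own: the model is a one-dimensional ridge regression without intercept, the data is synthetic, and the helper names (`k_fold_indices`, `fit_ridge_1d`, `cv_mse`) are invented for the example. It scores a few regularization strengths by k-fold cross-validation and picks the one with the lowest held-out error.

```python
import random


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds of near-equal size."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge without intercept: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5):
    """Mean squared error over all held-out points, averaged across k folds."""
    total_sq_err = 0.0
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        # Train on every point that is not in the held-out fold.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        w = fit_ridge_1d(train_x, train_y, lam)
        # Evaluate only on the held-out fold.
        total_sq_err += sum((ys[i] - w * xs[i]) ** 2 for i in held_out)
    return total_sq_err / len(xs)


# Synthetic data (an assumption for the example): y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(xs, ys, lam) for lam in candidates}
best_lam = min(scores, key=scores.get)

for lam in candidates:
    print(f"lambda={lam:<6} cv_mse={scores[lam]:.4f}")
print("best lambda:", best_lam)
```

A very large regularization strength shrinks the weight toward zero and the held-out error rises, which is the underfitting side of the curve. With a more flexible model than this one-parameter fit, too little regularization would show the opposite failure.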

Qnan answered Dec 14 '22 11:12