
Keras: update model with a bigger training set

I trained a Keras model for text classification (supervised learning) on a training set. Let's say this training set contains 50,000 sentences.

Over the following week I collect 5,000 new sentences and add them to the old training set.

If next week I want to train a new model on the bigger training set (50,000 old sentences + 5,000 new sentences), do I have to restart training from the beginning, or can I take the old model and "update" it in some way to save time?

erik.b asked Nov 14 '18 at 08:11

1 Answer

You can save the whole model (or just its weights) and load it again later. Check out this tutorial by Jason Brownlee.

After you have loaded the weights, you can continue training on the new dataset (the 55,000 samples). Since training is basically just updating weights, and you start from your already trained weights, you are effectively "updating" the trained model instead of starting from scratch.
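As a rough illustration, the save / load / continue-training cycle could look like the sketch below. It is a minimal sketch, not the asker's actual setup: the random arrays stand in for already-vectorized sentences (500 + 50 rows instead of 50,000 + 5,000), and the tiny dense network, the `build_model` helper, and the file name `text_classifier.keras` are all made up for the example.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)

# Assumption: sentences are already turned into fixed-size feature vectors.
# Random data is used here only so the example is self-contained.
x_old = rng.random((500, 20)).astype("float32")
y_old = rng.integers(0, 2, size=500)
x_new = rng.random((50, 20)).astype("float32")
y_new = rng.integers(0, 2, size=50)


def build_model():
    """Hypothetical small binary classifier, standing in for the real model."""
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# Week 1: train on the original data and save the whole model.
model = build_model()
model.fit(x_old, y_old, epochs=2, batch_size=32, verbose=0)
model_path = os.path.join(tempfile.mkdtemp(), "text_classifier.keras")
model.save(model_path)

# Week 2: load the saved model and keep training on old + new data.
restored = keras.models.load_model(model_path)
weights_before = [w.copy() for w in restored.get_weights()]

x_all = np.concatenate([x_old, x_new])
y_all = np.concatenate([y_old, y_new])
history = restored.fit(x_all, y_all, epochs=1, batch_size=32, verbose=0)
```

Saving the full model with `model.save` also stores the optimizer state, so training resumes where it stopped; `save_weights` / `load_weights` restore only the weights, in which case the model has to be rebuilt and compiled first.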

Dinari answered Sep 22 '22 at 13:09
