
Calling "fit" multiple times in Keras

I'm training a CNN on several hundred GB of images. I've created a training function that bites off 4 GB chunks of these images and calls fit on each chunk in turn. I'm worried that I'm only training on the last chunk and not on the entire dataset.

Effectively, my pseudo-code looks like this:

DS = lazy_load_400GB_Dataset()
for section in DS:
    X_train = section.images
    Y_train = section.classes

    model.fit(X_train, Y_train, batch_size=16, nb_epoch=30)

I know that the API and the Keras forums say that this will train over the entire dataset, but I can't intuitively understand why the network wouldn't relearn over just the last training chunk.

Some help understanding this would be much appreciated.

Best, Joe

jonas smith asked Sep 01 '16 05:09



2 Answers

This question was raised in the Keras GitHub repository as Issue #4446, "Quick Question: can a model be fit for multiple times?" It was closed by François Chollet with the following statement:

Yes, successive calls to fit will incrementally train the model.

So, yes, you can call fit multiple times.
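To build some intuition for why the last chunk does not simply replace what came before, here is a minimal sketch in plain Python. It is not Keras: `TinyLinearModel` is a hypothetical toy with one weight and one bias, and the only thing it is meant to mirror is that the parameters are created once and that each `fit` call starts updating from whatever values they currently hold.

```python
class TinyLinearModel:
    """Hypothetical toy model (not Keras): y = w * x + b, trained by SGD."""

    def __init__(self):
        # Parameters are created once, here. fit() never resets them.
        self.w = 0.0
        self.b = 0.0

    def fit(self, xs, ys, epochs=1, lr=0.01):
        # Every call continues from the current self.w and self.b.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = (self.w * x + self.b) - y
                self.w -= lr * err * x
                self.b -= lr * err
        return self

    def loss(self, xs, ys):
        # Mean squared error over the given samples.
        return sum((self.w * x + self.b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Two "chunks" of the same underlying relation y = 2x + 1.
chunk_a_x, chunk_a_y = [0, 1, 2, 3], [1, 3, 5, 7]
chunk_b_x, chunk_b_y = [4, 5, 6, 7], [9, 11, 13, 15]
all_x, all_y = chunk_a_x + chunk_b_x, chunk_a_y + chunk_b_y

model = TinyLinearModel()
loss_before = model.loss(all_x, all_y)

model.fit(chunk_a_x, chunk_a_y)
w_after_a = model.w              # already moved away from 0.0
model.fit(chunk_b_x, chunk_b_y)  # starts from w_after_a, not from 0.0
loss_after = model.loss(all_x, all_y)

# A fresh model that only ever sees chunk B ends up somewhere else.
only_b = TinyLinearModel().fit(chunk_b_x, chunk_b_y)

print(w_after_a, model.w, only_b.w)
print(loss_before, loss_after)
```

The model trained on both chunks and the one trained only on the second chunk end with different weights, which is the sense in which successive calls are incremental. Note that the later chunk still has the most recent influence, so many epochs on one chunk before moving to the next can bias the result toward that chunk.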

curlyhairedgenius answered Sep 23 '22 05:09


For datasets that do not fit into memory, there is an answer in the FAQ section of the Keras documentation:

You can do batch training using model.train_on_batch(X, y) and model.test_on_batch(X, y). See the models documentation.

Alternatively, you can write a generator that yields batches of training data and use the method model.fit_generator(data_generator, samples_per_epoch, nb_epoch).

You can see batch training in action in our CIFAR10 example.
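As a rough sketch of the generator approach described in the FAQ, the function below yields batches one chunk at a time and loops forever, since fit_generator expects its generator to keep producing data across epochs. The names `batch_generator` and `fake_sections` are made up for illustration, and plain lists stand in for image arrays; a real loader would yield NumPy arrays.

```python
import itertools


def batch_generator(load_sections, batch_size):
    """Yield (X, y) batches indefinitely, holding one chunk in memory at a time.

    load_sections is an assumed callable that returns a fresh iterator of
    (images, classes) chunks each time it is called.
    """
    while True:  # restart from the first chunk once the dataset is exhausted
        for images, classes in load_sections():
            for start in range(0, len(images), batch_size):
                stop = start + batch_size
                yield images[start:stop], classes[start:stop]


def fake_sections():
    # Hypothetical stand-in for a lazy loader over a large dataset.
    yield [0, 1, 2, 3, 4], [0, 1, 0, 1, 0]
    yield [5, 6, 7], [1, 0, 1]


# Take the first six batches to see the chunk boundaries and the wrap-around.
batches = list(itertools.islice(batch_generator(fake_sections, 2), 6))
for x_batch, y_batch in batches:
    print(x_batch, y_batch)
```

With a generator like this, the call quoted above would look something like model.fit_generator(batch_generator(loader, 16), samples_per_epoch, nb_epoch), where samples_per_epoch is the total number of samples across all chunks.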

So if you want to iterate your dataset the way you are doing, you should probably use model.train_on_batch and take care of the batch sizes and iteration yourself.

One more thing to note: make sure the order of the samples you train on is reshuffled after each epoch. The example code as written does not appear to shuffle the dataset. You can read a bit more about shuffling here and here
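Putting the two suggestions together, a manual loop might look roughly like the sketch below. It assumes a hypothetical `load_section(section_id)` function that returns one chunk as `(images, classes)`, and it relies only on the model having a `train_on_batch(x, y)` method, so a small recording stub stands in for a Keras model here. Shuffling the chunk order and the sample order within each chunk is only an approximation of a full shuffle, but it avoids loading more than one chunk at a time.

```python
import random


def train_in_chunks(model, section_ids, load_section,
                    batch_size=16, epochs=30, seed=0):
    """One epoch = one pass over every chunk, in a new random order each time."""
    rng = random.Random(seed)
    ids = list(section_ids)
    for _ in range(epochs):
        rng.shuffle(ids)  # new chunk order every epoch
        for section_id in ids:
            images, classes = load_section(section_id)  # one chunk in memory
            order = list(range(len(images)))
            rng.shuffle(order)  # new sample order within the chunk
            for start in range(0, len(order), batch_size):
                idx = order[start:start + batch_size]
                # With a real Keras model these lists would need to be
                # converted to NumPy arrays first (assumption, not shown).
                x_batch = [images[i] for i in idx]
                y_batch = [classes[i] for i in idx]
                model.train_on_batch(x_batch, y_batch)


class RecordingModel:
    """Stub that records batches instead of training (illustration only)."""

    def __init__(self):
        self.seen = []

    def train_on_batch(self, x, y):
        self.seen.append((x, y))


# Hypothetical two-chunk dataset: the label is the parity of the sample value.
chunks = {"a": [0, 1, 2, 3, 4], "b": [5, 6, 7]}


def load_section(section_id):
    images = chunks[section_id]
    return images, [value % 2 for value in images]


model = RecordingModel()
train_in_chunks(model, ["a", "b"], load_section, batch_size=2, epochs=3)
print(len(model.seen), "batches seen")
```

Compared with calling fit for 30 epochs on each chunk in turn, this visits every chunk once per epoch, so no single chunk gets a long uninterrupted run of updates.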

Makis Tsantekidis answered Sep 21 '22 05:09