 

What if steps_per_epoch does not fit the number of samples?

Using Keras fit_generator, steps_per_epoch should be equal to the total number of available samples divided by the batch_size.

But how would the generator or fit_generator react if I choose a batch_size that does not divide the number of samples evenly? Does it yield samples until it can no longer fill a whole batch, or does it use a smaller batch for the last yield?

Why I ask: I split my data into train/validation/test sets of different sizes (different percentages), but I would like to use the same batch size for all of them, in particular for the train and test sets. Because the sets differ in size, I cannot guarantee that the batch size divides each set's sample count evenly.
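To make the mismatch concrete, here is a small sketch that computes, for each split, the number of full batches, the leftover samples, and the step count you get by rounding up. The 700/150/150 split sizes and the batch size of 32 are hypothetical numbers chosen for illustration, not taken from the question.

```python
import math

# Hypothetical split sizes (70/15/15 of 1000 samples) and batch size.
splits = {"train": 700, "validation": 150, "test": 150}
batch_size = 32


def batch_plan(n_samples, batch_size):
    """Return (full_batches, leftover_samples, steps_if_rounded_up) for one split."""
    full_batches, leftover = divmod(n_samples, batch_size)
    # Rounding up adds one extra step for the leftover samples, if there are any.
    steps = math.ceil(n_samples / batch_size)
    return full_batches, leftover, steps


for name, n in splits.items():
    full, leftover, steps = batch_plan(n, batch_size)
    print(f"{name}: {full} full batches, {leftover} leftover, {steps} steps if rounded up")
```

With these numbers no split divides evenly, so each one has to either drop its leftover samples or serve them as one smaller, final batch.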

Florida Man asked Jan 03 '23 07:01



2 Answers

If it's your own generator (with yield)

You create the generator, so you define its behavior.

If steps_per_epoch is greater than the number of batches your data actually contains, fit will not notice anything; it simply keeps requesting batches until it reaches steps_per_epoch.

The only requirement is that your generator must be infinite.

You can do this, for instance, by wrapping the generator's body in a while True: loop.
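A minimal sketch of such an infinite generator, assuming the data is held in two plain Python lists (`batch_generator`, `xs` and `ys` are hypothetical names). In this particular sketch the last batch of each pass is allowed to be smaller; dropping it instead would be an equally valid choice, since the behavior is yours to define. The loop at the bottom imitates fit asking for more steps than one pass contains: the generator just wraps around to the start of the data.

```python
def batch_generator(xs, ys, batch_size):
    """Yield (x_batch, y_batch) tuples forever; the last batch of each pass may be smaller."""
    n = len(xs)
    while True:  # never exhausted, so any steps_per_epoch can be served
        for start in range(0, n, batch_size):
            yield xs[start:start + batch_size], ys[start:start + batch_size]


# Hypothetical toy data: 10 samples, labels are just x * 2.
xs = list(range(10))
ys = [x * 2 for x in xs]
gen = batch_generator(xs, ys, batch_size=4)

# Request 5 batches, although one pass over 10 samples only has 3 (sizes 4, 4, 2).
for step in range(5):
    x_batch, y_batch = next(gen)
    print(step, x_batch)
```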

If it's a generator from ImageDataGenerator

If the generator comes from an ImageDataGenerator, it's actually a keras.utils.Sequence, so it supports len(): len(generatorInstance).

Then you can check for yourself what happens:

remainingSamples = total_samples % batch_size  # confirm that this is greater than 0
wholeBatches = total_samples // batch_size
totalBatches = wholeBatches + 1

if len(generator) == wholeBatches:
    print("missing the last batch")    
elif len(generator) == totalBatches:
    print("last batch included")
else:
    print('weird behavior')

And check the size of the last batch:

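lastBatch = generator[len(generator) - 1]
# Generators with labels return an (x, y) tuple, so take the inputs first
lastX = lastBatch[0] if isinstance(lastBatch, tuple) else lastBatch

if lastX.shape[0] == remainingSamples: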
    print('last batch contains the remaining samples')
else:
    print('last batch is different')
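If you want to see both checks run without Keras, the sketch below uses a hypothetical stand-in class, `ToySequence`, that only mimics the len()/indexing interface of a keras.utils.Sequence and, by design, includes a smaller last batch. It illustrates what the checks print for that behavior; it is not the ImageDataGenerator implementation. Its batches are plain lists, so len() takes the place of .shape[0].

```python
class ToySequence:
    """Hypothetical Sequence-like object (not the Keras code): indexable batches, last one may be smaller."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size

    def __len__(self):
        # Round up so the leftover samples get their own batch.
        return (len(self.samples) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, index):
        start = index * self.batch_size
        return self.samples[start:start + self.batch_size]


total_samples, batch_size = 10, 4
generator = ToySequence(list(range(total_samples)), batch_size)

remainingSamples = total_samples % batch_size  # 2 leftover samples
wholeBatches = total_samples // batch_size     # 2 full batches
totalBatches = wholeBatches + 1

if len(generator) == wholeBatches:
    print("missing the last batch")
elif len(generator) == totalBatches:
    print("last batch included")
else:
    print("weird behavior")

lastBatch = generator[len(generator) - 1]
if len(lastBatch) == remainingSamples:
    print("last batch contains the remaining samples")
else:
    print("last batch is different")
```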
Daniel Möller answered Jan 04 '23 20:01



If you assign N to the parameter steps_per_epoch of fit_generator(), Keras will call your generator N times before considering one epoch done. It's up to your generator to yield all your samples in N batches.

Note that since most models accept a different batch size at each iteration, you can set steps_per_epoch = ceil(dataset_size / batch_size) and let your generator output a smaller batch for the last samples.
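A plain-Python sketch of why the step count and the generator have to agree, assuming an in-memory list of samples (`make_batches`, `run_epochs` and the 103-sample dataset are hypothetical). `run_epochs` imitates fit_generator pulling steps_per_epoch batches per epoch from a single generator. With the rounded-up count every epoch sees each sample exactly once; with the rounded-down count the leftover batch spills into the next epoch and the epoch boundaries drift.

```python
import math


def make_batches(dataset, batch_size):
    """Infinite generator; the last batch of each pass holds the leftover samples."""
    while True:
        for start in range(0, len(dataset), batch_size):
            yield dataset[start:start + batch_size]


def run_epochs(dataset, batch_size, steps_per_epoch, n_epochs):
    """Imitate fit_generator: pull steps_per_epoch batches per epoch from one generator."""
    gen = make_batches(dataset, batch_size)
    epochs = []
    for _ in range(n_epochs):
        seen = []
        for _ in range(steps_per_epoch):
            seen.extend(next(gen))
        epochs.append(seen)
    return epochs


# Hypothetical in-memory dataset of 103 samples.
dataset = list(range(103))
batch_size = 10

ceil_steps = math.ceil(len(dataset) / batch_size)  # 10 full batches + 1 batch of 3
floor_steps = len(dataset) // batch_size           # the 3 leftover samples get no step of their own

aligned = run_epochs(dataset, batch_size, ceil_steps, n_epochs=2)
drifting = run_epochs(dataset, batch_size, floor_steps, n_epochs=2)

print([epoch[:3] for epoch in aligned])   # every epoch starts at sample 0
print([epoch[:3] for epoch in drifting])  # the 2nd epoch starts with the leftover batch
```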

benjaminplanche answered Jan 04 '23 20:01
