When using Keras fit_generator, steps_per_epoch should be equal to the total number of available samples divided by the batch_size.

But how does the generator, or fit_generator, react if I choose a batch_size that does not divide the number of samples evenly? Does it yield samples until it cannot fill a whole batch anymore, or does it just use a smaller batch size for the last yield?

Why I ask: I split my data into train/validation/test sets of different sizes (different percentages), but I want to use the same batch size for the train and validation sets, and especially for the train and test sets. Since they differ in size, I cannot guarantee that the batch size divides evenly into the total number of samples.
It's you who creates the generator, so the behavior is defined by you.

If steps_per_epoch is greater than the number of batches your data actually provides, fit will not notice anything; it will simply keep requesting batches until it reaches the number of steps.

The only thing is: you must ensure your generator is infinite. Do this with while True: at the beginning, for instance.
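A minimal sketch of such an infinite generator (the data, names, and batch size here are hypothetical, just to show the while True: pattern and what happens to the last, smaller batch):

```python
def batch_generator(x, y, batch_size):
    """Yield (x, y) batches forever, restarting from the top after each pass."""
    n = len(x)
    while True:  # infinite loop, so fit_generator can keep requesting batches
        for start in range(0, n, batch_size):
            yield x[start:start + batch_size], y[start:start + batch_size]

# Hypothetical data: 10 samples with batch_size 4 -> batches of 4, 4, 2, repeating
x = list(range(10))
y = [i * 2 for i in x]
gen = batch_generator(x, y, batch_size=4)
sizes = [len(next(gen)[0]) for _ in range(6)]
print(sizes)  # [4, 4, 2, 4, 4, 2]
```

Because the generator restarts itself, it never raises StopIteration, which is exactly what fit_generator requires.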
ImageDataGenerator

If the generator comes from an ImageDataGenerator, it's actually a keras.utils.Sequence, and it has a length property: len(generatorInstance).
Then you can check yourself what happens:
remainingSamples = total_samples % batch_size  # confirm that this is greater than 0
wholeBatches = total_samples // batch_size
totalBatches = wholeBatches + 1

if len(generator) == wholeBatches:
    print("missing the last batch")
elif len(generator) == totalBatches:
    print("last batch included")
else:
    print("weird behavior")
And check the size of the last batch:

lastBatch = generator[len(generator) - 1]  # for a Sequence this is usually an (x, y) tuple
if lastBatch[0].shape[0] == remainingSamples:
    print('last batch contains the remaining samples')
else:
    print('last batch is different')
If you assign N to the parameter steps_per_epoch of fit_generator(), Keras will basically call your generator N times before considering one epoch done. It's up to your generator to yield all your samples within those N batches.

Note that since most models are fine with a different batch size in each iteration, you can fix steps_per_epoch = ceil(dataset_size / batch_size) and let your generator output a smaller batch for the last samples.
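To make the arithmetic concrete, here is the ceil-based calculation for a hypothetical dataset of 10 samples and a batch size of 4: three steps cover every sample once per epoch, with the last batch holding only the 2 remaining samples:

```python
import math

dataset_size = 10  # hypothetical sizes for illustration
batch_size = 4

# One step per full batch, plus one step for the partial batch at the end
steps_per_epoch = math.ceil(dataset_size / batch_size)
print(steps_per_epoch)  # 3

# Sizes of the batches a generator would yield in one epoch
batch_sizes = [min(batch_size, dataset_size - step * batch_size)
               for step in range(steps_per_epoch)]
print(batch_sizes)  # [4, 4, 2]
```

With steps_per_epoch set this way, the generator's smaller final batch is consumed exactly once per epoch, and no sample is skipped or repeated.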