Let's start with a folder containing 1000 images.
Now if we use no generator (feeding the data directly), with batch_size = 10 and steps_per_epoch = 100, we will have used every picture, since 10 * 100 = 1000. So increasing steps_per_epoch will (rightfully) result in the error:
tensorflow: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least
steps_per_epoch * epochs batches (in this case, 10000 batches)
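For reference, a minimal sketch of that in-memory setup (random arrays standing in as hypothetical placeholders for the real pictures; the model itself is assumed to exist):

import numpy as np

x = np.random.rand(1000, 150, 150, 3).astype("float32")      # stand-ins for the 1000 pictures
y = np.random.randint(0, 2, size=(1000,)).astype("float32")  # stand-in binary labels

# 10 * 100 = 1000: steps_per_epoch=100 uses every picture exactly once per epoch.
# model.fit(x, y, batch_size=10, steps_per_epoch=100, epochs=5)   # fine
# model.fit(x, y, batch_size=10, steps_per_epoch=101, epochs=5)   # triggers the warning above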
On the other hand, using a generator results in an endless stream of batches:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagenerator = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.1,
    zoom_range=0.1,
    # ...
)

imageFlow = datagenerator.flow_from_directory(
    image_dir_with_1000_pcs,
    target_size=(150, 150),
    batch_size=10,
    class_mode='binary')
i = 0
for x, y in imageFlow:
    print(x.shape)  # batch of images
    i += 1
    if i > 3000:
        break  # I break, because it ENDLESSLY goes on otherwise
But still, if I go and run
history = model.fit(
    imageFlow,
    steps_per_epoch=101,  # I increased this above 100!
    epochs=5,
    # ...
)
I get the same error. WHY? model.fit() receives a generator and therefore endless batches. How can it run out of data when it is being fed endless batches?
Before posting this question, I read:
How can a generator (ImageDataGenerator) run out of data?
As far as I know, model.fit() creates a tf.data.Dataset from the generator, and that dataset does not run infinitely; that's why you see this behaviour when fitting.
If it were an infinite dataset, you would have to specify steps_per_epoch.
Edit: If you don't specify steps_per_epoch, then training stops once number_of_batches >= len(dataset) // batch_size. This is done in every epoch.
To inspect what really happens under the hood, you can check the source. As can be seen there, a tf.data.Dataset is created, and it is actually that dataset which handles batch and epoch iteration.
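To make that concrete, here is a small sketch (assuming the model and the imageFlow iterator from the question; the tf.data wrapping in the second option is my own illustration, not the only possible fix). flow_from_directory returns a DirectoryIterator whose len() is the number of batches in one pass over the folder, so steps_per_epoch has to stay within that bound unless the data is explicitly repeated:

import tensorflow as tf

batches_per_pass = len(imageFlow)  # ceil(1000 / 10) == 100 batches in one pass

# Option 1: keep steps_per_epoch within one pass over the directory
history = model.fit(imageFlow, steps_per_epoch=batches_per_pass, epochs=5)

# Option 2 (illustrative): wrap the iterator in an explicitly repeated tf.data.Dataset,
# which can then serve more than 100 batches per epoch
dataset = tf.data.Dataset.from_generator(
    lambda: imageFlow,
    output_signature=(
        tf.TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32),
    ),
).repeat()
history = model.fit(dataset, steps_per_epoch=101, epochs=5)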