Let's start with a folder containing 1000 images.
Now if we use no generator (feeding the data directly), with batch_size = 10 and steps_per_epoch = 100, we will have used every picture, since 10 * 100 = 1000. So increasing steps_per_epoch will (rightfully) result in the error:
tensorflow: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least
steps_per_epoch * epochs batches (in this case, 10000 batches)
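For reference, a minimal sketch of that in-memory setup (random arrays standing in as hypothetical placeholders for the real pictures; the model itself is assumed to exist):

import numpy as np

x = np.random.rand(1000, 150, 150, 3).astype("float32")      # stand-ins for the 1000 pictures
y = np.random.randint(0, 2, size=(1000,)).astype("float32")  # stand-in binary labels

# 10 * 100 = 1000: steps_per_epoch=100 uses every picture exactly once per epoch.
# model.fit(x, y, batch_size=10, steps_per_epoch=100, epochs=5)   # fine
# model.fit(x, y, batch_size=10, steps_per_epoch=101, epochs=5)   # triggers the warning above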
On the other hand, using a generator results in an endless stream of batches:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagenerator = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.1,
    zoom_range=0.1,
    # ...
)

imageFlow = datagenerator.flow_from_directory(
    image_dir_with_1000_pcs,
    target_size=(150, 150),
    batch_size=10,
    class_mode='binary')
i = 0
for x, y in imageFlow:
    print(x.shape)  # batch of images
    i += 1
    if i > 3000:
        break  # I break, because it ENDLESSLY goes on otherwise
But still, if I go and run
history = model.fit(
    imageFlow,
    steps_per_epoch=101,  # I increased this above 100!
    epochs=5,
    # ...
)
I get the same error. WHY? model.fit() receives a generator and therefore endless batches. How can it run out of data when it is being fed endless batches?
Before posting this question, I read:
How can a generator (ImageDataGenerator) run out of data?
As far as I know, model.fit() creates a tf.data.Dataset from the generator, and that dataset does not run infinitely; that's why you see this behaviour when fitting.
If it were an infinite dataset, you would have to specify steps_per_epoch.
Edit: If you don't specify steps_per_epoch, then training stops once number_of_batches >= len(dataset) // batch_size. This is done in every epoch.
To inspect what really happens under the hood, you can check the source. As can be seen there, a tf.data.Dataset is created, and it is actually that dataset which handles batch and epoch iteration.
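To make that concrete, here is a small sketch (assuming the model and the imageFlow iterator from the question; the tf.data wrapping in the second option is my own illustration, not the only possible fix). flow_from_directory returns a DirectoryIterator whose len() is the number of batches in one pass over the folder, so steps_per_epoch has to stay within that bound unless the data is explicitly repeated:

import tensorflow as tf

batches_per_pass = len(imageFlow)  # ceil(1000 / 10) == 100 batches in one pass

# Option 1: keep steps_per_epoch within one pass over the directory
history = model.fit(imageFlow, steps_per_epoch=batches_per_pass, epochs=5)

# Option 2 (illustrative): wrap the iterator in an explicitly repeated tf.data.Dataset,
# which can then serve more than 100 batches per epoch
dataset = tf.data.Dataset.from_generator(
    lambda: imageFlow,
    output_signature=(
        tf.TensorSpec(shape=(None, 150, 150, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32),
    ),
).repeat()
history = model.fit(dataset, steps_per_epoch=101, epochs=5)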