
Does ImageDataGenerator add more images to my dataset?

I'm trying to do image classification with the Inception V3 model. Does ImageDataGenerator from Keras create new images that are added to my dataset? If I have 1000 images, will using it double the dataset to 2000 images that are used for training? Is there a way to know how many images were created and fed into the model?

student17 asked Aug 08 '18 13:08




2 Answers

Short answer: 1) All the original images are just transformed (e.g. rotated, zoomed, etc.) in every epoch and then used for training, and 2) therefore the number of images in each epoch is equal to the number of original images you have.

Long answer: In each epoch, the ImageDataGenerator applies a random transformation to each of the images you have and uses the transformed images for training. The set of transformations includes rotation, zooming, etc. By doing this you are, in a sense, creating new data (this is what is called data augmentation), but the generated images are obviously not totally different from the original ones. This way the learned model may be more robust and accurate, as it is trained on different variations of the same image.

You need to set the steps_per_epoch argument of the fit method to n_samples / batch_size (rounded up to an integer), where n_samples is the total number of training samples you have (i.e. 1000 in your case). This way, each training sample is augmented exactly once per epoch, and therefore 1000 transformed images are generated in each epoch.
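As a rough illustration of that calculation, here is a minimal sketch in plain Python. The helper name steps_per_epoch_for and the batch size of 32 are my own assumptions for the example, not something from the question. It also assumes the generator yields a smaller final batch when n_samples is not a multiple of batch_size, which is why the division is rounded up.

```python
import math


def steps_per_epoch_for(n_samples: int, batch_size: int) -> int:
    """Hypothetical helper: batches needed to pass over every sample once."""
    if n_samples <= 0 or batch_size <= 0:
        raise ValueError("n_samples and batch_size must be positive")
    # Round up so the last, possibly smaller, batch is still included.
    return math.ceil(n_samples / batch_size)


n_samples = 1000   # from the question
batch_size = 32    # assumed value for illustration

steps = steps_per_epoch_for(n_samples, batch_size)
full_batches = n_samples // batch_size
last_batch = n_samples - full_batches * batch_size

print(steps)       # 32
print(last_batch)  # 8 (31 full batches of 32, then one batch of 8)
```

The resulting value is what you would pass as steps_per_epoch; the total number of images drawn per epoch stays at 1000.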

Further, I think it's worth clarifying the meaning of "augmentation" in this context: we are augmenting the images when we use ImageDataGenerator with its augmentation options enabled. But the word "augmentation" here does not mean that, say, if we have 100 original training images we end up with 1000 images per epoch after augmentation (i.e. the number of training images per epoch does not increase). Instead, it means we use a different transformation of each image in each epoch. Hence, if we train our model for, say, 5 epochs, we have used 5 different versions of each original image in training (or 100 * 5 = 500 different images over the whole training, instead of just the 100 original images). To put it differently, the total number of unique images increases over the whole training from start to finish, not per epoch.
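To make that counting concrete, here is a small simulation in plain Python. It is not Keras code: the augment function is a hypothetical stand-in that only records which random transformation would be applied (an angle and a flip) rather than touching any pixels, and the ranges used are assumptions for illustration.

```python
import random


def augment(image_id, rng):
    """Stand-in for a random transform: returns a description, not pixels."""
    angle = rng.uniform(-20.0, 20.0)   # assumed rotation range
    flipped = rng.random() < 0.5       # assumed 50% horizontal flip
    return (image_id, angle, flipped)


def simulate_training(n_images, n_epochs, seed=0):
    rng = random.Random(seed)
    per_epoch_counts = []
    variants_seen = set()
    for _ in range(n_epochs):
        # Each original image is transformed once per epoch.
        epoch_images = [augment(i, rng) for i in range(n_images)]
        per_epoch_counts.append(len(epoch_images))
        variants_seen.update(epoch_images)
    return per_epoch_counts, variants_seen


counts, variants = simulate_training(n_images=100, n_epochs=5)
print(counts)         # [100, 100, 100, 100, 100]
print(sum(counts))    # 500
print(len(variants))  # at most 500; repeated variants are possible but unlikely
```

Every epoch still contains exactly 100 images; only the running total of distinct variants grows as training goes on.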

today answered Sep 30 '22 19:09


Here is my attempt to answer as I also had this question on my mind.

ImageDataGenerator will NOT add new images to your data set, in the sense that it will not make your epochs bigger. Instead, in each epoch it will provide slightly altered images (depending on your configuration). It will keep generating newly altered images, no matter how many epochs you run.

So in each epoch the model will train on different images, but not too different. This should help reduce overfitting, and in some way it simulates online learning.

All these alterations happen in memory, but if you want to see the images you can save them to disk, inspect them, see how many of them were generated and get a sense of how ImageDataGenerator works. To do this, pass save_to_dir='/tmp/img-data-gen-outputs' to the flow_from_directory function. See the docs.

Marko answered Sep 30 '22 19:09