What's the simplest way I can use flow_from_directory in Keras while limiting the number of examples used in each subdirectory to some number N?
For context, I'd like to be able to use a small subset of the total images for testing purposes without having to create a separate top-level directory for the smaller dataset, since I'm pulling this data from AWS S3 buckets during training.
The ImageDataGenerator will then produce 10 images in each training iteration. The number of iterations per epoch (steps per epoch) is the total number of samples divided by batch_size. In the case above, each epoch of training has 100 iterations.
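The arithmetic above can be checked in a few lines. This is only a sketch: the total of 1,000 samples is an assumption inferred from the 10 images per batch and 100 iterations quoted above, and steps_per_epoch is a hypothetical helper name, not a Keras function.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Batches needed to see every sample once; the last batch may be partial."""
    return math.ceil(num_samples / batch_size)


num_samples = 1000  # assumed total, inferred from 100 iterations x 10 images
batch_size = 10

print(steps_per_epoch(num_samples, batch_size))  # 100
```

Rounding up means a dataset that is not an exact multiple of the batch size still gets one final, smaller batch.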
target_size: Size that the input images are resized to.
color_mode: Set to rgb for color images, or grayscale for black-and-white images.
batch_size: Number of images in each batch.
class_mode: Set to binary for 1-D binary labels, or categorical for 2-D one-hot encoded labels.
flow_from_directory method: This method is useful when the images are sorted into their respective class/label folders. It identifies the classes automatically from the folder names.
The batch size is the number of images the generator produces in each batch. Set the class mode to "binary" if there are just two classes; otherwise, set it to "categorical".
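Put together, the parameters described above might be passed like this. It is a minimal sketch, assuming a TensorFlow install that still ships ImageDataGenerator. The 224x224 target size, the rescale factor, and the names FLOW_ARGS and make_generator are illustrative choices, not part of the original text.

```python
# Illustrative defaults for the parameters discussed above (values are assumptions).
FLOW_ARGS = {
    "target_size": (224, 224),    # every image is resized to this height x width
    "color_mode": "rgb",          # use "grayscale" for black-and-white images
    "batch_size": 10,             # images yielded per iteration
    "class_mode": "categorical",  # use "binary" for a two-class problem
}


def make_generator(directory, **overrides):
    """Return a directory iterator whose classes come from the subfolder names."""
    # Imported lazily so this module loads even where TensorFlow is absent.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_directory(directory, **{**FLOW_ARGS, **overrides})
```

Calling make_generator("data/train", batch_size=32), where data/train is a placeholder path, would override only the batch size and keep the other defaults.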
Create a keras.preprocessing.image.ImageDataGenerator with the validation_split argument specified as a float. You can then use the subset argument in flow_from_directory to get only some of the samples from each directory.
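The approach described above could look roughly like this. It is a sketch under stated assumptions: the 0.1 fraction, the 224x224 target size, and the function name make_subset_generator are all illustrative, and it assumes a TensorFlow version that still includes ImageDataGenerator.

```python
def make_subset_generator(directory, fraction=0.1, batch_size=10):
    """Iterate over only `fraction` of the files in each class subdirectory."""
    # Imported lazily so this module loads even where TensorFlow is absent.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # validation_split must be a float strictly between 0 and 1.
    datagen = ImageDataGenerator(validation_split=fraction)
    return datagen.flow_from_directory(
        directory,
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode="categorical",
        # "validation" selects the `fraction` share of each class;
        # "training" would select the remaining files instead.
        subset="validation",
    )
```

Here the "validation" subset is simply being reused as a small sample of the data; nothing requires it to serve as an actual validation set.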
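If you want exactly N images from each folder, you would have to count how many files there are in each directory and set the train-validation split accordingly. Because validation_split is a single fraction applied to every class directory, this gives exactly N per class only when the directories hold the same number of files.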
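One way to do that counting is sketched below, using only the standard library. Both helper names are hypothetical. The sketch counts every regular file, whereas Keras only picks up files with image extensions, so the result is approximate. It divides by the largest directory so that no class gets more than roughly N images.

```python
import os


def count_files_per_class(root):
    """Map each immediate subdirectory of `root` to the number of files in it."""
    counts = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            counts[name] = sum(
                os.path.isfile(os.path.join(path, entry))
                for entry in os.listdir(path)
            )
    return counts


def split_for_n(counts, n):
    """Fraction for validation_split so the largest class yields about n files."""
    largest = max(counts.values())
    # Keras requires 0 < validation_split < 1, so n must be below the largest count.
    if not 0 < n < largest:
        raise ValueError("n must be positive and smaller than the largest class")
    return n / largest
```

The returned fraction would then be passed as validation_split, with subset="validation" in flow_from_directory. Smaller classes receive proportionally fewer than N images.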