flow_from_directory(directory): This takes a directory, but it cannot be given an already-split set of training images.
sklearn.model_selection.KFold: Provides the split indices of the images. Those can be used with fit() but not with fit_generator().
How can KFold be used together with ImageDataGenerator? Is that supported?
At the moment, one cannot split a dataset held in a folder using a flow_from_directory generator. This option is simply not implemented. To get a train / test split, one needs to split the main directory into a set of train / test / val directories, e.g. using the os library in Python.
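A minimal sketch of such a directory split, assuming the images are stored as src_dir/<class>/<file>. The function name split_directory and its parameters (val_fraction, seed) are hypothetical and only for illustration; files are copied, not moved, so the original folder stays intact.

```python
import os
import random
import shutil


def split_directory(src_dir, dst_dir, val_fraction=0.2, seed=0):
    """Copy src_dir/<class>/<file> into dst_dir/train/<class>/ and dst_dir/val/<class>/."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    for class_name in sorted(os.listdir(src_dir)):
        class_path = os.path.join(src_dir, class_name)
        if not os.path.isdir(class_path):
            continue  # skip stray files at the top level
        files = sorted(os.listdir(class_path))
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        subsets = {"val": files[:n_val], "train": files[n_val:]}
        for subset, names in subsets.items():
            target = os.path.join(dst_dir, subset, class_name)
            os.makedirs(target, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(class_path, name),
                             os.path.join(target, name))
```

The resulting dst_dir/train and dst_dir/val folders keep the one-subfolder-per-class layout, so each can be passed to its own flow_from_directory call.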
To anyone who has bumped into this problem: as of the date this answer was posted, there is no simple (or at least relatively simple) out-of-the-box solution, in my opinion and judging by the results of my own searches.
The only solution I came up with, while solving a similar problem in my project, was to partition my dataset, with the number of partitions equal to the number of folds, and to save the partitions as a dictionary with the partition number as the key and the list of file paths as the value. After that, you still have to sort your files into class folders for the train and validation subsets respectively.
For example, let K=10. The algorithm can be described like this:
I'm afraid the code snippet for this solution (including the sorting script and the script that builds the partition dictionary) is too large to post here, but I'll gladly share it if necessary.
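For readers who want a starting point, here is a rough sketch of the partition idea described above. It is not the answerer's actual script. The names make_partitions and build_fold are hypothetical, and the sketch assumes each image path looks like .../<class>/<file>, so the parent folder name can serve as the class label.

```python
import os
import random
import shutil


def make_partitions(file_paths, k, seed=0):
    """Return {partition_index: [file paths]} with k roughly equal partitions."""
    paths = sorted(file_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for a reproducible split
    # Striding by k spreads the shuffled paths evenly over the k partitions.
    return {i: paths[i::k] for i in range(k)}


def build_fold(partitions, val_index, out_dir):
    """Copy partition val_index into out_dir/val and all others into out_dir/train.

    Each file lands in a subfolder named after its parent (class) directory.
    """
    for index, paths in partitions.items():
        subset = "val" if index == val_index else "train"
        for path in paths:
            class_name = os.path.basename(os.path.dirname(path))
            target = os.path.join(out_dir, subset, class_name)
            os.makedirs(target, exist_ok=True)
            shutil.copy2(path, target)
```

For each fold, one would call build_fold with a different val_index and a fresh out_dir, then point one flow_from_directory generator at out_dir/train and another at out_dir/val. Copying every fold costs disk space; moving files or creating symlinks would be alternatives, at the price of more bookkeeping.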