
keras: issue using ImageDataGenerator and KFold for fit_generator

flow_from_directory(directory): This takes a directory, but it does not accept a split of the training images.

sklearn.model_selection.KFold: Provides the split indices of the images. Those could be used in fit() but not in fit_generator().

How can one use KFold along with ImageDataGenerator? Is there a way?

Henry asked Jan 22 '17 16:01



2 Answers

At the moment, one cannot split a dataset held in a folder using a flow_from_directory generator; this option is simply not implemented. To get a train / test split, one needs to split the main directory into a set of train / test / val directories, e.g. using the os library in Python.
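As a rough illustration of that directory split, here is a minimal sketch using only the standard library. It assumes the source directory has one subfolder per class, which is the layout flow_from_directory expects; the function name `split_directory`, the `val_fraction` parameter and the `train` / `val` folder names are hypothetical choices for this example, not part of Keras.

```python
import os
import random
import shutil


def split_directory(src_dir, dst_dir, val_fraction=0.2, seed=0):
    """Copy src_dir/<class>/<file> into dst_dir/train/<class>/ and dst_dir/val/<class>/."""
    rng = random.Random(seed)
    counts = {"train": 0, "val": 0}
    for class_name in sorted(os.listdir(src_dir)):
        class_dir = os.path.join(src_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class folders
        files = sorted(os.listdir(class_dir))
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        for i, name in enumerate(files):
            # The first n_val shuffled files of each class go to validation.
            subset = "val" if i < n_val else "train"
            target = os.path.join(dst_dir, subset, class_name)
            os.makedirs(target, exist_ok=True)
            shutil.copy2(os.path.join(class_dir, name), os.path.join(target, name))
            counts[subset] += 1
    return counts
```

The resulting `train` and `val` directories can then each be passed to a separate flow_from_directory call. Copying rather than moving leaves the original dataset untouched, at the cost of extra disk space.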

Marcin Możejko answered Nov 15 '22 04:11



For anyone who runs into this problem: as of the date this answer was posted, there is, in my opinion and judging by my own searches, no simple (or even relatively simple) out-of-the-box solution.

The only solution I came up with, when solving a similar problem in my project, was to split my dataset into as many partitions as there are folds and save them as a dictionary, with the partition number as the key and the list of file paths in that partition as the value. After that, you still have to sort the files into class folders for the training and validation subsets respectively.

For example, let K=10. The algorithm can be described like this:
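The partition dictionary described here could look roughly like the sketch below. This is not the author's script (which was not posted); it assumes the dataset directory contains one subfolder per class, and the function name `make_partitions` and its parameters are hypothetical.

```python
import os
import random


def make_partitions(data_dir, k, seed=0):
    """Return {partition_number: [file paths]} with k partitions of near-equal size."""
    paths = []
    for class_name in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, class_name)
        if os.path.isdir(class_dir):
            for name in sorted(os.listdir(class_dir)):
                paths.append(os.path.join(class_dir, name))
    # Shuffle once with a fixed seed so the partitions are reproducible.
    random.Random(seed).shuffle(paths)
    # Striding by k spreads the shuffled files so partition sizes differ by at most one.
    return {i: paths[i::k] for i in range(k)}
```

Because every path keeps its class folder in it, the class of each file can still be recovered later when the files are sorted into train and validation folders.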

  • Divide your dataset into 10 equally sized partitions.
  • Take one partition as the validation subset. Sort it by class into the required folders.
  • Treat the remaining partitions as the training subset and sort them into the required folders as well.
  • Create data generators for the validation and training subsets.
  • Train your model and save it together with its architecture.
  • Repeat the steps above for every other partition (take one partition as validation, train on the others), but now load your model from the saved file.
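The folder-sorting part of one such iteration can be sketched as follows. It assumes a `partitions` dictionary of the form described earlier (partition number mapped to a list of file paths) and that each file's class is the name of its parent folder. The function name `build_fold` and the `train` / `val` folder names are hypothetical; this is an illustration, not the author's script.

```python
import os
import shutil


def build_fold(partitions, val_key, out_dir):
    """Copy partition val_key into out_dir/val/<class>/ and the rest into out_dir/train/<class>/."""
    if os.path.isdir(out_dir):
        # Start each fold from a clean directory; out_dir must not hold the source data.
        shutil.rmtree(out_dir)
    counts = {"train": 0, "val": 0}
    for key, paths in partitions.items():
        subset = "val" if key == val_key else "train"
        for path in paths:
            # Assumption: the class label is the name of the file's parent folder.
            class_name = os.path.basename(os.path.dirname(path))
            target = os.path.join(out_dir, subset, class_name)
            os.makedirs(target, exist_ok=True)
            shutil.copy2(path, os.path.join(target, os.path.basename(path)))
            counts[subset] += 1
    return os.path.join(out_dir, "train"), os.path.join(out_dir, "val"), counts
```

For each key in `partitions`, one would call `build_fold(partitions, key, out_dir)`, point one flow_from_directory generator at the returned train directory and another at the val directory, and then train with fit_generator as in the list above.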

I'm afraid the code for this solution (including the sorting script and the script that builds the partition dictionary) is too large to post here, but I'll gladly share it if necessary.

Mirazent answered Nov 15 '22 06:11
