Keras flowFromDirectory get file names as they are being generated

Is it possible to get the file names that were loaded using flow_from_directory? I have:

    datagen = ImageDataGenerator(
        rotation_range=3,
        # featurewise_std_normalization=True,
        fill_mode='nearest',
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True
    )

    train_generator = datagen.flow_from_directory(
        path + '/train',
        target_size=(224, 224),
        batch_size=batch_size,
    )

I have a custom generator for my multi-output model, like this:

    a = np.arange(8).reshape(2, 4)
    # print(a)

    print(train_generator.filenames)

    def generate():
        while 1:
            x, y = train_generator.next()
            yield [x], [a, y]

Note that at the moment I am generating placeholder numbers for a, but for real training I want to load a JSON file that contains the bounding box coordinates for my images. For that I need the file names of the images returned by each call to train_generator.next(). Once I have those, I can load the file, parse the JSON, and pass the result instead of a. The list of file names must also be in the same order as the images in x.
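The lookup step described above might be sketched as follows. This is only a minimal sketch under assumptions the question does not state: the JSON file is assumed to be a single object that maps each file name (as it appears in train_generator.filenames) to a box of four numbers, and load_boxes and boxes_for_batch are hypothetical helper names.

```python
import json

def load_boxes(json_path):
    # Assumed layout: {"cat/001.jpg": [x_min, y_min, x_max, y_max], ...}
    with open(json_path) as f:
        return json.load(f)

def boxes_for_batch(batch_filenames, boxes_by_name):
    # One box per file name, returned in the same order as the batch
    return [boxes_by_name[name] for name in batch_filenames]
```

The returned list can be wrapped in np.asarray(...) before it is yielded in place of a.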

harveyslash asked Jan 18 '17 08:01



2 Answers

Yes, it is possible, at least with version 2.0.4 (I don't know about earlier versions).

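The object returned by ImageDataGenerator().flow_from_directory(...) has a filenames attribute, which is a list of all the files in the order they were found on disk, and a batch_index attribute. The generator yields the files in that same order only when it is created with shuffle=False (the default is shuffle=True). With shuffling turned off, you can do it like this: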

    datagen = ImageDataGenerator()
    gen = datagen.flow_from_directory(...)

On every iteration of the generator you can then get the corresponding filenames like this:

    for i in gen:
        idx = (gen.batch_index - 1) * gen.batch_size
        print(gen.filenames[idx : idx + gen.batch_size])

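This will give you the filenames of the images in the current batch. Note that batch_index wraps back to 0 after the last batch of an epoch, so the index computed above is not valid for that final batch.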
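The index arithmetic can be isolated in a small helper. This is a sketch, not part of Keras: filenames_of_last_batch is a hypothetical name, and it assumes an unshuffled generator whose batch_index already points at the next batch and resets to 0 at the end of an epoch.

```python
import math

def filenames_of_last_batch(filenames, batch_index, batch_size):
    """File names of the batch that was just yielded (unshuffled order only)."""
    num_batches = math.ceil(len(filenames) / batch_size)
    # batch_index already points at the next batch; the modulo maps the
    # wrapped value 0 back to the final batch of the epoch.
    last = (batch_index - 1) % num_batches
    start = last * batch_size
    return filenames[start:start + batch_size]
```

Inside the loop it would be called as filenames_of_last_batch(gen.filenames, gen.batch_index, gen.batch_size).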

Picard answered Sep 22 '22 12:09



You can make a fairly minimal subclass of DirectoryIterator that returns an (image batch, file paths) tuple:

    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator, DirectoryIterator

    class ImageWithNames(DirectoryIterator):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.filenames_np = np.array(self.filepaths)
            self.class_mode = None  # so that we only get the images back

        def _get_batches_of_transformed_samples(self, index_array):
            return (super()._get_batches_of_transformed_samples(index_array),
                    self.filenames_np[index_array])

In __init__, I added an attribute that is the numpy version of self.filepaths, so that we can easily index into that array to get the paths for each batch.

The only other change to the base class is that _get_batches_of_transformed_samples returns a tuple: the image batch from super()._get_batches_of_transformed_samples(index_array) and the file paths self.filenames_np[index_array].

With that, you can make your generator like so:

    imagegen = ImageDataGenerator()
    datagen = ImageWithNames('/data/path', imagegen, target_size=(224, 224))

And then check it with:

    next(datagen)
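To feed a model that expects bounding boxes as a target, the (images, paths) batches can be wrapped in a generator that looks up one box per path. This is a rough sketch: with_boxes is a hypothetical helper, and boxes_by_path is assumed to be a dict keyed by the same full paths the iterator returns, each mapping to four numbers.

```python
import numpy as np

def with_boxes(image_and_path_batches, boxes_by_path):
    """Yield ([images], [boxes]) pairs; boxes follow the order of the paths."""
    for images, paths in image_and_path_batches:
        # Look up one box per path so that row i of boxes belongs to images[i]
        boxes = np.array([boxes_by_path[p] for p in paths], dtype="float32")
        yield [images], [boxes]
```

Because the lookup uses the paths that arrive with each batch, the ordering stays aligned even when the iterator shuffles. Class labels are not included here, since the subclass sets class_mode to None.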
Bob Baxley answered Sep 20 '22 12:09
