Is it possible to get the file names that were loaded using flow_from_directory? I have:
    datagen = ImageDataGenerator(
        rotation_range=3,
        # featurewise_std_normalization=True,
        fill_mode='nearest',
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True
    )

    train_generator = datagen.flow_from_directory(
        path+'/train',
        target_size=(224, 224),
        batch_size=batch_size,)
I have a custom generator for my multi-output model, like this:
    a = np.arange(8).reshape(2, 4)
    # print(a)
    print(train_generator.filenames)

    def generate():
        while 1:
            x, y = train_generator.next()
            yield [x], [a, y]
Note that at the moment I am generating dummy numbers for a, but for real training I wish to load a JSON file that contains the bounding box coordinates for my images. For that I need the file names of the images returned by each call to train_generator.next(). Once I have those, I can load the file, parse the JSON, and pass the result instead of a. It is also necessary that the ordering of the x variable and the list of file names I get is the same.
Yes, it is possible, at least with version 2.0.4 (I don't know about earlier versions).
The instance returned by ImageDataGenerator().flow_from_directory(...) has a filenames attribute, which is a list of all the files in the order the generator yields them, and also a batch_index attribute. So you can do it like this:
    datagen = ImageDataGenerator()
    gen = datagen.flow_from_directory(...)
On every iteration of the generator, you can then get the corresponding filenames like this:
    for i in gen:
        idx = (gen.batch_index - 1) * gen.batch_size
        print(gen.filenames[idx : idx + gen.batch_size])
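This will give you the filenames of the images in the current batch, provided the generator was created with shuffle=False. With the default shuffle=True, batches are drawn in a shuffled order while filenames stays in directory order, so the slice would not match the images. Also note that batch_index resets to 0 after the last batch of an epoch, so the index computed for that batch needs special handling.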
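The slicing arithmetic can be sketched without Keras. This is only a minimal illustration, assuming batches are drawn in order (shuffle=False) and tracking the batch number yourself instead of reading batch_index; the helper batch_filenames and the sample file names are hypothetical and not part of Keras.

```python
def batch_filenames(filenames, batch_size, batch_number):
    """Return the file names belonging to the zero-based batch `batch_number`.

    Assumes batches are drawn in order (no shuffling) and that the sequence
    wraps around at the end of each epoch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not filenames:
        return []
    batches_per_epoch = -(-len(filenames) // batch_size)  # ceiling division
    start = (batch_number % batches_per_epoch) * batch_size
    return filenames[start:start + batch_size]


# Hypothetical file list, as `gen.filenames` might look for two classes.
names = ["cat/1.jpg", "cat/2.jpg", "cat/3.jpg", "dog/1.jpg", "dog/2.jpg"]

first = batch_filenames(names, batch_size=2, batch_number=0)
last = batch_filenames(names, batch_size=2, batch_number=2)     # short final batch
wrapped = batch_filenames(names, batch_size=2, batch_number=3)  # start of next epoch
```

Counting batches yourself avoids depending on the value batch_index holds after the generator has already advanced.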
You can make a pretty minimal subclass that returns an (image, file_path) tuple by inheriting from DirectoryIterator:
    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator, DirectoryIterator

    class ImageWithNames(DirectoryIterator):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.filenames_np = np.array(self.filepaths)
            self.class_mode = None  # so that we only get the images back

        def _get_batches_of_transformed_samples(self, index_array):
            return (super()._get_batches_of_transformed_samples(index_array),
                    self.filenames_np[index_array])
In the init, I added an attribute that is the numpy version of self.filepaths, so that we can easily index into that array to get the paths on each batch generation. The only other change to the base class is to return a tuple made of the image batch, super()._get_batches_of_transformed_samples(index_array), and the file paths, self.filenames_np[index_array].
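The reason for converting the paths to a numpy array is that an array can be indexed with another array of integers, which a plain list cannot. A small sketch of that indexing on its own, using made-up paths and a made-up index array in place of the values the iterator would supply:

```python
import numpy as np

# Hypothetical file paths, standing in for `self.filepaths`.
filepaths = ["/data/a/0.jpg", "/data/a/1.jpg", "/data/b/2.jpg", "/data/b/3.jpg"]
filenames_np = np.array(filepaths)

# An index array such as the iterator might pass for one shuffled batch.
index_array = np.array([2, 0, 3])

# Integer-array indexing picks the paths in exactly the order of the indices.
batch_paths = filenames_np[index_array]
```

Because the same index array selects both the images and the paths, the two stay aligned even when the batch order is shuffled.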
With that, you can make your generator like so:
    imagegen = ImageDataGenerator()
    datagen = ImageWithNames('/data/path', imagegen, target_size=(224,224))
And then check with:
    next(datagen)
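To connect this back to the question, the paths from each batch can be used to look up bounding boxes in the same order as the images. The following is a minimal sketch, not a definitive implementation: it assumes the JSON maps each image's base file name to one box of the form [x_min, y_min, x_max, y_max]. That layout, the helper boxes_for_batch, and the sample values are all assumptions made for illustration.

```python
import json
import os


def boxes_for_batch(paths, annotations):
    """Return one bounding box per path, in the same order as `paths`."""
    return [annotations[os.path.basename(str(p))] for p in paths]


# Hypothetical annotation content: file name -> [x_min, y_min, x_max, y_max].
annotations = json.loads(
    '{"1.jpg": [10, 20, 110, 220], "2.jpg": [5, 5, 50, 60]}'
)

# Paths in the order a batch might report them.
batch_paths = ["/data/path/cat/2.jpg", "/data/path/cat/1.jpg"]
batch_boxes = boxes_for_batch(batch_paths, annotations)
```

If file names repeat across class folders, keying the annotations by the full relative path instead of the base name would avoid collisions.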