 

Assign ImageDataGenerator result to Numpy array

I'm using the ImageDataGenerator inside Keras to read a directory of images. I'd like to save the result inside a numpy array, so I can do further manipulations and save it to disk in one file.

flow_from_directory() returns an iterator, which is why I tried the following

itr = gen.flow_from_directory('data/train/', batch_size=1, target_size=(32,32))
imgs = np.concatenate([itr.next() for i in range(itr.nb_sample)])

but that produced

ValueError: could not broadcast input array from shape (32,32,3) into shape (1)

I think I'm misusing the concatenate() function, but I can't figure out where I fail.

pietz asked Feb 16 '17 21:02



2 Answers

When you use ImageDataGenerator, flow_from_directory() returns a DirectoryIterator. You can pull the data out of it either batch by batch or as one whole array.

train_generator = train_datagen.flow_from_directory(
    train_parent_dir,
    target_size=(300, 300),
    batch_size=32,
    class_mode='categorical'
)

the output of which is

Found 3875 images belonging to 3 classes.

To extract everything as a single NumPy array (rather than batch by batch), this code can be used:

# Fetch images and labels in one pass so they stay aligned (shuffle=True is the default).
batches = [next(train_generator) for _ in range(len(train_generator))]
x = np.concatenate([b[0] for b in batches])
y = np.concatenate([b[1] for b in batches])
print(x.shape)
print(y.shape)

Note: call train_generator.reset() before this code so that iteration starts from the first batch.

the output of above code is

(3875, 300, 300, 3)
(3875, 3)

The result is one NumPy array for the images and one for the labels, even though the data was loaded in batches of 32 by ImageDataGenerator.
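The idea of collecting images and labels together can be tried without Keras. The sketch below is only an illustration, not Keras code: `fake_batches` is a hypothetical stand-in for an iterator that yields (images, labels) batches with a smaller final batch, and the sizes are scaled down.

```python
import numpy as np

def fake_batches(n_samples, batch_size, image_shape=(4, 4, 3), n_classes=3):
    """Hypothetical stand-in for a Keras iterator: yields (images, labels) batches."""
    for start in range(0, n_samples, batch_size):
        size = min(batch_size, n_samples - start)
        images = np.zeros((size,) + image_shape, dtype=np.float32)
        labels = np.zeros((size, n_classes), dtype=np.float32)
        # Tag each row with its sample index so alignment can be checked later.
        idx = np.arange(start, start + size)
        images[:, 0, 0, 0] = idx
        labels[np.arange(size), idx % n_classes] = 1.0
        yield images, labels

# One pass: every batch contributes its images and its labels together.
batches = list(fake_batches(n_samples=70, batch_size=32))
x = np.concatenate([images for images, _ in batches])
y = np.concatenate([labels for _, labels in batches])

print(x.shape)  # (70, 4, 4, 3)
print(y.shape)  # (70, 3)
```

np.concatenate joins along the first axis, so a shorter final batch is not a problem as long as the remaining dimensions match.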

To get the output as batches, use the following code:

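x = []
y = []
train_generator.reset()
for i in range(len(train_generator)):
    a, b = next(train_generator)
    x.append(a)
    y.append(b)
# The last batch is smaller than the others, so the list is ragged;
# recent NumPy versions need dtype=object to build such an array.
x = np.array(x, dtype=object)
y = np.array(y, dtype=object)
print(x.shape)
print(y.shape)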

the output of the code is

(122,)
(122,)
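The (122,) shape follows from the numbers reported earlier: 3875 images in batches of 32 give 122 batches, and the last one holds only 3 images. A quick check of that arithmetic in plain Python (the variable names are just for illustration):

```python
import math

n_images = 3875   # from "Found 3875 images belonging to 3 classes."
batch_size = 32   # batch_size passed to flow_from_directory

# Number of batches needed to cover every image once.
n_batches = math.ceil(n_images / batch_size)
# Images left over for the final batch.
last_batch_size = n_images - (n_batches - 1) * batch_size

print(n_batches)        # 122
print(last_batch_size)  # 3
```

Because the final batch has a different shape from the other 121, the batches cannot be stacked into one regular array, so the result is a one-dimensional array holding 122 separate batch arrays.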

Hope this works as a solution

John Paulson answered Nov 01 '22 00:11



I had the same problem and solved it the following way. itr.next() returns the next batch as two numpy.ndarray objects: batch_x (the images) and batch_y (the labels). (Source: keras/preprocessing/image.py) So you can set the batch_size for flow_from_directory to the size of your whole training dataset and fetch everything in a single call.

For example, my whole training set consists of 1481 images:

train_datagen = ImageDataGenerator(rescale=1. / 255)
itr = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_height, img_width),  # Keras expects (height, width)
    batch_size=1481,  # the whole training set in one batch
    class_mode='categorical')

X, y = itr.next()
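The question also asks about saving the result to disk in one file. One possible way, sketched here with np.savez_compressed: the small random arrays stand in for the real X and y from the iterator, and the file name `train_data.npz` is a hypothetical placeholder.

```python
import os
import tempfile
import numpy as np

# Placeholder arrays standing in for the X, y returned by the iterator.
X = np.random.rand(10, 32, 32, 3).astype(np.float32)
y = np.eye(3, dtype=np.float32)[np.arange(10) % 3]

# Hypothetical file name; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "train_data.npz")

# Store both arrays in a single compressed .npz archive.
np.savez_compressed(path, X=X, y=y)

# Load them back using the keyword names given when saving.
with np.load(path) as data:
    X_loaded = data["X"]
    y_loaded = data["y"]

print(X_loaded.shape, y_loaded.shape)
```

Keeping images and labels in the same archive means they cannot get separated on disk. Note that a batch holding the entire dataset has to fit in memory at once.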
Florian answered Nov 01 '22 02:11
