I am using a pretrained model in Keras to generate features for a set of images:
from keras.applications.inception_v3 import InceptionV3

model = InceptionV3(weights='imagenet', include_top=False)
train_data = model.predict(data).reshape(data.shape[0], -1)
However, I have a lot of images, and without the top the model outputs 131072 features (columns) per image: an 8 × 8 × 2048 feature map, flattened.
With 200k images I would get an array of shape (200000, 131072), which is too large to fit into memory.
More importantly, I need to save this array to disk, and at float32 precision it would take about 100 GB of space (200000 × 131072 × 4 bytes) when saved as .npy or as HDF5 via h5py.
I could circumvent the memory problem by predicting in batches of, say, 1000 images and writing each batch to disk, but that does not solve the disk-space problem.
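For reference, a minimal sketch of that batching approach, assuming data is the image array from above; the chunk size and file name are arbitrary, and each batch is written into a preallocated HDF5 dataset via h5py:

import h5py
from keras.applications.inception_v3 import InceptionV3

model = InceptionV3(weights='imagenet', include_top=False)

batch_size = 1000                      # arbitrary chunk size
n, feat_dim = data.shape[0], 131072    # 8 * 8 * 2048, flattened

with h5py.File('features.h5', 'w') as f:
    dset = f.create_dataset('features', shape=(n, feat_dim), dtype='float32')
    for start in range(0, n, batch_size):
        batch = data[start:start + batch_size]
        dset[start:start + len(batch)] = model.predict(batch).reshape(len(batch), -1)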
How can I make the model smaller without losing too much information?
Update:
As the answer suggested, I included the next layer of the architecture in the model as well:
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model

base_model = InceptionV3(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('avg_pool').output)
This reduced the output to (200000, 2048).
Update 2:
Another interesting solution is the bcolz package, which stores compressed NumPy arrays on disk: https://github.com/Blosc/bcolz
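A minimal sketch of that idea, assuming the features arrive in batches as above; the directory name and compression settings are arbitrary choices:

import numpy as np
import bcolz

# disk-backed, compressed array; clevel/cname are arbitrary choices
features = bcolz.carray(np.empty((0, 2048), dtype='float32'),
                        rootdir='features.bcolz', mode='w',
                        cparams=bcolz.cparams(clevel=9, cname='lz4'))

for batch_feats in feature_batches:    # hypothetical iterable of (k, 2048) arrays
    features.append(batch_feats)
features.flush()

loaded = bcolz.open('features.bcolz')  # read back later; [:] yields a numpy array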
I see at least two solutions to your problem:
1. Apply an 8 × 8 average pooling on top of the extracted features:

   from keras.layers import AveragePooling2D
   from keras.models import Model

   x = AveragePooling2D((8, 8), strides=(8, 8))(base.output)
   model = Model(inputs=base.input, outputs=x)

   where base is the InceptionV3 object you loaded (without top). This pooling is the next step in the InceptionV3 architecture, so one may reasonably assume that these features still hold loads of discriminatory clues. It shrinks each 8 × 8 × 2048 feature map down to 1 × 1 × 2048, i.e. 2048 features per image.

2. Apply some dimensionality reduction (e.g. PCA) fitted on a sample of the data, and use it to reduce the dimensionality of all the data to get a reasonable file size.
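For the second option, a minimal sketch using scikit-learn's IncrementalPCA, so the reducer can be fitted batch by batch without holding all 200k rows in memory; n_components=256 and the feature_batches generator are assumptions, not part of the original answer:

import numpy as np
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=256)  # arbitrary target dimensionality

# first pass: fit the PCA incrementally on batches of extracted features
# (each batch must have at least n_components rows)
for batch_feats in feature_batches():    # hypothetical generator of (k, 131072) arrays
    ipca.partial_fit(batch_feats)

# second pass: project every batch down to 256 dimensions
reduced = np.vstack([ipca.transform(b) for b in feature_batches()])
print(reduced.shape)  # (200000, 256), about 200 MB as float32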