I am creating a multi-scale CNN in Keras (Python). The network architecture is similar to the diagram: the same image is fed to three CNNs with different architectures, and the weights are NOT shared.
The code I wrote is below. The issue is that when I run it, even with only 10 images in train_dir,
the process consumes about 40 GB of RAM and is eventually killed by the OS with an out-of-memory error. I am running this on CPU. Any idea why this happens in Keras?
I am using Theano-0.9.0.dev5 | Keras-1.2.1 | Python 2.7.12 | OSX Sierra 10.12.3 (16D32)
## Multi-scale CNN in Keras Python
## https://i.stack.imgur.com/2H4xD.png
# Keras 1.x imports (Theano backend, channels-first ordering: (channels, height, width))
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Flatten, Dense, Dropout, Merge
from keras.preprocessing.image import ImageDataGenerator

# train_data_dir, args.test_images, nb_train_samples, nb_epoch and nb_validation_samples
# are defined elsewhere in my script

# main CNN model - CNN1
main_model = Sequential()
main_model.add(Convolution2D(32, 3, 3, input_shape=(3, 224, 224)))
main_model.add(Activation('relu'))
main_model.add(MaxPooling2D(pool_size=(2, 2)))
main_model.add(Convolution2D(32, 3, 3))
main_model.add(Activation('relu'))
main_model.add(MaxPooling2D(pool_size=(2, 2)))
main_model.add(Convolution2D(64, 3, 3))
main_model.add(Activation('relu'))
main_model.add(MaxPooling2D(pool_size=(2, 2)))  # the main_model so far outputs 3D feature maps (features, height, width)
main_model.add(Flatten())
#lower features model - CNN2
lower_model1 = Sequential()
lower_model1.add(Convolution2D(32, 3, 3, input_shape=(3, 224, 224)))
lower_model1.add(Activation('relu'))
lower_model1.add(MaxPooling2D(pool_size=(2, 2)))
lower_model1.add(Flatten())
#lower features model - CNN3
lower_model2 = Sequential()
lower_model2.add(Convolution2D(32, 3, 3, input_shape=(3, 224, 224)))
lower_model2.add(Activation('relu'))
lower_model2.add(MaxPooling2D(pool_size=(2, 2)))
lower_model2.add(Flatten())
#merged model
merged_model = Merge([main_model, lower_model1, lower_model2], mode='concat')
final_model = Sequential()
final_model.add(merged_model)
final_model.add(Dense(64))
final_model.add(Activation('relu'))
final_model.add(Dropout(0.5))
final_model.add(Dense(1))
final_model.add(Activation('sigmoid'))
final_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print 'About to start training merged CNN'
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(train_data_dir, target_size=(224, 224), batch_size=32, class_mode='binary')
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(args.test_images, target_size=(224, 224), batch_size=32, class_mode='binary')
final_train_generator = zip(train_generator, train_generator, train_generator)
final_test_generator = zip(test_generator, test_generator, test_generator)
final_model.fit_generator(final_train_generator, samples_per_epoch=nb_train_samples, nb_epoch=nb_epoch, validation_data=final_test_generator, nb_val_samples=nb_validation_samples)
The number of nodes in lower_model1 and lower_model2 after flattening is roughly 32 * 112 * 112 = 401,408 each. Followed by a fully connected layer with 64 nodes, this gives about 401,408 * 2 * 64 = 51,380,224 parameters, which is quite a big number. I would suggest reconsidering the size of the images fed to your "lower" models. Do you really need 224 x 224 there? Take a closer look at the diagram you attached: the first step in the second and third models is subsampling, 8:1 and 4:1 respectively. This is the step that you have missed in your implementation.
Your main_model is fine because you have enough max-pooling layers there to reduce the number of parameters.
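If you want to see where the parameters go before starting training, you can print the per-branch output shapes and parameter counts (standard Keras model introspection calls):

print main_model.output_shape, main_model.count_params()
print lower_model1.output_shape, lower_model1.count_params()
print lower_model2.output_shape, lower_model2.count_params()
final_model.summary()   # the first Dense(64) after the merge dominates the total

With the current 224 x 224 inputs to the lower models, the two Flatten outputs alone account for most of the ~50M weights in that Dense layer.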