Multi-scale CNN network in Keras (Python)

I am building a multi-scale CNN in Keras (Python). The network architecture is similar to the diagram below: the same image is fed to three CNNs with different architectures, and the weights are NOT shared.

[Diagram: three parallel CNNs over the same input image, merged into a single classifier — https://i.stack.imgur.com/2H4xD.png]

The code I wrote is below. The issue is that when I run this, even with only 10 images in train_data_dir, the process consumes about 40 GB of RAM and is eventually killed by the OS with an out-of-memory error. I am running this on CPU. Any idea why this happens in Keras?

I am using Theano 0.9.0.dev5 | Keras 1.2.1 | Python 2.7.12 | macOS Sierra 10.12.3 (16D32)

## Multi-scale CNN in Keras (Python 2, Keras 1.x API)
## Diagram: https://i.stack.imgur.com/2H4xD.png

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation, Flatten, Dense, Dropout, Merge
from keras.preprocessing.image import ImageDataGenerator

# main CNN model - CNN1
main_model = Sequential()
main_model.add(Convolution2D(32, 3, 3, input_shape=(3, 224, 224)))
main_model.add(Activation('relu'))
main_model.add(MaxPooling2D(pool_size=(2, 2)))

main_model.add(Convolution2D(32, 3, 3))
main_model.add(Activation('relu'))
main_model.add(MaxPooling2D(pool_size=(2, 2)))

main_model.add(Convolution2D(64, 3, 3))
main_model.add(Activation('relu'))
main_model.add(MaxPooling2D(pool_size=(2, 2))) # main_model so far outputs 3D feature maps, (features, height, width) with Theano dim ordering

main_model.add(Flatten())

#lower features model - CNN2
lower_model1 = Sequential()
lower_model1.add(Convolution2D(32, 3, 3, input_shape=(3, 224, 224)))
lower_model1.add(Activation('relu'))
lower_model1.add(MaxPooling2D(pool_size=(2, 2)))
lower_model1.add(Flatten())

#lower features model - CNN3
lower_model2 = Sequential()
lower_model2.add(Convolution2D(32, 3, 3, input_shape=(3, 224, 224)))
lower_model2.add(Activation('relu'))
lower_model2.add(MaxPooling2D(pool_size=(2, 2)))
lower_model2.add(Flatten())

#merged model
merged_model = Merge([main_model, lower_model1, lower_model2], mode='concat')

final_model = Sequential()                     
final_model.add(merged_model)                  
final_model.add(Dense(64))
final_model.add(Activation('relu'))
final_model.add(Dropout(0.5))      
final_model.add(Dense(1))
final_model.add(Activation('sigmoid'))
final_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
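
# NOTE: train_data_dir, args.test_images, nb_train_samples, nb_epoch and
# nb_validation_samples are assumed to be defined earlier in the original script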

print 'About to start training merged CNN'
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(train_data_dir, target_size=(224, 224), batch_size=32, class_mode='binary')

test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(args.test_images, target_size=(224, 224), batch_size=32, class_mode='binary')

final_train_generator = zip(train_generator, train_generator, train_generator)
final_test_generator  = zip(test_generator, test_generator, test_generator)
final_model.fit_generator(final_train_generator, samples_per_epoch=nb_train_samples, nb_epoch=nb_epoch, validation_data=final_test_generator, nb_val_samples=nb_validation_samples)


asked Jan 24 '17 by Srikar Appalaraju

1 Answer

The number of nodes in lower_model1 and lower_model2 after flattening is roughly 32 * 112 * 112 ≈ 401,408 each (with the default 'valid' border mode it is exactly 32 * 111 * 111 = 394,272). Feeding those two flattened outputs into the fully connected layer with 64 nodes gives about 401,408 * 2 * 64 ≈ 51,380,224 weights from the lower branches alone, which is a very large number. I would suggest reconsidering the size of the images fed to your "lower" models. Do you really need 224 x 224 there? Take a closer look at the diagram you attached: the first step in the second and third models is subsampling, 8:1 and 4:1 respectively. This is the step that is missing from your implementation; a sketch of it follows below.
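A minimal sketch of that missing subsampling step, assuming average pooling implements the 8:1 and 4:1 reductions (the diagram does not say which method is used), written against the same Keras 1.x API as the question:

from keras.models import Sequential
from keras.layers import AveragePooling2D, Convolution2D, Activation, MaxPooling2D, Flatten

# lower features model - CNN2, now with 8:1 subsampling in front
lower_model1 = Sequential()
lower_model1.add(AveragePooling2D(pool_size=(8, 8), input_shape=(3, 224, 224)))  # 224x224 -> 28x28
lower_model1.add(Convolution2D(32, 3, 3))
lower_model1.add(Activation('relu'))
lower_model1.add(MaxPooling2D(pool_size=(2, 2)))
lower_model1.add(Flatten())  # 32 * 13 * 13 = 5,408 nodes instead of ~400,000

# lower features model - CNN3, now with 4:1 subsampling in front
lower_model2 = Sequential()
lower_model2.add(AveragePooling2D(pool_size=(4, 4), input_shape=(3, 224, 224)))  # 224x224 -> 56x56
lower_model2.add(Convolution2D(32, 3, 3))
lower_model2.add(Activation('relu'))
lower_model2.add(MaxPooling2D(pool_size=(2, 2)))
lower_model2.add(Flatten())  # 32 * 27 * 27 = 23,328 nodes

With flattened outputs of that size, the Dense(64) layer that follows the merge holds on the order of a few million weights instead of ~50 million, which should fit comfortably in CPU memory.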

Your main_model is fine because it has enough max-pooling layers to shrink the feature maps before flattening, so its contribution to the parameter count stays manageable.
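One way to confirm where the memory goes is to print the layer-by-layer parameter counts of the compiled model:

final_model.summary()  # in the question's version, the Dense(64) layer dominates with tens of millions of weights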

answered by Sergii Gryshkevych