I looked at the following examples from Keras:
MLP in MNIST: https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py
CNN in MNIST: https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
I ran both with the Theano backend on CPU. For the MLP, the mean time is approximately 16 s per epoch, with a total of 669,706 parameters:
Layer (type) Output Shape Param #
=================================================================
dense_33 (Dense) (None, 512) 401920
_________________________________________________________________
dropout_16 (Dropout) (None, 512) 0
_________________________________________________________________
dense_34 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_17 (Dropout) (None, 512) 0
_________________________________________________________________
dense_35 (Dense) (None, 10) 5130
=================================================================
Total params: 669,706.0
Trainable params: 669,706.0
Non-trainable params: 0.0
In the CNN, I eliminated the last hidden layer from the original code. I also changed the optimizer to rmsprop to make both cases comparable, leaving the following architecture:
Layer (type) Output Shape Param #
=================================================================
conv2d_36 (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
conv2d_37 (Conv2D) (None, 24, 24, 64) 18496
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 12, 12, 64) 0
_________________________________________________________________
dropout_22 (Dropout) (None, 12, 12, 64) 0
_________________________________________________________________
flatten_17 (Flatten) (None, 9216) 0
_________________________________________________________________
dense_40 (Dense) (None, 10) 92170
=================================================================
Total params: 110,986.0
Trainable params: 110,986.0
Non-trainable params: 0.0
However, the average time here is approximately 340 s per epoch, even though there are six times fewer parameters!
To look into this further, I reduced the number of filters per layer to 4, leaving the following architecture:
Layer (type) Output Shape Param #
=================================================================
conv2d_38 (Conv2D) (None, 26, 26, 4) 40
_________________________________________________________________
conv2d_39 (Conv2D) (None, 24, 24, 4) 148
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 12, 12, 4) 0
_________________________________________________________________
dropout_23 (Dropout) (None, 12, 12, 4) 0
_________________________________________________________________
flatten_18 (Flatten) (None, 576) 0
_________________________________________________________________
dense_41 (Dense) (None, 10) 5770
=================================================================
Total params: 5,958.0
Trainable params: 5,958.0
Non-trainable params: 0.0
Now the time is 28 s per epoch, even though there are only roughly 6,000 parameters!
Why is this? Intuitively, the optimization should depend only on the number of variables and on the computation of the gradient (which, since the batch size is the same, should be similar).
Can anyone shed some light on this? Thank you.
It is clear that the CNN converges faster than the MLP in terms of epochs, but each CNN epoch takes more time than an MLP epoch. In this example that is not because the CNN has more parameters (it actually has fewer), but because each convolutional weight is reused at many spatial positions, so far more operations are performed per epoch.
Both MLPs and CNNs can be used for image classification. However, an MLP takes a flattened vector as input while a CNN takes a tensor, so a CNN can capture the spatial relationships between nearby pixels. For complicated images, a CNN will therefore usually perform better than an MLP.
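For illustration, here is a minimal sketch (assuming the standard keras.datasets.mnist loader and the channels-last layout shown in the summaries above) of how the same images are fed to each model:

# Minimal sketch: same MNIST data, two input layouts.
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()  # x_train: (60000, 28, 28)

# MLP: every image is flattened to a 784-dimensional vector,
# so all spatial structure is discarded before the first Dense layer.
x_mlp = x_train.reshape(60000, 784).astype('float32') / 255

# CNN: every image stays a 28x28x1 tensor, so Conv2D kernels can
# exploit the relation between neighbouring pixels.
x_cnn = x_train.reshape(60000, 28, 28, 1).astype('float32') / 255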
Convolutional layers are not densely connected: not every input node affects every output node. This gives convolutional layers more flexibility in learning. Moreover, the number of weights per layer is much smaller, which helps a lot with high-dimensional inputs such as image data.
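As a rough illustration of this point, the sketch below (parameter counts only, layer sizes chosen just for the comparison) contrasts a single Dense layer with a single Conv2D layer on the same 28x28x1 input:

# Sketch: parameter counts of a dense vs. a convolutional layer on a 28x28x1 input.
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D

dense_model = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(32),                                    # 784 * 32 + 32 = 25,120 parameters
])

conv_model = Sequential([
    Conv2D(32, (3, 3), input_shape=(28, 28, 1)),  # 3 * 3 * 1 * 32 + 32 = 320 parameters
])

print(dense_model.count_params())  # 25120
print(conv_model.count_params())   # 320, as in conv2d_36 above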
Convolutional neural networks generally take a long time to train. We have found that even performing transfer learning on a pre-trained model such as VGG16 or ResNet can take over an hour per epoch when working with a large dataset and a pipeline that includes aggressive image augmentation.
I assume the kernel size is (3x3) for all the convolution operations and that the input has 3 channels.
For conv2d_36, each of the 32 * 26 * 26 output values needs 3 * 3 * 3 multiplications (kernel height * kernel width * input channels). So, excluding all the summations (bias and the additions inside the convolution), you would have:
conv2d_36: 3 * 32 * 26 * 26 * 3 * 3 =~ 585k multiplication operations
conv2d_37: similarly, 32 * 64 * 24 * 24 * 3 * 3 =~ 10.6M multiplication operations
dense_40: as there is no convolution, 9216 * 10 = 92k multiplication operations
Summing these up, there are ~11.3M single multiplication operations for the second model, the CNN.
On the other hand, if we flatten the input and apply the MLP:
dense_33: 28 * 28 * 3 * 512 = 1.2M multiplication operations
dense_34: 512 * 512 = 262k multiplication operations
dense_35: 512 * 10 = 5k multiplication operations
Summing these up, there are ~1.5M single multiplication operations for the first model, the MLP.
Hence, just the multiplications of the CNN model are ~7.5 times more than those of the MLP model. Adding the overhead within the layers and other costs such as summations and memory copy/access operations, it seems entirely reasonable for the CNN model to be as slow as you describe.
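The arithmetic above can be reproduced with a short script; this is just a sketch that keeps the same assumptions (3x3 kernels, a 3-channel input, biases and additions ignored):

# Sketch: rough count of multiplications per forward pass, under the
# assumptions above (3x3 kernels, 3 input channels, no biases/additions).

def conv_mults(in_ch, out_ch, out_h, out_w, k=3):
    # each output value needs k * k * in_ch multiplications
    return in_ch * out_ch * out_h * out_w * k * k

def dense_mults(n_in, n_out):
    return n_in * n_out

cnn = (conv_mults(3, 32, 26, 26)       # conv2d_36 -> ~585k
       + conv_mults(32, 64, 24, 24)    # conv2d_37 -> ~10.6M
       + dense_mults(9216, 10))        # dense_40  -> ~92k

mlp = (dense_mults(28 * 28 * 3, 512)   # dense_33  -> ~1.2M
       + dense_mults(512, 512)         # dense_34  -> ~262k
       + dense_mults(512, 10))         # dense_35  -> ~5k

print(cnn, mlp, cnn / mlp)             # ~11.3M vs ~1.5M, ratio ~7.5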
The convolution operation is much more complex than a dense layer. Convolution is the process of adding each element of the image to its local neighbours, weighted by the kernel. Every convolution is essentially a set of nested loops. This means that a dense layer needs only a fraction of the time compared to convolutional layers. Wikipedia has an enlightening example of the convolution operation.
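To make the nested-loop point concrete, here is a deliberately naive single-filter convolution in plain NumPy ("valid" output, stride 1); real frameworks such as Theano use far more optimized implementations, so this is only an illustration:

import numpy as np

def naive_conv2d(image, kernel):
    # Naive 'valid' convolution (strictly, cross-correlation, as used by
    # neural-network libraries): four nested loops per output map.
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):                 # output rows
        for j in range(ow):             # output columns
            for u in range(kh):         # kernel rows
                for v in range(kw):     # kernel columns
                    out[i, j] += image[i + u, j + v] * kernel[u, v]
    return out

# A 28x28 image with a 3x3 kernel gives a 26x26 output, i.e.
# 26 * 26 * 3 * 3 = 6084 multiplications for a single filter and channel.
out = naive_conv2d(np.random.rand(28, 28), np.random.rand(3, 3))
print(out.shape)  # (26, 26)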