Intuition behind Stacking Multiple Conv2D Layers before Dropout in CNN

Tags: machine-learning, tensorflow, deep-learning, keras

Background:

Tagging TensorFlow since Keras runs on top of it and this is more a general deep learning question.

I have been working on the Kaggle Digit Recognizer problem and used Keras to train CNN models for the task. The model below is the original CNN structure I used for this competition, and it performed okay.

from tensorflow.keras import layers, models  # import assumed; not shown in the original post

def build_model1():
    model = models.Sequential()

    # Block 1: a single 3x3 conv, then pooling and dropout
    model.add(layers.Conv2D(32, (3, 3), padding="same", activation="relu", input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))

    # Block 2
    model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))

    # Block 3
    model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.25))

    # Classifier head: small dense layer with heavy dropout
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))

    return model

Then I read some other notebooks on Kaggle and borrowed another CNN structure (copied below), which works much better than the one above in that it achieved better accuracy, lower error rate, and took many more epochs before overfitting the training data.

def build_model2():
    model = models.Sequential()

    # Block 1: two stacked 5x5 convs before pooling and dropout
    model.add(layers.Conv2D(32, (5, 5), padding="same", activation="relu", input_shape=(28, 28, 1)))
    model.add(layers.Conv2D(32, (5, 5), padding="same", activation="relu"))
    model.add(layers.MaxPool2D((2, 2)))
    model.add(layers.Dropout(0.25))

    # Block 2: two stacked 3x3 convs before pooling and dropout
    model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
    model.add(layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(layers.Dropout(0.25))

    # Classifier head: wider dense layer than in build_model1
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(10, activation="softmax"))

    return model

Question:

Is there any intuition or explanation behind the better performance of the second CNN structure? What is it that makes stacking 2 Conv2D layers better than just using 1 Conv2D layer before max pooling and dropout? Or is there something else that contributes to the result of the second model?

Thank y'all for your time and help.

asked Oct 01 '17 17:10

Nahua Kang

1 Answer

The main difference between these two approaches is that the latter (2 conv) has more flexibility in expressing non-linear transformations without losing information. Max pooling removes information from the signal, and dropout forces a distributed representation, so both effectively make it harder to propagate information. If, for a given problem, a highly non-linear transformation has to be applied to the raw data, stacking multiple convs (with ReLU) will make it easier to learn, that's it. Also note that you are comparing a model with 3 max poolings against a model with only 2; consequently the second one will potentially lose less information. Another thing is that it has a much bigger fully connected layer at the end, while the first one is tiny (64 neurons + 0.5 dropout means that on average only about 32 neurons are active during training, that is a tiny layer!). To sum up:

These architectures differ in many aspects, not just in stacking conv layers.
Stacking conv layers usually leads to less information being lost in processing; see for example "all convolutional" architectures.
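
To make the differences above concrete, here is a rough sketch in plain Python (no Keras needed) that traces the feature-map sizes through both models. It assumes "same"-padded, stride-1 convolutions and 2x2 max pooling with stride 2 and no padding, as in the question's code. The helper names `trace_spatial_size`, `stacked_receptive_field` and `dense_params` are made up for this illustration and are not part of any library.

```python
def trace_spatial_size(size, pool_factors):
    """Side length of the feature map after each pooling step.

    'same'-padded stride-1 convs keep the size unchanged, so only the
    pooling layers matter; unpadded pooling floors the division.
    """
    sizes = []
    for factor in pool_factors:
        size = size // factor
        sizes.append(size)
    return sizes


def stacked_receptive_field(kernel_sizes):
    """Receptive field (in input pixels) of stride-1 convs applied back to back."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


# Model 1: three blocks, each ending in a 2x2 pool.
model1_sizes = trace_spatial_size(28, [2, 2, 2])
# Model 2: two blocks, each ending in a 2x2 pool.
model2_sizes = trace_spatial_size(28, [2, 2])

# Both models end with 64 channels before Flatten.
model1_flat = model1_sizes[-1] ** 2 * 64
model2_flat = model2_sizes[-1] ** 2 * 64

# What one output pixel "sees" just before the first pooling layer.
model1_field = stacked_receptive_field([3])     # one 3x3 conv
model2_field = stacked_receptive_field([5, 5])  # two 5x5 convs

print("model 1 sizes:", model1_sizes, "flatten:", model1_flat)
print("model 2 sizes:", model2_sizes, "flatten:", model2_flat)
print("receptive field before first pool:", model1_field, "vs", model2_field)
print("first dense layer params:",
      dense_params(model1_flat, 64), "vs", dense_params(model2_flat, 256))
```

Under these assumptions the first model hands a 3x3x64 map (576 values) to a 64-unit dense layer, while the second hands a 7x7x64 map (3136 values) to a 256-unit layer. The second model also passes through two ReLU non-linearities and covers a 9x9 input window before anything is pooled away, versus one ReLU and a 3x3 window in the first.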

answered Sep 27 '22 21:09

lejlot
