Ordering of batch normalization and dropout?

Usually, Just drop the `Dropout`(when you have `BN`):

"BN eliminates the need for Dropout in some cases cause BN provides similar regularization benefits as Dropout intuitively"
"Architectures like ResNet, DenseNet, etc. not using Dropout

For more details, refer to this paper [Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift] as already mentioned by @Haramoz in the comments.

Conv - Activation - DropOut - BatchNorm - Pool --> Test_loss: 0.04261355847120285

Conv - Activation - DropOut - Pool - BatchNorm --> Test_loss: 0.050065308809280396

Conv - Activation - BatchNorm - Pool - DropOut --> Test_loss: 0.04911309853196144

Conv - Activation - BatchNorm - DropOut - Pool --> Test_loss: 0.06809622049331665

Conv - BatchNorm - Activation - DropOut - Pool --> Test_loss: 0.038886815309524536

Conv - BatchNorm - Activation - Pool - DropOut --> Test_loss: 0.04126095026731491

Conv - BatchNorm - DropOut - Activation - Pool --> Test_loss: 0.05142546817660332

Conv - DropOut - Activation - BatchNorm - Pool --> Test_loss: 0.04827788099646568

Conv - DropOut - Activation - Pool - BatchNorm --> Test_loss: 0.04722036048769951

Conv - DropOut - BatchNorm - Activation - Pool --> Test_loss: 0.03238215297460556

Trained on the MNIST dataset (20 epochs) with 2 convolutional modules (see below), followed each time with

model.add(Flatten())
model.add(layers.Dense(512, activation="elu"))
model.add(layers.Dense(10, activation="softmax"))

The Convolutional layers have a kernel size of (3,3), default padding, the activation is elu. The Pooling is a MaxPooling of the poolside (2,2). Loss is categorical_crossentropy and the optimizer is adam.

The corresponding Dropout probability is 0.2 or 0.3, respectively. The amount of feature maps is 32 or 64, respectively.

Edit: When I dropped the Dropout, as recommended in some answers, it converged faster but had a worse generalization ability than when I use BatchNorm and Dropout.

I found a paper that explains the disharmony between Dropout and Batch Norm(BN). The key idea is what they call the "variance shift". This is due to the fact that dropout has a different behavior between training and testing phases, which shifts the input statistics that BN learns. The main idea can be found in this figure which is taken from this paper. enter image description here

A small demo for this effect can be found in this notebook.

I read the recommended papers in the answer and comments from https://stackoverflow.com/a/40295999/8625228

From Ioffe and Szegedy (2015)’s point of view, only use BN in the network structure. Li et al. (2018) give the statistical and experimental analyses, that there is a variance shift when the practitioners use Dropout before BN. Thus, Li et al. (2018) recommend applying Dropout after all BN layers.

From Ioffe and Szegedy (2015)’s point of view, BN is located inside/before the activation function. However, Chen et al. (2019) use an IC layer which combines dropout and BN, and Chen et al. (2019) recommends use BN after ReLU.

On the safety background, I use Dropout or BN only in the network.

Chen, Guangyong, Pengfei Chen, Yujun Shi, Chang-Yu Hsieh, Benben Liao, and Shengyu Zhang. 2019. “Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks.” CoRR abs/1905.05928. http://arxiv.org/abs/1905.05928.

Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” CoRR abs/1502.03167. http://arxiv.org/abs/1502.03167.

Li, Xiang, Shuo Chen, Xiaolin Hu, and Jian Yang. 2018. “Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift.” CoRR abs/1801.05134. http://arxiv.org/abs/1801.05134.

Related questions
                            
                                How to scp in Python?
                            
                                Looping over a list in Python
                            
                                Web scraping with Python [closed]
                            
                                Python Threading String Arguments
                            
                                How to efficiently compare two unordered lists (not sets) in Python?
                            
                                Generating matplotlib graphs without a running X server [duplicate]
                            
                                Python extract pattern matches
                            
                                How do I build a numpy array from a generator?
                            
                                "TypeError: (Integer) is not JSON serializable" when serializing JSON in Python?
                            
                                Pandas: sum DataFrame rows for given columns
                            
                                Using multiple arguments for string formatting in Python (e.g., '%s ... %s')
                            
                                What is the reason for performing a double fork when creating a daemon?
                            
                                Add missing dates to pandas dataframe
                            
                                Log exception with traceback in python
                            
                                Convert Unicode to ASCII without errors in Python
                            
                                Replacing column values in a pandas DataFrame
                            
                                Python to print out status bar and percentage
                            
                                Python UTC datetime object's ISO format doesn't include Z (Zulu or Zero offset)
                            
                                Finding the source code for built-in Python functions?
                            
                                Why is printing to stdout so slow? Can it be sped up?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Ordering of batch normalization and dropout?

Tags:

python

neural-network

tensorflow

conv-neural-network

People also ask

Usually, Just drop the `Dropout`(when you have `BN`):

Recent Activity

Donate For Us

Ordering of batch normalization and dropout?

Tags:

python

neural-network

tensorflow

conv-neural-network

People also ask

Usually, Just drop the Dropout(when you have BN):

Related questions

Recent Activity

Donate For Us

Usually, Just drop the `Dropout`(when you have `BN`):