
Does batch normalisation work with a small batch size?

I'm using batch normalization with a batch size of 10 for face detection, and I wanted to know whether it is better to remove the batch norm layers or keep them. And if it is better to remove them, what can I use instead?

asked Jul 02 '19 by hhoomn

2 Answers

This depends on a few things, the first being the depth of your neural network. Batch normalization is useful for speeding up training when there are a lot of hidden layers: it can decrease the number of epochs it takes to train your model and helps regularize it. By standardizing the inputs to each layer, you reduce the problem of chasing a 'moving target', where the shifting distribution of activations keeps your learning algorithm from performing as well as it could.

My advice would be to include batch normalization layers if you have a deep neural network. As a reminder, you should probably include some Dropout in your layers as well; a minimal sketch of both together is below.
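A minimal sketch of what that might look like in PyTorch (the layer sizes here are made up for illustration, not tuned for face detection):

```python
import torch.nn as nn

# A small illustrative network: each hidden block normalizes its
# activations with BatchNorm and applies Dropout for extra regularization.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),   # standardize activations before the non-linearity
    nn.ReLU(),
    nn.Dropout(p=0.5),    # drop half the units during training
    nn.Linear(64, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(32, 1),
)
```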

Let me know if this helps!

answered Nov 29 '22 by Andrew

Yes, it works with a smaller batch size; it will even work with the smallest batch size you set.

The trick is that the batch size also adds to the regularization effect, not only the batch norm. I will show you a few plots:

(plot: batch loss, bs=10)

We are on the same scale, tracking the batch loss. The left-hand side is a model without the batch norm layer (black), the right-hand side is with the batch norm layer. Note how the regularization effect is evident even for bs=10.

(plot: batch loss, bs=64)

When we set bs=64, the regularization of the batch loss is super evident. Note the y scale is always [0, 4].

My examination used purely nn.BatchNorm1d(10, affine=False), i.e. without the learnable parameters gamma and beta (the scale and shift).
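The rough shape of that comparison can be sketched like this (the toy data, layer sizes, and optimizer settings are placeholders, not the exact setup behind the plots above):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_model(use_bn: bool) -> nn.Sequential:
    layers = [nn.Linear(20, 10)]
    if use_bn:
        # affine=False: no learnable gamma/beta, pure normalization
        layers.append(nn.BatchNorm1d(10, affine=False))
    layers += [nn.ReLU(), nn.Linear(10, 1)]
    return nn.Sequential(*layers)

# Toy regression data, just to have something to track the batch loss on.
X, y = torch.randn(640, 20), torch.randn(640, 1)

for use_bn in (False, True):
    model = make_model(use_bn)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    bs = 10  # try 10 vs 64 to compare the batch-loss curves
    for i in range(0, len(X), bs):
        xb, yb = X[i:i + bs], y[i:i + bs]
        loss = loss_fn(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(f"use_bn={use_bn} batch_loss={loss.item():.3f}")
```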

This is why, even when you have a low batch size, it still makes sense to use the BatchNorm layer.

answered Nov 29 '22 by prosti