I'm using batch normalization with a batch size of 10 for face detection, and I wanted to know whether it is better to keep the batch norm layers or remove them. And if it is better to remove them, what can I use instead?
This depends on a few things, the first being the depth of your neural network. Batch normalization is useful for speeding up training when there are a lot of hidden layers. It can decrease the number of epochs it takes to train your model and also has a regularizing effect. By standardizing the inputs to each layer, you reduce the problem of chasing a 'moving target', where shifting input distributions keep your learning algorithm from performing as well as it could.
My advice would be to keep the batch normalization layers if you have a deep neural network. As a reminder, you should probably include some Dropout in your layers as well.
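For example, a block like this combines the two (just a sketch; the layer sizes, dropout rate, and two-class output are placeholders, not your actual detector):

```python
import torch
import torch.nn as nn

# Minimal sketch: a deep fully connected block pairing BatchNorm with Dropout.
# Layer sizes, dropout probability, and the number of classes are illustrative.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.BatchNorm1d(256),   # standardizes activations, reducing the "moving target" effect
    nn.ReLU(),
    nn.Dropout(p=0.3),     # extra regularization on top of BatchNorm
    nn.Linear(256, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(256, 2),     # e.g. face / no-face logits
)

x = torch.randn(10, 128)   # batch size of 10, as in the question
logits = model(x)
print(logits.shape)        # torch.Size([10, 2])
```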
Let me know if this helps!
Yes, it works for smaller batch sizes; it will work even with the smallest batch size you can set.
The trick is that the batch size also adds to the regularization effect, not only the batch norm. I will show you a few plots:
Both plots are on the same scale, tracking the batch loss. The left-hand side is a model without the batch norm layer (black); the right-hand side is with the batch norm layer. Note how the regularization effect is evident even for bs=10.
When we set bs=64, the regularization of the batch loss is even more evident. Note the y scale is always [0, 4].
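If you want to reproduce this kind of comparison yourself, here is a rough sketch of the idea (the synthetic data, model architecture, optimizer, and step count are stand-ins, not the exact setup behind the plots above):

```python
import torch
import torch.nn as nn

def make_model(use_bn: bool) -> nn.Sequential:
    """Small regression model; the BatchNorm layer is the only difference."""
    layers = [nn.Linear(10, 10)]
    if use_bn:
        layers.append(nn.BatchNorm1d(10, affine=False))  # no learnable gamma/beta
    layers += [nn.ReLU(), nn.Linear(10, 1)]
    return nn.Sequential(*layers)

def track_batch_loss(use_bn: bool, batch_size: int, steps: int = 200):
    """Train on synthetic data and return the per-batch loss curve."""
    torch.manual_seed(0)
    model = make_model(use_bn)
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        x = torch.randn(batch_size, 10)
        y = x.sum(dim=1, keepdim=True)   # simple synthetic target
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

# Compare the end of each loss curve with and without BatchNorm, for bs=10 and bs=64.
for bs in (10, 64):
    for use_bn in (False, True):
        curve = track_batch_loss(use_bn, bs)
        print(f"bs={bs:>2} bn={use_bn!s:>5} last-20 mean loss: "
              f"{sum(curve[-20:]) / 20:.3f}")
```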
My examination was purely on nn.BatchNorm1d(10, affine=False), i.e. without the learnable parameters gamma and beta (the weight w and bias b).
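You can verify that affine=False really drops gamma and beta (exposed as weight and bias in PyTorch) by inspecting the layer's parameters:

```python
import torch.nn as nn

# With affine=False, BatchNorm1d has no learnable gamma (weight) or beta (bias);
# it only standardizes its inputs using batch statistics.
bn_plain  = nn.BatchNorm1d(10, affine=False)
bn_affine = nn.BatchNorm1d(10)                    # default: affine=True

print(list(bn_plain.parameters()))                # [] -> nothing to learn
print([p.shape for p in bn_affine.parameters()])  # [torch.Size([10]), torch.Size([10])]
```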
This is why it still makes sense to use the BatchNorm layer even when you have a low batch size.