I have seen many tutorials do this, and I have been adhering to the same practice myself.
When choosing the batch size for training, we pick a power of 2, such as 2, 4, 8, 16, 32, or 64.
We select the number of neurons in the hidden layers the same way, from 2, 4, 8, 16, 32, 64, 128, 256, 512, ...
What is the core reason behind this? Why does the neural network perform better this way?
If you use NVIDIA GPUs (the most popular choice for deep learning), the choice of channel size for convolutions and width for fully-connected layers mostly has to do with enabling Tensor Cores, which, as the name implies, accelerate tensor and matrix operations (and therefore convolutions). To quote the NVIDIA guide on performance for deep learning:
Choose the number of input and output channels to be divisible by 8 to enable Tensor Cores
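You can observe this effect directly. Below is a minimal sketch, assuming PyTorch and a CUDA GPU with Tensor Cores; the matrix sizes (4096 vs. 4095) are illustrative, and the actual speedup depends on the GPU architecture and cuBLAS kernel selection:

import torch

def time_matmul(m, k, n, iters=100):
    # FP16 matmuls are eligible for Tensor Cores when dims are divisible by 8.
    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)
    # Warm up so kernel selection doesn't skew the timing.
    for _ in range(10):
        a @ b
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per matmul

print(f"k = 4096 (divisible by 8): {time_matmul(4096, 4096, 4096):.3f} ms")
print(f"k = 4095 (not divisible):  {time_matmul(4096, 4095, 4096):.3f} ms")

On recent hardware the divisible-by-8 case is typically noticeably faster, because the non-divisible case falls back to slower kernels or pads internally.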
Similar guidelines are given for the batch size; there, however, the stated reason is tile and wave quantization: the GPU executes matrix multiplies in fixed-size tiles, so a dimension that does not divide evenly into the tile size still launches a full extra tile, part of whose compute is wasted.
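For intuition, here is a back-of-the-envelope sketch of tile quantization. The tile size of 128 is an illustrative assumption; real tile sizes vary by kernel and hardware:

import math

def wasted_fraction(n, tile=128):
    # Fraction of launched tile compute that is wasted when a
    # dimension of size n is covered by fixed-size tiles.
    tiles = math.ceil(n / tile)
    return 1 - n / (tiles * tile)

for n in (256, 257, 512, 513):
    print(f"n = {n}: {wasted_fraction(n):.1%} of tile compute wasted")

Going from 256 to 257 forces a third tile, so roughly a third of the launched work is idle, which is why sizes that divide evenly (in practice, powers of 2) tend to run more efficiently.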