 

What is the reason for choosing the batch size or the number of neurons as a power of 2 in neural networks?

I have seen many tutorials do this, and I myself have been adhering to this standard practice.

When it comes to the batch size of the training data, we assign a value from a geometric progression starting at 2, like 2, 4, 8, 16, 32, 64.

Even when selecting the number of neurons in the hidden layers, we assign it the same way: one of 2, 4, 8, 16, 32, 64, 128, 256, 512, ...

What is the core reason behind this? Why does the neural network perform better this way?

Asutosh Panda asked Mar 03 '23

1 Answer

If you use NVIDIA GPUs (the most popular choice for deep learning), the choice of channel size for convolutions and fully-connected layers mostly has to do with enabling Tensor Cores, which, as the name implies, are used for efficient tensor and matrix operations (and therefore for convolutions). To quote the NVIDIA guide on deep learning performance:

Choose the number of input and output channels to be divisible by 8 to enable Tensor Cores
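You can see this effect yourself with a minimal PyTorch timing sketch (assuming a CUDA GPU with Tensor Cores and half-precision inputs; the matrix sizes are illustrative, not prescribed by the guide):

import time
import torch

def time_matmul(m, n, k, iters=100):
    """Average time of an (m x k) @ (k x n) half-precision matmul on the GPU."""
    a = torch.randn(m, k, device="cuda", dtype=torch.half)
    b = torch.randn(k, n, device="cuda", dtype=torch.half)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.time() - start) / iters

# All dimensions divisible by 8: eligible for Tensor Core kernels.
fast = time_matmul(4096, 4096, 4096)
# One dimension off the multiple-of-8 grid: may fall back to slower kernels.
slow = time_matmul(4096, 4096, 4097)
print(f"divisible by 8: {fast * 1e3:.2f} ms, not divisible: {slow * 1e3:.2f} ms")

The exact gap depends on the GPU generation and library version, but on Tensor Core hardware the aligned case is typically noticeably faster.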

Similar guidelines are given regarding batch size; however, the reason for those is quantization: the GPU executes work in fixed-size tiles and waves, so a dimension that does not divide evenly leaves the last tile partially idle.
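A back-of-the-envelope sketch of that quantization effect (the tile size of 128 is an illustrative value; real tile sizes depend on the kernel chosen by the library):

import math

def tile_utilization(batch_size, tile=128):
    """Fraction of tile work that is useful when the batch dimension
    is split into fixed-size tiles. tile=128 is illustrative only."""
    tiles = math.ceil(batch_size / tile)
    return batch_size / (tiles * tile)

for b in (96, 128, 130, 256, 257):
    print(f"batch {b:>3}: {tile_utilization(b):.1%} of tile work is useful")

A batch of 128 or 256 fills its tiles completely, while 130 wastes almost half of the last wave of work, which is why round (and in practice power-of-2) batch sizes tend to run more efficiently.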

Ash answered Mar 05 '23