In neural networks, why is the number of neurons conventionally set to 2^n?

For example, when stacking Dense layers, we conventionally set the number of neurons to 256, 128, 64, ... and so on.
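To make the convention concrete, here is a minimal tf.keras sketch of such a stack (the input shape of 784 and the 10-way output are just placeholders; the point is the 256/128/64 layer widths):

```python
import tensorflow as tf

# A stack of Dense layers whose widths follow the 2^n convention in question.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                      # placeholder input size
    tf.keras.layers.Dense(256, activation="relu"),     # 2^8
    tf.keras.layers.Dense(128, activation="relu"),     # 2^7
    tf.keras.layers.Dense(64, activation="relu"),      # 2^6
    tf.keras.layers.Dense(10, activation="softmax"),   # placeholder output size
])
model.summary()
```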

My question is:

What's the reason for conventionally using 2^n neurons? Does this make the code run faster or save memory, or is there some other reason?

o_yeah asked Sep 18 '25 22:09
1 Answer

It's historical. Early neural network implementations for GPU computing (written in CUDA, OpenCL, etc.) had to concern themselves with efficient memory management to achieve data parallelism.

Generally speaking, you have to align N computations onto the physical processors. The number of physical processors is usually a power of 2. Therefore, if the number of computations is not a power of 2, they can't be mapped 1:1 and have to be moved around, requiring additional memory management (further reading here). This was only relevant for parallel batch processing, i.e. having the batch size as a power of 2 gave you slightly better performance. Interestingly, having other hyperparameters, such as the number of hidden units, as a power of 2 never had a measurable benefit - I assume that as neural networks became more popular, people simply started adopting this practice without knowing why and spread it to other hyperparameters.
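To illustrate the alignment idea, here is a small sketch (plain Python, with a 32-thread warp as an assumed power-of-two scheduling unit): any N that isn't a multiple of that unit gets rounded up, and the remainder of the last group sits idle.

```python
import math

WARP_SIZE = 32  # assumed power-of-two scheduling unit (e.g. an NVIDIA warp)

def padded_size(n: int, group: int = WARP_SIZE) -> int:
    """Round n up to the next multiple of the scheduling unit."""
    return math.ceil(n / group) * group

for n in (128, 129):
    padded = padded_size(n)
    print(f"N={n:3d} -> scheduled as {padded} threads ({padded - n} idle)")
# N=128 -> scheduled as 128 threads (0 idle)
# N=129 -> scheduled as 160 threads (31 idle)
```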

Nowadays, some low-level implementations might still benefit from this convention, but if you're using CUDA with TensorFlow or PyTorch in 2020 with a modern GPU architecture, you're very unlikely to see any difference between a batch size of 128 and 129, as these frameworks are highly optimized for efficient data parallelism.
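If you want to check this on your own hardware, a rough micro-benchmark along these lines (PyTorch assumed, layer sizes arbitrary) will typically show no meaningful gap between batch sizes 128 and 129:

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Linear(784, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).to(device)

def time_batch(batch_size: int, iters: int = 200) -> float:
    """Average seconds per forward/backward pass at the given batch size."""
    x = torch.randn(batch_size, 784, device=device)
    for _ in range(10):                      # warm-up (kernel selection, caches)
        model(x).sum().backward()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model.zero_grad(set_to_none=True)
        model(x).sum().backward()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

for bs in (128, 129):
    print(f"batch {bs}: {time_batch(bs) * 1e3:.3f} ms/iter")
```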

runDOSrun answered Sep 23 '25 10:09