Caffe: What can I do if only a small batch fits into memory?

I am trying to train a very large model, so only a very small batch size fits into GPU memory. Working with such small batch sizes results in very noisy gradient estimates.
What can I do to avoid this problem?

Asked Apr 10 '16 by Shai

People also ask

What is the effect of setting a small batch size?

Minibatch Gradient Descent. Smaller batch sizes are used for two main reasons: they are noisy, which offers a regularizing effect and lower generalization error, and they make it easier to fit one batch worth of training data in memory (e.g. when using a GPU).

Does small batch size cause Overfitting?

Experimenting with different values shows that lower batch sizes can lead to overfitting: in such runs the validation loss starts to increase after about 10 epochs, indicating that the model has begun to overfit.

What should be the optimal batch size?

In practical terms, to determine the optimum batch size, we recommend trying smaller batch sizes first (usually 32 or 64), keeping in mind that small batch sizes require small learning rates. The batch size should be a power of 2 to take full advantage of the GPU's processing.
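In Caffe, the batch size is set in the data layer of the network definition. A minimal sketch, with hypothetical file names and values:

# train_val.prototxt (fragment) -- names and values are illustrative
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "train_lmdb"
    batch_size: 32    # try powers of 2 first, e.g. 32 or 64
    backend: LMDB
  }
}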

What is a good batch size for neural network?

Results Of Small vs Large Batch Sizes On Neural Network Training. From the validation metrics, the models trained with small batch sizes generalize well on the validation set. The batch size of 32 gave us the best result.


1 Answer

You can change the iter_size parameter in the solver. Caffe accumulates gradients over iter_size x batch_size instances in each stochastic gradient descent step, so increasing iter_size gives a more stable gradient estimate when limited memory prevents you from using a large batch_size.
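For example, a solver configuration along these lines (values are hypothetical, not from the original post) yields an effective batch size of 16 x 8 = 128 while only ever holding 8 samples in GPU memory at a time:

# solver.prototxt -- illustrative values
net: "train_val.prototxt"   # its data layer uses batch_size: 8
base_lr: 0.01
momentum: 0.9
# Accumulate gradients over 16 forward/backward passes before each
# weight update, so the effective batch is iter_size * batch_size = 128.
iter_size: 16

Note that each reported iteration now runs iter_size forward/backward passes, so the wall-clock time per iteration grows accordingly.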

Answered Oct 14 '22 by Liang Xiao