I am trying to train a very large model, so I can only fit a very small batch size into GPU memory. Working with such small batch sizes results in very noisy gradient estimates.
What can I do to avoid this problem?
Minibatch gradient descent. Smaller batch sizes are used for two main reasons: the noisier gradient estimates have a regularizing effect and can lower generalization error, and a smaller batch is easier to fit into memory (e.g. when training on a GPU).
I have been experimenting with different values and observed that lower batch sizes can also lead to overfitting: in my runs the validation loss started to increase after about 10 epochs, indicating the model was starting to overfit.
In practical terms, to determine the optimal batch size, we recommend trying smaller batch sizes first (usually 32 or 64), keeping in mind that small batch sizes also call for smaller learning rates. The batch size should be a power of 2 to take full advantage of the GPU's processing.
Comparing small vs. large batch sizes on neural network training: judging by the validation metrics, the models trained with small batch sizes generalized better on the validation set, and a batch size of 32 gave us the best result.
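To make that advice concrete, here is a hedged sketch of such a batch-size sweep in PyTorch. The tiny model, the random tensors, and the exact learning-rate scaling are placeholders of my own; only the pattern (powers of 2, smaller learning rates for smaller batches, comparing validation loss) comes from the answer above.

```python
# Sketch: sweep power-of-2 batch sizes and compare validation loss.
# Model and data are synthetic placeholders, not from the original answer.
import torch
from torch.utils.data import DataLoader, TensorDataset

train_set = TensorDataset(torch.randn(1024, 20), torch.randint(0, 2, (1024,)))
val_set = TensorDataset(torch.randn(256, 20), torch.randint(0, 2, (256,)))

results = {}
for batch_size in [32, 64, 128, 256]:              # powers of 2, smallest first
    lr = 0.01 * (batch_size / 32)                  # smaller batch -> smaller lr
    model = torch.nn.Sequential(torch.nn.Linear(20, 16), torch.nn.ReLU(),
                                torch.nn.Linear(16, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(10):
        for x, y in DataLoader(train_set, batch_size=batch_size, shuffle=True):
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

    with torch.no_grad():                          # validation loss per batch size
        x_val, y_val = val_set.tensors
        results[batch_size] = loss_fn(model(x_val), y_val).item()

print(results)   # pick the batch size with the lowest validation loss
```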
You can change the iter_size in the solver parameters. Caffe accumulates gradients over iter_size x batch_size instances in each stochastic gradient descent step, so increasing iter_size also gives you a more stable gradient when you cannot use a large batch_size due to limited memory.
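If you are not using Caffe, the same gradient-accumulation idea is easy to reproduce by hand. Below is a minimal sketch in PyTorch (not from the original answer); the model, loss, and data loader are placeholders, and accumulation_steps plays the role of Caffe's iter_size.

```python
# Sketch of gradient accumulation in PyTorch: update weights only every
# accumulation_steps micro-batches, so the effective batch size is
# batch_size * accumulation_steps. Model and data are placeholders.
import torch

accumulation_steps = 8                       # analogous to Caffe's iter_size

model = torch.nn.Linear(128, 10)             # placeholder model
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Placeholder data loader: 32 micro-batches of 4 samples each.
data_loader = [(torch.randn(4, 128), torch.randint(0, 10, (4,)))
               for _ in range(32)]

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(data_loader):
    loss = loss_fn(model(inputs), targets)
    # Scale the loss so the accumulated gradient matches the average
    # over the full effective batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                     # update with accumulated gradient
        optimizer.zero_grad()
```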