How to avoid "CUDA out of memory" in PyTorch

Tags: python, deep-learning, pytorch, object-detection, low-memory

I think it's a pretty common message for PyTorch users with low GPU memory:

RuntimeError: CUDA out of memory. Tried to allocate X MiB (GPU X; X GiB total capacity; X GiB already allocated; X MiB free; X cached)

I tried to process an image by loading each layer to GPU and then loading it back:

for m in self.children():
    m.cuda()
    x = m(x)
    m.cpu()
    torch.cuda.empty_cache()

But it doesn't seem to be very effective. I'm wondering whether there are any tips and tricks for training large deep learning models while using little GPU memory.

asked Dec 01 '19 20:12

voilalex

5 Answers

Calling

import torch
torch.cuda.empty_cache()

is a good way to release cached CUDA memory that is no longer in use, and we can also manually clear unused variables with:

import gc
del variables
gc.collect()

Even after using these commands, the error might appear again, because del only removes a reference to a tensor: PyTorch can reuse the memory only once no other reference remains, and empty_cache() releases only cached blocks that are no longer in use. So restarting the kernel, reducing the batch_size, and finding the optimum batch_size is the best possible option (but sometimes not a very feasible one).
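
One way to search for a batch size that fits can be sketched as below. This is only an illustration, not something from the answer: find_batch_size and fake_step are hypothetical names, and try_step stands in for whatever runs one forward/backward pass at a given batch size. It relies on PyTorch reporting CUDA out-of-memory as a RuntimeError (or a subclass of it) whose message contains "out of memory".

```python
def find_batch_size(try_step, start=32, minimum=1):
    """Halve the batch size until try_step(batch_size) stops raising an OOM error."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # Re-raise anything that is not an out-of-memory error.
            if "out of memory" not in str(err):
                raise
            batch_size //= 2
    raise RuntimeError("even the minimum batch size does not fit")


def fake_step(batch_size):
    # Hypothetical stand-in for one training step: pretend anything above 15 is too big.
    if batch_size > 15:
        raise RuntimeError("CUDA out of memory. (simulated)")


print(find_batch_size(fake_step, start=32))  # 32 fails, 16 fails, 8 fits -> prints 8
```

In a notebook, memory held by a failed attempt may still need to be released (or the kernel restarted, as suggested above) before the next attempt is meaningful.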

Another way to get deeper insight into the allocation of memory on the GPU is to use:

torch.cuda.memory_summary(device=None, abbreviated=False)

where both arguments are optional. This gives a readable summary of memory allocation and lets you figure out why CUDA is running out of memory, so that you can restart the kernel and avoid the error happening again (just as I did in my case).
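
A small helper along these lines might make the summary easier to call from a training script. It is a sketch: gpu_memory_report is a made-up name, and it simply returns a note when no CUDA device is present. torch.cuda.memory_allocated and torch.cuda.memory_reserved report the bytes held by live tensors and by PyTorch's caching allocator, respectively.

```python
import torch


def gpu_memory_report(device=None):
    """Return a short text report of PyTorch's CUDA memory use (hypothetical helper)."""
    if not torch.cuda.is_available():
        return "CUDA is not available; nothing to report."
    allocated = torch.cuda.memory_allocated(device) / 2**20  # memory used by live tensors
    reserved = torch.cuda.memory_reserved(device) / 2**20    # memory held by the caching allocator
    header = f"allocated: {allocated:.1f} MiB, reserved: {reserved:.1f} MiB"
    return header + "\n" + torch.cuda.memory_summary(device=device, abbreviated=True)


print(gpu_memory_report())
```

A large gap between reserved and allocated usually means the memory is cached rather than in use, which is the part empty_cache() can give back.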

Passing the data iteratively might help, but reducing the size of your network's layers or breaking them down would also prove effective (the model itself sometimes occupies a significant amount of memory, for example when doing transfer learning).

answered Oct 17 '22 19:10

SHAGUN SHARMA

Just reduce the batch size, and it will work. While I was training, it gave the following error:

CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; 4.29 GiB already allocated; 10.12 MiB free; 4.46 GiB reserved in total by PyTorch)

I was using a batch size of 32, so I just changed it to 15 and it worked for me.
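
If the smaller batch size is a concern for training, a commonly used companion technique (not part of this answer) is gradient accumulation: run several small batches and step the optimizer once. The sketch below uses a toy linear model and random data, both made up for illustration, and checks that four micro-batches of 8 give the same gradient as one batch of 32.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)          # toy model, for illustration only
criterion = torch.nn.MSELoss()         # mean reduction over the batch
inputs, targets = torch.randn(32, 4), torch.randn(32, 1)

# Reference: gradient from one full batch of 32.
model.zero_grad()
criterion(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# Same gradient from 4 micro-batches of 8, accumulated before a single optimizer step.
micro_batch = 8
steps = inputs.shape[0] // micro_batch
model.zero_grad()
for x, y in zip(inputs.split(micro_batch), targets.split(micro_batch)):
    loss = criterion(model(x), y) / steps  # scale so the sum matches the full-batch mean
    loss.backward()                        # gradients add up in .grad across calls
accum_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accum_grad, atol=1e-6))
```

Only one micro-batch's activations are alive at a time, so peak memory follows the micro-batch size rather than the effective batch size.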

answered Oct 17 '22 18:10

Rahul

Send the batches to CUDA iteratively, and use small batch sizes. Don't send all your data to CUDA at once at the beginning. Rather, do it as follows:

for e in range(epochs):
    for images, labels in train_loader:
        if torch.cuda.is_available():
            # Move only the current batch to the GPU
            images, labels = images.cuda(), labels.cuda()
        # ... forward pass, loss, backward pass, optimizer step ...

You can also use dtypes that use less memory, for instance torch.float16 (torch.half is an alias for the same dtype).
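
Both points can be tried out with a toy setup like the one below. The dataset, shapes, and the tensor_bytes helper are made up for illustration; the loop falls back to the CPU when no GPU is present.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy data; the full dataset stays in CPU memory.
dataset = TensorDataset(torch.randn(100, 3, 8, 8), torch.randint(0, 10, (100,)))
train_loader = DataLoader(dataset, batch_size=16)


def tensor_bytes(t):
    """Number of bytes taken by a tensor's elements."""
    return t.element_size() * t.nelement()


seen = 0
for images, labels in train_loader:
    # Only the current batch is moved to the device.
    images, labels = images.to(device), labels.to(device)
    seen += images.shape[0]

full = torch.zeros(1024, 1024, dtype=torch.float32)
half = full.to(torch.float16)
print(tensor_bytes(full), tensor_bytes(half))  # 4194304 2097152
```

Whether float16 is numerically adequate depends on the model, so treat the dtype change as something to validate rather than a free saving.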

answered Oct 17 '22 17:10

Nicolas Gervais

Try not to drag your grads too far.

I got the same error when I tried to sum up the loss over all batches.

loss = self.criterion(pred, label)
total_loss += loss

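Then I used loss.item() instead of loss, which requires grads, and that solved the problem. (Accumulating the loss tensor itself keeps each batch's computation graph alive, whereas loss.item() returns a plain Python number.)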

loss = self.criterion(pred, label)
total_loss += loss.item()
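
The difference between the two versions can be seen on the CPU with a toy model. Everything here (the linear model, the random batches, the variable names) is made up for illustration.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)      # toy model, for illustration only
criterion = torch.nn.MSELoss()
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

total_as_tensor = 0.0
total_as_float = 0.0
for x, y in batches:
    loss = criterion(model(x), y)
    total_as_tensor = total_as_tensor + loss  # result stays attached to every batch's graph
    total_as_float += loss.item()             # plain Python float, no graph kept

print(type(total_as_tensor).__name__, total_as_tensor.requires_grad)
print(type(total_as_float).__name__)
```

The tensor total still has a grad_fn, so every intermediate result it depends on stays in memory until the total itself is released.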

The solution below is credited to yuval reina, from the Kaggle question:

This error is related to the GPU memory and not the general memory => @cjinny comment might not work.
Do you use TensorFlow/Keras or Pytorch?
Try using a smaller batch size.
If you use Keras, try to decrease some of the hidden layer sizes.
If you use Pytorch:
Do you keep all the training data on the GPU all the time?
Make sure you don't drag the grads too far.
Check the sizes of your hidden layers.

answered Oct 17 '22 18:10

pandas007

Most things are covered already, but I will add a little.

If torch reports something like "Tried to allocate 2 MiB", the message is misleading: the small allocation is only the one that happened to fail, and the real problem is that CUDA has run out of the total memory required to train the model. You can reduce the batch size. If even a batch size of 1 does not work (this happens when you train NLP models with very long sequences), try passing less data; this will help you confirm that your GPU does not have enough memory to train the model.
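
A quick way to pass less data for such a check might look like this. shrink_batch is a hypothetical helper, and the sequences are assumed to be plain Python lists of token ids.

```python
def shrink_batch(sequences, max_items=1, max_len=128):
    """Keep at most max_items sequences, each truncated to max_len tokens."""
    return [seq[:max_len] for seq in sequences[:max_items]]


# Hypothetical token-id sequences of length 1000 and 500.
batch = [list(range(1000)), list(range(500))]
small = shrink_batch(batch, max_items=1, max_len=128)
print(len(small), len(small[0]))  # 1 128
```

If training runs on the shrunken input but not on the full one, the GPU's capacity is the likely limit rather than a bug in the code.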

Also, garbage collection and cache clearing have to be done again if you want to re-train the model.
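
The cleanup between two training runs could be sketched as follows. release_cached_memory is a made-up helper, the model is a toy one, and the weak reference is there only to check that the old model has really been freed.

```python
import gc
import weakref

import torch


def release_cached_memory():
    """Collect garbage, then hand unused cached CUDA blocks back to the driver."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


# First training run (hypothetical toy model).
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model_ref = weakref.ref(model)  # lets us check that the model object is gone

# Drop every reference from the old run before building the new one.
del model, optimizer
release_cached_memory()
print(model_ref() is None)

# The second run starts with the old objects released.
model = torch.nn.Linear(10, 10)
```

Any other variable that still points at the old model, its outputs, or a loss tensor will keep that memory alive, so those need to be deleted too.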