
How to fix this strange error: "RuntimeError: CUDA error: out of memory"

I successfully trained the network but got this error during validation:

RuntimeError: CUDA error: out of memory

asked Jan 26 '19 by xiaoding chen

People also ask

Why is CUDA out of memory?

In my model, "cuda runtime error (2): out of memory" occurs because GPU memory is being drained. PyTorch typically manages large amounts of data on the GPU, so overlooking small mistakes, such as holding references to tensors you no longer need, can exhaust GPU memory and crash your program.
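As a quick diagnostic, PyTorch can report how much GPU memory your process is currently using. A minimal sketch, assuming a reasonably recent PyTorch with a CUDA device available:

import torch

# Bytes currently occupied by tensors on the default GPU
print(torch.cuda.memory_allocated() / 1024**2, "MiB allocated")
# Bytes reserved by PyTorch's caching allocator (allocated + cached)
print(torch.cuda.memory_reserved() / 1024**2, "MiB reserved")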

How do I free up graphics memory?

Adjust the paging file settings for the game drive: click the Advanced tab, then click Settings under the Performance category. In the Performance Options window that opens, open the Advanced tab. Under the Virtual Memory category, click Change, select your system drive, and then select "System managed size".
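The steps above concern Windows virtual memory. In the PyTorch context of this question, you can also free GPU memory from within your own process by dropping tensor references and releasing PyTorch's cached blocks. A minimal sketch; big_tensor is an illustrative name, not something from the question:

import torch

big_tensor = torch.zeros(1024, 1024, device="cuda")  # illustrative allocation
del big_tensor            # drop the last reference so the allocator can reuse the memory
torch.cuda.empty_cache()  # release unused cached blocks back to the driver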


2 Answers

The error occurs because you ran out of memory on your GPU.

One way to solve it is to reduce the batch size until your code runs without this error.
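For example, if the validation data comes from a PyTorch DataLoader, the batch size is a single argument you can lower until the error goes away. A minimal sketch; val_dataset is an illustrative name:

from torch.utils.data import DataLoader

# Try halving the batch size (e.g. 64 -> 32) until the OOM error disappears
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)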

answered Oct 16 '22 by K. Khanda


1.. When you only perform validation, not training,
you don't need to calculate gradients for the forward and backward passes.
In that situation, your validation code can go inside a torch.no_grad() block:

net = Net()
net.eval()  # put dropout/batch-norm layers into evaluation mode
with torch.no_grad():  # disable gradient tracking
    ...
    pred_for_validation = net(input)
    ...

Inside torch.no_grad(), PyTorch does not build the autograd graph, so the intermediate activations needed for backpropagation are never stored; this substantially reduces GPU memory usage during validation.
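Combining this with the loss-accumulation pattern from point 2 below, a full validation loop might look like the following sketch; net, val_loader, and loss_function are illustrative names:

net.eval()
total_loss = 0.0
with torch.no_grad():
    for inputs, labels in val_loader:
        inputs, labels = inputs.cuda(), labels.cuda()
        outputs = net(inputs)
        total_loss += loss_function(outputs, labels).item()
print("mean validation loss:", total_loss / len(val_loader))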

2.. If you use the += operator to accumulate a loss tensor,
each addition keeps the tensor's whole computation graph alive, so GPU memory grows with every iteration.
In that case, you need to detach the value first, for example with float(), as described in the PyTorch FAQ:
https://pytorch.org/docs/stable/notes/faq.html#my-model-reports-cuda-runtime-error-2-out-of-memory

Although the docs suggest float(), in my case item() also worked:

entire_loss = 0.0
for i in range(100):
    one_loss = loss_function(prediction, label)
    entire_loss += one_loss.item()  # .item() returns a plain Python float, detached from the graph
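For reference, the float() form that the FAQ recommends behaves the same way here; calling float() on a one-element tensor also returns a plain Python number detached from the graph:

entire_loss += float(one_loss)  # equivalent to one_loss.item() for a scalar loss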

3.. If you use a for loop in your training code,
variables created inside the loop stay referenced until the loop ends, so intermediate tensors (and their computation graphs) can keep GPU memory alive across iterations.
In that case, you can explicitly delete those variables after calling optimizer.step():

for one_epoch in range(100):
    ...
    optimizer.step()
    # drop references so the tensors and their graphs can be freed
    del intermediate_variable1, intermediate_variable2, ...
answered Oct 16 '22 by YoungMin Park