I successfully trained the network but got this error during validation:
RuntimeError: CUDA error: out of memory
In my model, it appears that “cuda runtime error(2): out of memory” is occurring due to a GPU memory drain. Because PyTorch typically manages large amounts of data, failure to recognize small errors can cause your program to crash to the ground without all its GPU being available.
Adjust paging file settings for the game drive Click the Advanced tab and now click Settings under the Performance category. Open the Advanced tab in the Performance Options window that just opened. Under the Virtual Memory category, click Change. Select your system drive, and then select System managed size.
The error occurs because you ran out of memory on your GPU.
One way to solve it is to reduce the batch size until your code runs without this error.
1.. When you only perform validation not training,
you don't need to calculate gradients for forward and backward phase.
In that situation, your code can be located under
with torch.no_grad():
...
net=Net()
pred_for_validation=net(input)
...
Above code doesn't use GPU memory
2.. If you use += operator in your code,
it can accumulate gradient continuously in your gradient graph.
In that case, you need to use float() like following site
https://pytorch.org/docs/stable/notes/faq.html#my-model-reports-cuda-runtime-error-2-out-of-memory
Even if docs guides with float(), in case of me, item() also worked like
entire_loss=0.0
for i in range(100):
one_loss=loss_function(prediction,label)
entire_loss+=one_loss.item()
3.. If you use for loop in training code,
data can be sustained until entire for loop ends.
So, in that case, you can explicitly delete variables after performing optimizer.step()
for one_epoch in range(100):
...
optimizer.step()
del intermediate_variable1,intermediate_variable2,...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With