I'm using a GPU on Google Colab to run some deep learning code.
I have got 70% of the way through the training, but now I keep getting the following error:
RuntimeError: CUDA out of memory. Tried to allocate 2.56 GiB (GPU 0; 15.90 GiB total capacity; 10.38 GiB already allocated; 1.83 GiB free; 2.99 GiB cached)
I'm trying to understand what this means. Is it talking about RAM? If so, the code should just run the same as it has been doing, shouldn't it? When I try to restart it, the memory message appears immediately. Why would it be using more RAM when I start it today than it did when I started it yesterday or the day before?
Or is this message about hard disk space? I could understand that, because the code saves things as it goes, so the hard disk usage would be cumulative.
Any help would be much appreciated.
So if it's just the GPU running out of memory, could someone explain why the error message says 10.38 GiB already allocated? How can there be memory already allocated when I start to run something? Could that memory be being used by someone else? Do I just need to wait and try again later?
Here is a screenshot of the GPU usage when I run the code, just before it runs out of memory:
I found this post in which people seem to be having similar problems. When I run the code suggested in that thread I see:
Gen RAM Free: 12.6 GB | Proc size: 188.8 MB
GPU RAM Free: 16280MB | Used: 0MB | Util 0% | Total 16280MB
which seems to suggest there is 16 GB of GPU RAM free.
I'm confused.
You are running out of memory on the GPU. If you are running Python code, try running this code before yours; it will show how much memory you have. Note that if you try to load images bigger than the total memory, it will fail.
# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize

import os
import psutil
import humanize
import GPUtil as GPU

GPUs = GPU.getGPUs()
# XXX: only one GPU on Colab, and even that isn't guaranteed
gpu = GPUs[0]

def printm():
    # system RAM: what's available overall and what this notebook process uses
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available),
          " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
    # GPU RAM: free/used/total as reported by GPUtil (which queries nvidia-smi)
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(
        gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal))

printm()
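Beyond checking the totals, the format of your error ("already allocated", "cached") looks like PyTorch's, so if that is what you are using, you can also inspect and release PyTorch's caching allocator directly. A minimal sketch, assuming your training code uses PyTorch; the model and optimizer names below are hypothetical placeholders for your own variables:

import torch

# how much memory live tensors occupy vs. what PyTorch's caching
# allocator has reserved from the GPU (reserved-but-unused memory is
# what the error message calls "cached")
print("allocated: {:.2f} GiB".format(torch.cuda.memory_allocated() / 1024**3))
# memory_reserved() was called memory_cached() in older PyTorch versions
print("reserved:  {:.2f} GiB".format(torch.cuda.memory_reserved() / 1024**3))

# drop references to large objects you no longer need, then hand the
# cached (but unused) blocks back to the GPU driver
# del model, optimizer  # hypothetical names from your training code
torch.cuda.empty_cache()

As for the "10.38 GiB already allocated" part: a Colab GPU is not shared with other users, so that memory almost certainly belongs to your own session; tensors from a previous run stay allocated until the kernel restarts. If the error still appears immediately after a full runtime restart, the usual fix is to reduce the batch size or the input image size.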