
Resource Exhausted OOM while loading VGG16

I apologize in advance if this issue seems too basic, but I am new to TensorFlow and would appreciate any help.

I find that I frequently have to reboot my computer to be able to load models such as VGG16 from keras.applications. I have a fairly high-end machine with 4 GeForce GTX 1080 Ti GPUs and an Intel® Core™ i7-6850K CPU @ 3.60GHz × 12, and I use it only for TensorFlow (through Keras).

Right after a reboot I can successfully load models (such as VGG16) and train on large training datasets. But if I let my computer sit idle for a while and rerun the same program, I get a resource exhausted (OOM) message, which can be fixed by rebooting my computer again. It is extremely frustrating to reboot my computer every couple of hours. Does anyone know what's going on and how to solve this issue?

Sharanya Arcot Desai asked Oct 24 '17 20:10


1 Answer

If your batch size is greater than 1, try a smaller batch size, which lowers the GPU memory requirements.
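To get a feel for why batch size matters, here is a rough back-of-the-envelope sketch. It assumes float32 activations and looks only at the output of VGG16's first convolutional block (224 × 224 × 64 values per image); the helper name `activation_bytes` is made up for illustration, and real memory use is higher because every layer, the weights, and the gradients also take space.

```python
def activation_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Bytes needed to hold one activation tensor for a whole batch.

    bytes_per_value=4 assumes float32, which is an assumption here.
    """
    return batch_size * height * width * channels * bytes_per_value


# Output shape of the first conv block in VGG16 for a 224x224 input image.
HEIGHT, WIDTH, CHANNELS = 224, 224, 64

for batch_size in (1, 8, 32, 128):
    mib = activation_bytes(batch_size, HEIGHT, WIDTH, CHANNELS) / (1024 ** 2)
    print(f"batch_size={batch_size:>3}: {mib:.2f} MiB for this one layer")
```

The cost of this single layer grows linearly with the batch size, so halving the batch size roughly halves the activation memory.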

Also, when you finish working with the network, check with nvidia-smi whether the GPU memory was released. If it was not, kill the process that loaded the network (usually a Python interpreter).
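The check can also be scripted. The following is a minimal sketch, assuming nvidia-smi is on your PATH and that you are allowed to signal the processes it reports; the function names (`parse_compute_apps`, `list_gpu_processes`, `terminate`) are hypothetical helpers, not part of any library.

```python
import os
import signal
import subprocess


def parse_compute_apps(csv_text):
    """Turn 'pid, process_name, used_memory' CSV rows into (pid, name, MiB) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip blank or unexpected lines
        pid = int(parts[0])
        name = ",".join(parts[1:-1])
        # Memory may be reported as a non-numeric placeholder on some setups.
        mem = int(parts[-1]) if parts[-1].isdigit() else None
        apps.append((pid, name, mem))
    return apps


def list_gpu_processes():
    """Ask nvidia-smi which processes currently hold GPU memory."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)


def terminate(pid):
    """Politely ask a leftover process to exit (SIGTERM)."""
    os.kill(pid, signal.SIGTERM)
```

Call `list_gpu_processes()` after your training script has finished. If an old Python interpreter still shows up holding several GiB, pass its PID to `terminate()` (or use `kill <pid>` from the shell) instead of rebooting.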

Matěj Račinský answered Oct 14 '22 20:10