I apologize in advance if this issue seems too basic, but I am new to Tensorflow and would appreciate any help.
I find that I have to reboot my computer frequently to be able to load models such as VGG16 from keras.applications. I have a fairly high-end machine with 4 GeForce GTX 1080 Ti GPUs and an Intel® Core™ i7-6850K CPU @ 3.60GHz × 12, and I use it only for Tensorflow (through Keras).
Right after a reboot I can load models (such as VGG16) and train on large training datasets. But if I let my computer sit idle for a while and rerun the same program, I get a resource exhausted (OOM) error, which goes away only when I reboot again. It is extremely frustrating to have to reboot every couple of hours. Does anyone know what's going on and how to solve this issue?
If your batch size is greater than 1, try a lower batch size, which could reduce the GPU memory requirements.
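One way to find a batch size that fits is to halve it whenever training runs out of memory. The following is only a minimal sketch of that idea: `fit_with_backoff` and `fake_train` are hypothetical names made up for illustration, and `fake_train` is a stand-in that pretends anything above 8 samples does not fit. Retrying inside the same process after a real GPU OOM may not always work, so treat this as a starting point rather than a guaranteed fix.

```python
def fit_with_backoff(train_fn, batch_size, oom_exceptions=(MemoryError,), min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size after each out-of-memory error.

    Returns (batch_size_that_worked, result_of_train_fn).
    """
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except oom_exceptions:
            if batch_size <= min_batch_size:
                raise  # nothing smaller left to try
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"Out of memory, retrying with batch_size={batch_size}")


def fake_train(batch_size):
    """Hypothetical stand-in for real training: fails above 8 samples per batch."""
    if batch_size > 8:
        raise MemoryError("simulated OOM")
    return "trained"


# Tries 64, 32, 16, then succeeds at 8.
print(fit_with_backoff(fake_train, 64))
```

With Keras, `train_fn` could be something like `lambda bs: model.fit(x, y, batch_size=bs)` and `oom_exceptions` could be `(tf.errors.ResourceExhaustedError,)`, assuming `model`, `x` and `y` are already defined in your script.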
Also, when you are done working with the network, run nvidia-smi to check whether the GPU memory was released. If it was not, kill the process that loaded the network (usually a Python interpreter).
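The check can also be scripted. This is a rough sketch, assuming `nvidia-smi` is on your PATH and supports the `--query-compute-apps` CSV query; `parse_compute_apps` and `list_gpu_processes` are helper names invented here, not part of any library. It only lists the processes that still hold GPU memory, so you can decide yourself which one to kill.

```python
import subprocess

# Ask nvidia-smi for the processes currently using the GPUs, as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV output into a list of (pid, name, used_mib_or_None)."""
    apps = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        try:
            pid, rest = line.split(",", 1)   # pid is the first field
            name, mem = rest.rsplit(",", 1)  # memory is the last field
            pid = int(pid)
        except ValueError:
            continue  # skip lines that do not look like "pid, name, memory"
        mem = mem.strip()
        # Some setups report memory as "[N/A]"; keep that as None.
        apps.append((pid, name.strip(), int(mem) if mem.isdigit() else None))
    return apps


def list_gpu_processes():
    """Run nvidia-smi and return the parsed process list."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)
```

If `list_gpu_processes()` still shows an old Python process after your script has finished, that process is probably what is holding the memory. You can stop it with `kill <pid>` from a shell, or `os.kill(pid, signal.SIGTERM)` from Python, instead of rebooting.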