I'm using a nvidia GTX1080 gpu(8GB) to run Inception model on tensorflow, when I set batch_size = 16 and image_size = 400, then after I start the program, my ubuntu14.04 will auto reboot.
I tracked the issue down to a faulty power supply. It had enough capacity according to spec, and limiting GPU power consumption by running "nvidia-smi -pl 150" didn't help at all. Probably it couldn't handle bursts in power consumption.
Anyway, after I changed the power supply from "Corsair CX750 Builder Series ATX 80 PLUS" to "Cooler Master V1000", the issue is gone.
See details of my investigation in the TensorFlow GitHub issue.
Make sure it is not a power supply unit problem. I was observing strange occasional reboots on my development machine. As I was increasing the size of input (batch size, larger NN) the rate of reboots was increasing as well. Turned out to be a PSU problem. A quick check is to limit GPU power consumption and see if this behavior will go away. For instance, you can limit power to about 150 watts with this command (you'll need a sudo rights):
sudo nvidia-smi -pl 150
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With