
TensorFlow: How do you monitor GPU performance in real time during model training?

I am new to Ubuntu and GPUs, and have recently started using a new PC in our lab running Ubuntu 16.04 with four NVIDIA 1080 Ti GPUs. The machine also has a 16-core i7 processor.

I have some basic questions:

  1. TensorFlow is installed with GPU support. I presume, then, that it automatically prioritises GPU usage? If so, does it use all 4 together, or does it use 1 and then recruit another if needed?

  2. Can I monitor GPU use/activity in real time while a model is training?

I fully understand this is basic hardware stuff, but clear, definitive answers to these specific questions would be great.

EDIT:

Based on this output, is this really saying that nearly all the memory on each of my GPUs is being used?

asked Aug 07 '17 10:08 by GhostRider


1 Answer

I would suggest nvtop. It shows real-time GPU status, is easier to read than nvidia-smi, and also plots usage as a graph.

$ sudo apt install nvtop
$ nvtop
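
Where nvtop cannot be installed, a rough fallback is to poll nvidia-smi directly. The sketch below is only an illustration, not a definitive setup: the function names gpu_summary and monitor_gpus are made up for this example, and it assumes nvidia-smi is on the PATH and separates its CSV fields with a comma followed by a space.

```shell
# Hypothetical helper: turn nvidia-smi CSV rows read from stdin
# ("index, util, mem_used, mem_total") into one readable line per GPU.
gpu_summary() {
  awk -F', ' '{ printf "GPU %s: %s%% util, %s/%s MiB\n", $1, $2, $3, $4 }'
}

# Hypothetical wrapper: query every GPU once per second and format the rows.
# Assumes the NVIDIA driver's nvidia-smi tool is installed.
monitor_gpus() {
  nvidia-smi \
    --query-gpu=index,utilization.gpu,memory.used,memory.total \
    --format=csv,noheader,nounits \
    --loop=1 | gpu_summary
}
```

Running monitor_gpus in a second terminal while training should print one line per GPU every second until it is stopped with Ctrl-C. Comparing the utilisation column with the memory column shows whether a GPU is actually computing or only has memory reserved on it.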

answered Sep 28 '22 11:09 by Zstack