I am learning to use Tensorflow for object detection. To speed up the training process, I have taken a AWS g3.16xlarge instance which has 4 GPUs. I am using the following code to run training process:
export CUDA_VISIBLE_DEVICES=0,1,2,3
 python object_detection/train.py --logtostderr --pipeline_config_path=/home/ubuntu/builder/rcnn.config --train_dir=/home/ubuntu/builder/experiments/training/
Inside the rcnn.config - i have set the batch-size = 1.  During runtime I get the following output:
console output
2018-11-09 07:25:50.104310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-11-09 07:25:50.104385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3 
2018-11-09 07:25:50.104395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y N N N 
2018-11-09 07:25:50.104402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   N Y N N 
2018-11-09 07:25:50.104409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2:   N N Y N 
2018-11-09 07:25:50.104416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3:   N N N Y 
2018-11-09 07:25:50.104429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla M60, pci bus id: 0000:00:1b.0, compute capability: 5.2)
2018-11-09 07:25:50.104439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla M60, pci bus id: 0000:00:1c.0, compute capability: 5.2)
2018-11-09 07:25:50.104446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: Tesla M60, pci bus id: 0000:00:1d.0, compute capability: 5.2)
2018-11-09 07:25:50.104455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: Tesla M60, pci bus id: 0000:00:1e.0, compute capability: 5.2)
When I run nvidia-smi, I get the following output:
nvidia-smi output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 0000:00:1B.0     Off |                    0 |
| N/A   52C    P0   129W / 150W |   7382MiB /  7612MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 0000:00:1C.0     Off |                    0 |
| N/A   33C    P0    38W / 150W |   7237MiB /  7612MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 0000:00:1D.0     Off |                    0 |
| N/A   40C    P0    38W / 150W |   7237MiB /  7612MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   34C    P0    39W / 150W |   7237MiB /  7612MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     97860    C   python                                        7378MiB |
|    1     97860    C   python                                        7233MiB |
|    2     97860    C   python                                        7233MiB |
|    3     97860    C   python                                        7233MiB |
+-----------------------------------------------------------------------------+
and **nvidia-smi dmon** provides the following output:
# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0   158    69    90    69     0     0  2505  1177
    1    38    36     0     0     0     0  2505   556
    2    38    45     0     0     0     0  2505   556
    3    39    37     0     0     0     0  2505   556
I am confused with each of the output.  While I read the console output as the program is recognizing the availability of 4 different gpus, in the nvidia-smi output the volatile GPU-Util percentage is shown only for the first GPU and for the rest it is zero.  However the same table prints memory usage for all the 4 gpu's at the bottom.  And the nvidia-smi dmon prints the sm values only for first gpu and for the others it is zero.  From this blog I understand the zero in dmon indicates that GPU is free.  
What I want to understand is, does the train.py utilizes all the 4 GPU's that I have in my instance. If it is not utilizing all the GPU's how do I ensure the object_detection/train.py of tensorflow is optimized for all the GPU's.  
MirroredStrategy() will use all available GPUs. You can also specify which ones to use if you want, like this: mirrored_strategy = tf. distribute. MirroredStrategy(devices=["/gpu:0", "/gpu:1"]) .
If a TensorFlow operation has both CPU and GPU implementations, TensorFlow will automatically place the operation to run on a GPU device first. If you have more than one GPU, the GPU with the lowest ID will be selected by default. However, TensorFlow does not place operations into multiple GPUs automatically.
Strategy is a TensorFlow API to distribute training across multiple GPUs, multiple machines, or TPUs. Using this API, you can distribute your existing models and training code with minimal code changes. tf.
Check if it's returning list of all GPUs.
tf.test.gpu_device_name()
Returns the name of a GPU device if available or the empty string.
then you can do something like this to use all the available GPUs.
# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))
You see below output:
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
Python code to check if GPU is found and available for using with tensorflow:
## Libraries import
import tensorflow as tf
## Test GPU
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
print('')
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With