I have a 4-GPU machine on which I run TensorFlow (GPU) with Keras. Some of my classification problems take several hours to train. nvidia-smi reports a Volatile GPU-Util that never exceeds 25% on any of my 4 GPUs. How can I increase GPU utilization and speed up my training? [Image: NVIDIA GPU Util]

How to fix low volatile GPU-Util with Tensorflow-GPU and Keras?

1 Answer

If your GPU utilization is below 80%, this is generally a sign of an input pipeline bottleneck: the GPU sits idle much of the time, waiting for the CPU to prepare the data.
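
One rough way to check for such a bottleneck is to time the input pipeline on its own, with no model attached. The sketch below assumes TensorFlow 2.x (so a tf.data.Dataset can be iterated eagerly) and uses random stand-in arrays; `batches_per_second` is a hypothetical helper, not part of TensorFlow.

```python
import time

import numpy as np
import tensorflow as tf


def batches_per_second(dataset, num_batches=50):
    """Iterate over the dataset without a model and return batches per second."""
    start = time.perf_counter()
    count = 0
    for _ in dataset.take(num_batches):
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


# Stand-in data (assumption): replace with your own arrays and preprocessing.
x = np.random.rand(2048, 32).astype("float32")
y = np.random.randint(0, 2, size=2048)

dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(64)
rate = batches_per_second(dataset, num_batches=20)
print(f"input pipeline alone: {rate:.1f} batches/s")
```

If this rate is not comfortably higher than the number of training steps per second you observe in model.fit, the pipeline is a likely suspect.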

What you want is for the CPU to keep preparing batches while the GPU is training, so that the GPU stays fed. This is called prefetching.
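
The overlap can be illustrated without TensorFlow at all. The sketch below is only a toy model of the idea, not how tf.data is implemented: `prepare_batch` and `train_step` are hypothetical stand-ins that sleep for a fixed time, and a background thread fills a small bounded queue while the main thread "trains".

```python
import queue
import threading
import time


def prepare_batch(i):
    time.sleep(0.01)  # stand-in for CPU-side preprocessing
    return i


def train_step(batch):
    time.sleep(0.01)  # stand-in for one training step on the GPU


def run_sequential(n):
    """Prepare and train strictly one after the other."""
    seen = []
    for i in range(n):
        batch = prepare_batch(i)
        train_step(batch)
        seen.append(batch)
    return seen


def run_prefetched(n, buffer_size=2):
    """Prepare batches in a background thread while the main thread trains."""
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for i in range(n):
            buffer.put(prepare_batch(i))  # blocks while the buffer is full
        buffer.put(None)  # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()
    seen = []
    while True:
        batch = buffer.get()
        if batch is None:
            break
        train_step(batch)
        seen.append(batch)
    return seen


for fn in (run_sequential, run_prefetched):
    start = time.perf_counter()
    fn(20)
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s")
```

Both variants process the same batches in the same order; the prefetched one should finish sooner because preparation and training overlap.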

However, if batch preparation still takes much longer than a training step, the GPU will remain idle, waiting for the CPU to finish the next batch. To make batch preparation faster, we can parallelize the different preprocessing operations.

We can go even further by parallelizing I/O.

To implement this in Keras, you need to use the TensorFlow Data API (tf.data) with TensorFlow version >= 1.9.0. Here is an example:

Let's assume, for the sake of this example, that you have two NumPy arrays x and y. You can use tf.data with any type of data, but arrays keep the example simple.

import tensorflow as tf

def preprocessing(x, y):
    # Can only contain TF operations
    ...
    return x, y

dataset = tf.data.Dataset.from_tensor_slices((x, y))  # Creates a dataset object
dataset = dataset.map(preprocessing, num_parallel_calls=64)  # Parallel preprocessing
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(None)  # Will automatically prefetch batches

....

model = tf.keras.Model(...)
model.fit(x=dataset)  # Since TF 1.9.0 you can pass a dataset object
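
The example above covers parallel preprocessing and prefetching but not parallel I/O. One common way to read several files concurrently is `Dataset.interleave`. The following is a minimal sketch, assuming TensorFlow 2.4 or later (for `tf.data.AUTOTUNE`) and a few small text shards written to a temporary directory purely for illustration; the file names and contents are made up.

```python
import os
import tempfile

import tensorflow as tf

# Write four tiny illustrative shards (assumption: real data would already exist).
tmp_dir = tempfile.mkdtemp()
paths = []
for shard in range(4):
    path = os.path.join(tmp_dir, f"shard_{shard}.txt")
    with open(path, "w") as f:
        for row in range(3):
            f.write(f"{shard},{row}\n")
    paths.append(path)

files = tf.data.Dataset.from_tensor_slices(paths)

# Open up to four files at once and read from them in parallel.
lines = files.interleave(
    tf.data.TextLineDataset,
    cycle_length=4,
    num_parallel_calls=tf.data.AUTOTUNE,
)
lines = lines.batch(4).prefetch(tf.data.AUTOTUNE)

records = [line.decode() for batch in lines for line in batch.numpy()]
print(records)
```

With real data, the same pattern applies to TFRecord shards by swapping in `tf.data.TFRecordDataset`.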

tf.data is very flexible but, like everything in TensorFlow (except eager execution), it uses a static graph. This can be a pain at times, but the speedup is worth it.
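
When a preprocessing step cannot be written with TF operations, one possible escape hatch is `tf.numpy_function`, which wraps ordinary NumPy code as a graph op. The sketch below assumes TensorFlow 2.4 or later and uses random stand-in arrays; the normalization in `numpy_preprocessing` is a hypothetical example. Note that the wrapped function runs as regular Python under the interpreter lock, so it tends to parallelize worse than native TF operations.

```python
import numpy as np
import tensorflow as tf


def numpy_preprocessing(x, y):
    # Plain NumPy code: runs as ordinary Python, outside the graph.
    x = (x - x.mean()) / (x.std() + 1e-7)
    return x.astype(np.float32), y


def preprocessing(x, y):
    x, y = tf.numpy_function(numpy_preprocessing, [x, y], [tf.float32, tf.int64])
    # Static shapes are lost through numpy_function, so restore them.
    x.set_shape([8])
    y.set_shape([])
    return x, y


# Stand-in data (assumption): 64 samples with 8 features each.
features = np.random.rand(64, 8).astype("float32")
labels = np.random.randint(0, 2, size=64).astype("int64")

dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.map(preprocessing, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(16).prefetch(tf.data.AUTOTUNE)
```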

To go further, have a look at the performance guide and the TensorFlow data guide.

103

answered Nov 04 '22 02:11

Olivier Dehaene


Tags:

tensorflow

nvidia

keras

multi-gpu

Sharanya Arcot Desai
