 

GPU only being used 1-5% Tensorflow-gpu and Keras

I just installed tensorflow-gpu and am using Keras for my CNN. During training my GPU is only at about 5% utilization, yet 5 of its 6 GB of VRAM are in use. Sometimes it glitches: it prints 0.000000e+00 in the console, the GPU jumps to 100%, and after a few seconds training slows back down to 5%. My GPU is a Zotac GTX 1060 Mini and my CPU is a Ryzen 5 1600X.

Epoch 1/25
 121/3860 [..............................] - ETA: 31:42 - loss: 3.0575 - acc: 0.0877 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 2/25
 121/3860 [..............................] - ETA: 29:48 - loss: 3.0005 - acc: 0.0994 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 3/25
  36/3860 [..............................] - ETA: 24:47 - loss: 2.9863 - acc: 0.1024
asked Nov 23 '17 by Thijs van der Heijden

1 Answer

Usually, we want the bottleneck to be on the GPU (hence 100% utilization). If that's not happening, some other part of your code is taking a long time during each batch. It's hard to say what it is (especially since you didn't add any code), but there are a few things you can try:

1. Input data

Make sure the input data for your network is always available. Reading images from disk takes a long time, so use multiple workers and the multiprocessing interface:

model.fit(..., use_multiprocessing=True, workers=8)
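As a sketch of what that can look like (the keras.utils.Sequence subclass, the paths/labels and the image size below are placeholders, not from the question), loading batches in a Sequence lets several worker processes prepare data while the GPU trains:

import numpy as np
from keras.utils import Sequence

class ImageBatches(Sequence):
    # Yields one (images, labels) batch per index; the loading logic is a stub.
    def __init__(self, image_paths, labels, batch_size=32):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.image_paths) / self.batch_size))

    def __getitem__(self, idx):
        start = idx * self.batch_size
        paths = self.image_paths[start:start + self.batch_size]
        labels = self.labels[start:start + self.batch_size]
        # Replace the zeros with real image decoding (PIL, cv2, ...).
        images = np.stack([np.zeros((64, 64, 3), dtype=np.float32) for _ in paths])
        return images, np.array(labels)

With the standalone Keras of that time, generators and Sequences went through model.fit_generator(ImageBatches(paths, labels), use_multiprocessing=True, workers=8); newer tf.keras accepts them directly in model.fit.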

2. Force the model into the GPU

This is unlikely to be the problem, because /gpu:0 is the default device, but it's worth making sure you are executing the model on the intended device:

with tf.device('/gpu:0'):
    x = Input(...)
    y = Conv2D(...)(x)
    model = Model(x, y)
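If you want to confirm where the ops actually land, one option with the TF 1.x / standalone Keras setup used in the rest of this answer is to enable device-placement logging (a small sketch, not required for the fix):

import tensorflow as tf
from keras import backend as K

# Print each op's device assignment to the console when the graph runs.
config = tf.ConfigProto(log_device_placement=True)
K.set_session(tf.Session(config=config))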

3. Check the model's size

If your batch size is large and soft placement is allowed, the parts of your network that didn't fit in the GPU's memory might be placed on the CPU. This slows the process down considerably.

If soft placement is on, try disabling it and check whether a memory error is thrown:

# make sure soft-placement is off
import tensorflow as tf
from keras import backend as K

tf_config = tf.ConfigProto(allow_soft_placement=False)
tf_config.gpu_options.allow_growth = True
s = tf.Session(config=tf_config)
K.set_session(s)

with tf.device(...):
    ...

model.fit(...)

If that's the case, reduce the batch size until the model fits and gives you good GPU usage, then turn soft placement back on.
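Continuing the snippet above, a sketch of that last step (the data arrays, batch size and epoch count are placeholders):

# Turn soft placement back on once the model fits in GPU memory.
tf_config = tf.ConfigProto(allow_soft_placement=True)
tf_config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=tf_config))

model.fit(x_train, y_train, batch_size=32, epochs=25, validation_data=(x_val, y_val))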

answered Oct 14 '22 by ldavid