I just installed TensorFlow with GPU support and am using Keras for my CNN. During training my GPU is only used at about 5%, but 5 out of 6 GB of VRAM is in use. Sometimes it glitches, prints 0.000000e+00 in the console, and the GPU goes to 100%, but after a few seconds the training slows back down to 5%. My GPU is a Zotac GTX 1060 mini and I am using a Ryzen 5 1600X.
Epoch 1/25
 121/3860 [..............................] - ETA: 31:42 - loss: 3.0575 - acc: 0.0877 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 2/25
 121/3860 [..............................] - ETA: 29:48 - loss: 3.0005 - acc: 0.0994 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 3/25
36/3860 [..............................] - ETA: 24:47 - loss: 2.9863 - acc: 0.1024
TensorFlow code and tf.keras models will transparently run on a single GPU with no code changes required. Note: use tf.config.list_physical_devices('GPU') to confirm that TensorFlow is actually using the GPU.
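For example, a quick sanity check (TensorFlow 2.x API):

import tensorflow as tf

# An empty list here means TensorFlow cannot see the GPU and will fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))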
Keras is installed automatically when you install TensorFlow, so there is no need to install it separately. Keras is a high-level deep learning API for building and training all kinds of neural networks; it uses TensorFlow as a backend to perform the heavy computations.
If you experience this kind of staggering of GPU kernels in your program's trace view, the recommended action is to set the TensorFlow environment variable TF_GPU_THREAD_MODE to gpu_private. This tells the host to keep the threads that launch GPU kernels private to each GPU.
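A minimal sketch of setting this from Python, assuming it runs before TensorFlow is imported (the thread count is just an illustrative value):

import os

# Must be set before TensorFlow initializes the GPU runtime.
os.environ['TF_GPU_THREAD_MODE'] = 'gpu_private'
os.environ['TF_GPU_THREAD_COUNT'] = '2'  # optional: dedicated threads per GPU

import tensorflow as tf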
You can use TensorBoard's GPU kernel stats to visualize which GPU kernels are Tensor Core-eligible and which kernels are actually using Tensor Cores. Enabling fp16 (mixed precision) is one way to make your program's General Matrix Multiply (GEMM) kernels (matmul ops) use Tensor Cores.
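A minimal sketch for turning on mixed precision in tf.keras (TensorFlow 2.4+). Note that the GTX 1060 is a Pascal card without Tensor Cores, so on that GPU fp16 mostly saves memory rather than time:

import tensorflow as tf

# Compute in float16 where safe, keep variables in float32.
tf.keras.mixed_precision.set_global_policy('mixed_float16')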
Usually, we want the bottleneck to be on the GPU (hence 100% utilization). If that's not happening, some other part of your code is taking a long time during each batch. It's hard to say what it is (especially because you didn't post any code), but there are a few things you can try:
1. Input data
Make sure the input data for your network is always available. Reading images from disk takes a long time, so use multiple workers and the multiprocessing interface:
model.fit(..., use_multiprocessing=True, workers=8)
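A rough sketch of a Sequence-based loader that can feed such a fit call; the model, the path and label lists, and the 64x64 target size are placeholders, not taken from the question:

import numpy as np
from PIL import Image
from tensorflow import keras

class ImageSequence(keras.utils.Sequence):
    # Loads one batch of images from disk per call, so multiple workers can prefetch.
    def __init__(self, paths, labels, batch_size=32):
        self.paths, self.labels, self.batch_size = paths, labels, batch_size

    def __len__(self):
        return int(np.ceil(len(self.paths) / self.batch_size))

    def __getitem__(self, idx):
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        imgs = [np.asarray(Image.open(p).resize((64, 64)), dtype=np.float32) / 255.0
                for p in self.paths[batch]]
        return np.stack(imgs), np.asarray(self.labels[batch])

model.fit(ImageSequence(train_paths, train_labels),
          use_multiprocessing=True, workers=8)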
2. Force the model into the GPU
This is unlikely to be the problem, because /gpu:0 is the default device, but it's worth making sure you are executing the model on the intended device:
with tf.device('/gpu:0'):
    x = Input(...)
    y = Conv2D(...)(x)
    model = Model(x, y)
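For instance, a runnable version of the same idea (the input shape and layer parameters below are arbitrary placeholders, not taken from the question):

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

with tf.device('/gpu:0'):
    x = Input(shape=(64, 64, 3))                   # placeholder input shape
    y = Conv2D(32, (3, 3), activation='relu')(x)   # placeholder layer
    model = Model(x, y)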
3. Check the model's size
If your batch size is large and soft placement is allowed, parts of your network that didn't fit in the GPU's memory may be placed on the CPU. This slows the process down considerably.
If soft placement is on, try disabling it and check whether a memory error is thrown:
# make sure soft-placement is off (TF 1.x / standalone Keras API)
import tensorflow as tf
from keras import backend as K

tf_config = tf.ConfigProto(allow_soft_placement=False)
tf_config.gpu_options.allow_growth = True
s = tf.Session(config=tf_config)
K.set_session(s)

with tf.device(...):
    ...
    model.fit(...)
If that's the case, try reducing the batch size until the model fits and gives you good GPU usage, then turn soft placement on again.
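If you are on TensorFlow 2.x, where ConfigProto and Session no longer exist, roughly the equivalent check is:

import tensorflow as tf

# Fail loudly instead of silently spilling ops onto the CPU.
tf.config.set_soft_device_placement(False)

# Optionally let the GPU allocate memory as needed instead of grabbing it all up front.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)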