I used this script to train a model and run predictions on a machine with GPUs installed and enabled, but it seems that only the CPU is being used during the prediction stage. The device placement log I'm seeing during the .predict() part is the following:
2020-09-01 06:08:19.085400: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RangeDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.085617: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.089558: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op MapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.090003: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op PrefetchDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097064: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op FlatMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097647: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op TensorDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097802: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op RepeatDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.097957: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ZipDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.101284: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ParallelMapDataset in device /job:localhost/replica:0/task:0/device:CPU:0
2020-09-01 06:08:19.101865: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op ModelDataset in device /job:localhost/replica:0/task:0/device:CPU:0
even though when I run:
print(tf.config.experimental.list_physical_devices('GPU'))
I receive:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')]
The code I used can be found here. The full output logs can be seen here.
More context:
Python: 3.7.7
TensorFlow: 2.1.0
GPU: Nvidia Tesla V100-PCIE-16GB
CPU: Intel Xeon Gold 5218 CPU @ 2.30GHz
RAM: 394851272 KB
OS: Linux
Keras models will transparently run on a single GPU with no code changes required.
If you use TensorFlow, it handles compute resources (CPU, GPU) for you. When you load a model and call predict, TensorFlow uses those compute resources to make the predictions.
Since you already have a GPU, I assume that tf.test.is_gpu_available() returns True. You can use this piece of code to force TensorFlow to use a specific device:
with tf.device('/gpu:0'):
    # GPU stuff: ops created or executed in this block are placed on GPU:0
    ...
This also works if you want to force it to use the CPU instead for some part of the code:
with tf.device('/cpu:0'):
    # CPU stuff: ops created or executed in this block are placed on the CPU
    ...
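For example, applying the same idea to the prediction step itself would look roughly like the sketch below. This is only an illustration: model and test_images are placeholder names for your own trained model and input data, not names taken from your script.

import tensorflow as tf

# Pin the prediction to the first GPU; ops that predict executes here go to GPU:0
with tf.device('/gpu:0'):
    predictions = model.predict(test_images)  # model / test_images assumed to exist already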
As an addition that might be helpful while using tf.device(), you can use this function to list all the devices you have:
from tensorflow.python.client import device_lib

def get_available_devices():
    # Returns device names such as '/device:CPU:0' and '/device:GPU:0'
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos]

get_available_devices()
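If you would rather stay on the public API instead of the private device_lib module, the same list can be obtained with the sketch below (the experimental namespace matches the TF 2.1.0 you are running):

import tensorflow as tf

# Logical devices include the CPU and every visible GPU, e.g. '/device:GPU:0'
print([d.name for d in tf.config.experimental.list_logical_devices()])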
Though for the use case you mentioned, I cannot guarantee faster inference with a GPU.
Sounds like you need to use a distribution strategy per the docs. Your code would then become something like the following:
import tensorflow as tf
from tensorflow import keras

tf.debugging.set_log_device_placement(True)

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # Build and compile the model inside the strategy scope so its variables
    # are mirrored across the available GPUs.
    model = keras.Sequential(
        [
            keras.layers.Flatten(input_shape=(28, 28)),
            keras.layers.Dense(128, activation='relu'),
            keras.layers.Dense(10)
        ]
    )
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=['accuracy']
    )

model.fit(train_images, train_labels, epochs=10)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

probability_model = tf.keras.Sequential(
    [model, tf.keras.layers.Softmax()]
)
probability_model.predict(test_images)
Per the documentation, "the best practice for using multiple GPUs is to use tf.distribute.Strategy."
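If you only need inference pinned to a single GPU rather than mirrored across all three, a lighter option is tf.distribute.OneDeviceStrategy. This is a minimal sketch under the assumption that model is your already-trained Keras model and test_images is your input array:

import tensorflow as tf

strategy = tf.distribute.OneDeviceStrategy(device='/gpu:0')
with strategy.scope():
    # Wrap the trained model so the added softmax layer (and its ops) live on GPU:0
    probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

predictions = probability_model.predict(test_images)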
Your predict function is using the GPU. I re-ran the timing with your code on an NVIDIA GTX 1080 and inference takes about 100 ms.
Either reboot the system or check whether the GPU is actually being utilised (for example with nvidia-smi).
Here is the line from your log showing that inference is run on the GPU:
2020-09-01 06:19:15.885778: I tensorflow/core/common_runtime/eager/execute.cc:573] Executing op __inference_distributed_function_58022 in device /job:localhost/replica:0/task:0/device:GPU:0
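If you want to confirm this yourself on a new run, a minimal sketch (again with model and test_images standing in for your own objects) is to turn on device placement logging right before calling predict:

import tensorflow as tf

# Print the device each op executes on; GPU-placed ops show .../device:GPU:0
tf.debugging.set_log_device_placement(True)

predictions = model.predict(test_images)  # model / test_images assumed already defined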
Are you using the correct tensorflow package? It could help to uninstall tensorflow and install tensorflow-gpu instead.
For documentation see: https://www.tensorflow.org/install/gpu