I've trained a network on a multi-GPU/CPU setup and saved the resulting model as a TensorFlow SavedModel. I then have another script that loads the model and runs the required ops to make a prediction, i.e., runs inference on the model. This works on the same setup the model was trained on.
However, I need to deploy the model on a device with one CPU and no GPUs. When I run the same script there, I get this error:
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Cannot assign a device for operation default_policy_1/tower_1/Variable: node default_policy_1/tower_1/Variable (defined at restore.py:56) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled. [[node default_policy_1/tower_1/Variable (defined at restore.py:56) ]]
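For what it's worth, the offending placements are baked into the SavedModel's GraphDef itself. A minimal sketch to confirm which nodes carry explicit GPU assignments (assuming TF 1.x; the path below is a placeholder for the real export directory):

import os
from tensorflow.core.protobuf import saved_model_pb2

saved_model_dir = '/path/to/saved_model'  # placeholder

# Parse saved_model.pb directly, without building a session.
sm = saved_model_pb2.SavedModel()
with open(os.path.join(saved_model_dir, 'saved_model.pb'), 'rb') as f:
    sm.ParseFromString(f.read())

# Print every node whose device field mentions a GPU.
for node in sm.meta_graphs[0].graph_def.node:
    if 'GPU' in node.device:
        print(node.name, node.device)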
The answer at Remove operation graph tensorflow to run on CPU looked promising, but its code did not change my graph at all (0 nodes were removed). In general, it doesn't seem wise to simply remove every operation that can't run on a CPU anyway.
I've tried wrapping everything in a with tf.device('CPU:0') block, as well as using config = tf.ConfigProto(device_count={'GPU': 0}), but neither changed the error.
Relevant code:
import numpy as np
import tensorflow as tf
from tensorflow.python.client import session
from tensorflow.python.framework import ops as ops_lib
from tensorflow.python.saved_model import loader

# tag_set, saved_model_dir and the *_sorted lists are defined earlier in the full script.
input_tensor_key_feed_dict = {'observations': np.array([[23]]), 'prev_action': np.array([0]),
                              'prev_reward': np.array([0]), 'is_training': False}

config = tf.ConfigProto(device_count={'GPU': 0})  # hide all GPUs from the session

with tf.device('CPU:0'):
    with session.Session(None, graph=ops_lib.Graph(), config=config) as sess:
        loader.load(sess, tag_set.split(','), saved_model_dir)  # error occurs here
        outputs = sess.run(output_tensor_names_sorted, feed_dict=inputs_feed_dict)
        for i, output in enumerate(outputs):
            output_tensor_key = output_tensor_keys_sorted[i]
            print('Result for output key %s:\t%s' % (output_tensor_key, output))
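One thing that may be worth trying before rewriting anything (an assumption about the loader, not something verified here): tf.saved_model.loader.load forwards extra keyword arguments to tf.train.import_meta_graph, and import_meta_graph accepts clear_devices=True, which strips the recorded device placements at import time. A sketch:

import tensorflow as tf
from tensorflow.python.saved_model import loader

saved_model_dir = '/path/to/saved_model'  # placeholder

with tf.Session(graph=tf.Graph()) as sess:
    # clear_devices=True travels through **saver_kwargs to
    # tf.train.import_meta_graph and drops the baked-in /device:GPU:0 strings.
    loader.load(sess, [tf.saved_model.tag_constants.SERVING],
                saved_model_dir, clear_devices=True)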
I would initialize a new model without any device specifications and then load only the model variables with tf.train.Saver(), treating the SavedModel's variables as a standard training checkpoint. At that point you should be able to export a new version of your SavedModel for which TensorFlow is free to decide where to place the ops.
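A minimal sketch of that approach, assuming TF 1.x; build_model() and both paths are placeholders, and it relies on a SavedModel storing its weights as an ordinary checkpoint under variables/variables:

import os
import tensorflow as tf

saved_model_dir = '/path/to/saved_model'    # placeholder: original export
clean_export_dir = '/path/to/clean_export'  # placeholder: new export

graph = tf.Graph()
with graph.as_default():
    build_model()  # hypothetical: rebuilds the network with no tf.device() calls

    saver = tf.train.Saver()  # matches variables by name, ignoring devices
    with tf.Session(graph=graph) as sess:
        # A SavedModel keeps its weights as a standard checkpoint.
        saver.restore(sess, os.path.join(saved_model_dir, 'variables', 'variables'))

        # Re-export: the new graph carries no device fields, so TensorFlow
        # is free to place every op on whatever devices actually exist.
        builder = tf.saved_model.builder.SavedModelBuilder(clean_export_dir)
        builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
        builder.save()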