How to rewrite a tensorflow graph to use CPU for all operations

I've trained a network on a multi-GPU and CPU setup and saved the resulting model as a TensorFlow SavedModel. I then have another script that loads the model and runs the required ops to make a prediction, i.e., runs inference on the model. This works on the same setup the model was trained on.

However, I need to deploy the model to run on a device with 1 CPU and no GPUs. When I try to run the same script, I get these errors:

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Cannot assign a device for operation default_policy_1/tower_1/Variable: node default_policy_1/tower_1/Variable (defined at restore.py:56) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled. [[node default_policy_1/tower_1/Variable (defined at restore.py:56) ]]

This question looked promising, but its code did not change my graph at all (0 nodes were removed): Remove operation graph tensorflow to run on CPU

In general, it doesn't seem wise to simply remove every operation that can't run on a CPU anyway.

I've tried wrapping everything in a with tf.device('CPU:0') block, as well as using config = tf.ConfigProto(device_count={'GPU': 0}), but neither changed the error.
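One related knob the question does not mention, offered here only as a sketch and not a confirmed fix for this exact model: allow_soft_placement=True tells TensorFlow to move an op that was explicitly pinned to a missing device onto an available one, instead of raising InvalidArgumentError.

```python
import tensorflow.compat.v1 as tf

# Soft placement lets TensorFlow relocate ops that were explicitly
# assigned to a device that does not exist on this machine (e.g. a
# '/device:GPU:0' pin on a CPU-only box). Whether it rescues this
# particular SavedModel is untested here.
config = tf.ConfigProto(allow_soft_placement=True,
                        device_count={'GPU': 0})
```

This config would replace the ConfigProto already built in the script below.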

Relevant code:

import numpy as np
import tensorflow as tf
from tensorflow.python.client import session
from tensorflow.python.framework import ops as ops_lib
from tensorflow.python.saved_model import loader

# tag_set, saved_model_dir, inputs_feed_dict and the sorted output
# tensor name/key lists are defined earlier in the script.

input_tensor_key_feed_dict = {'observations': np.array([[23]]), 'prev_action': np.array([0]),
                              'prev_reward': np.array([0]), 'is_training': False}

config = tf.ConfigProto(device_count={'GPU': 0})
with tf.device('CPU:0'):
    with session.Session(None, graph=ops_lib.Graph(), config=config) as sess:
        
        loader.load(sess, tag_set.split(','), saved_model_dir) #error occurs here
        
        outputs = sess.run(output_tensor_names_sorted, feed_dict=inputs_feed_dict)
        for i, output in enumerate(outputs):
            output_tensor_key = output_tensor_keys_sorted[i]
            print('Result for output key %s:\t%s' % (output_tensor_key, output))
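For reference, in TF 1.x the SavedModel loader forwards extra keyword arguments to tf.train.import_meta_graph, so passing clear_devices=True can strip the stored device pins at load time. A self-contained sketch of that idea follows; the tiny model, the temp paths, and the '/CPU:0' pin (standing in for the original '/GPU:0' pin, since this sketch must run on a CPU-only machine) are all illustrative, not the asker's actual model.

```python
import os
import tempfile
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
export_dir = os.path.join(tempfile.mkdtemp(), 'saved_model')

# 1) Export a tiny SavedModel whose ops carry an explicit device pin
#    ('/CPU:0' stands in for the '/GPU:0' pin in the real model).
with tf.Graph().as_default():
    with tf.device('/CPU:0'):
        x = tf.placeholder(tf.float32, shape=(), name='x')
        w = tf.get_variable('w', initializer=tf.constant(2.0))
        y = tf.multiply(x, w, name='y')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.saved_model.simple_save(sess, export_dir,
                                   inputs={'x': x}, outputs={'y': y})

# 2) Reload it with clear_devices=True. loader.load forwards unknown
#    keyword arguments to tf.train.import_meta_graph, which strips the
#    stored device annotations so placement is decided from scratch.
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ['serve'], export_dir,
                               clear_devices=True)
    result = sess.run('y:0', feed_dict={'x:0': 3.0})  # 2.0 * 3.0
```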
asked Nov 06 '22 by Marcin Kozłowski
1 Answer

I would initialize a new model without device specifications and then load only the model variables, as if it were a standard training checkpoint, with tf.train.Saver(). At that point you should be able to export a new SavedModel for which TensorFlow can decide on its own where to place the ops.
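A minimal, self-contained sketch of this approach. The variable name, the value, and the temp paths are placeholders, and the '/CPU:0' pin stands in for the original '/GPU:0' pin so the sketch runs on a CPU-only machine; in practice the second graph would be built by the real model-construction code, just without any tf.device pins.

```python
import os
import tempfile
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
ckpt_path = os.path.join(tempfile.mkdtemp(), 'model.ckpt')

# 1) Stand-in for the original training script: a variable created
#    under an explicit device pin, saved as a plain checkpoint.
with tf.Graph().as_default():
    with tf.device('/CPU:0'):  # '/GPU:0' in the real setup
        v = tf.get_variable('tower_1/Variable',
                            initializer=tf.constant(42.0))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.train.Saver().save(sess, ckpt_path)

# 2) Rebuild the same variables WITHOUT any tf.device pins.
#    tf.train.Saver matches variables by name, so only the values are
#    restored and TensorFlow is free to place them wherever it likes.
#    From here a new, device-agnostic SavedModel could be exported.
with tf.Graph().as_default():
    v = tf.get_variable('tower_1/Variable', shape=(), dtype=tf.float32)
    with tf.Session() as sess:
        tf.train.Saver().restore(sess, ckpt_path)
        restored_value = sess.run(v)  # 42.0
```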

answered Nov 14 '22 by EdoardoG