TensorFlow: Saving/Restoring session, checkpoint, metagraph

I have been trying to restore a model in TensorFlow, but I keep running into issues when I try to import a metagraph.

This is my code for importing the metagraph:

# Create a clean graph and import the MetaGraphDef nodes
new_graph = tf.Graph()
with tf.Session(graph=new_graph) as sess:
    # Import the previously exported metagraph
    saver = tf.train.import_meta_graph('/tmp/saver-model.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))

In my Model class I have defined the placeholders and added them to a collection as follows:

    """Place Holders"""
    self.input = tf.placeholder(tf.float32, [None, sl], name = 'input')
    self.labels = tf.placeholder(tf.int64, [None], name = 'labels')
    self.keep_prob = tf.placeholder("float", name= 'Drop_out_keep_prob')
    tf.add_to_collection('vars', self.input)
    tf.add_to_collection('vars', self.labels)
    tf.add_to_collection('vars', self.keep_prob)

I train my model as follows:

saver = tf.train.Saver(tf.global_variables())
# Session time
sess = tf.Session() # without context manager, close the session later.
writer = tf.summary.FileWriter("/tmp/model/log_tb", sess.graph) # Writer for tensorboard
sess.run(model.init_op)

where init_op is defined in the Model class as:

self.init_op = tf.global_variables_initializer()

The graph is exported using these three different options, including the undocumented export_scoped_meta_graph:

# export_scoped_meta_graph is undocumented and lives in tensorflow.python.framework
from tensorflow.python.framework import meta_graph

# Export the metagraph in three different ways.
scoped_meta = meta_graph.export_scoped_meta_graph(filename='/tmp/scoped.meta')
meta_graph_def = tf.train.export_meta_graph(filename='/tmp/my-model.meta')
saver.save(sess, '/tmp/saver-model')

This is the error I get when attempting to run under Windows 10:

E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "CountExtremelyRandomStats" device_type: "CPU"') for unknown op: CountExtremelyRandomStats
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "FinishedNodes" device_type: "CPU"') for unknown op: FinishedNodes
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "GrowTree" device_type: "CPU"') for unknown op: GrowTree
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ReinterpretStringToFloat" device_type: "CPU"') for unknown op: ReinterpretStringToFloat
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "SampleInputs" device_type: "CPU"') for unknown op: SampleInputs
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ScatterAddNdim" device_type: "CPU"') for unknown op: ScatterAddNdim
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNInsert" device_type: "CPU"') for unknown op: TopNInsert
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNRemove" device_type: "CPU"') for unknown op: TopNRemove
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TreePredictions" device_type: "CPU"') for unknown op: TreePredictions
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "UpdateFertileSlots" device_type: "CPU"') for unknown op: UpdateFertileSlots

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: expected bytes, NoneType found

During handling of the above exception, another exception occurred:

SystemError                               Traceback (most recent call last)
<ipython-input-37-60792895b01c> in <module>()
      6     #saver = tf.train.import_meta_graph('/tmp/saver-model.meta')
      7     saver = tf.train.import_meta_graph('/tmp/my-model.meta')
----> 8     saver.restore(sess, tf.train.latest_checkpoint('./'))

c:\users\carlos\anaconda3\lib\site-packages\tensorflow\python\training\saver.py in restore(self, sess, save_path)
   1437       return
   1438     sess.run(self.saver_def.restore_op_name,
-> 1439              {self.saver_def.filename_tensor_name: save_path})
   1440 
   1441   @staticmethod

c:\users\carlos\anaconda3\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    765     try:
    766       result = self._run(None, fetches, feed_dict, options_ptr,
--> 767                          run_metadata_ptr)
    768       if run_metadata:
    769         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

c:\users\carlos\anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    963     if final_fetches or final_targets:
    964       results = self._do_run(handle, final_targets, final_fetches,
--> 965                              feed_dict_string, options, run_metadata)
    966     else:
    967       results = []

c:\users\carlos\anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1013     if handle is None:
   1014       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1015                            target_list, options, run_metadata)
   1016     else:
   1017       return self._do_call(_prun_fn, self._session, handle, feed_dict,

c:\users\carlos\anaconda3\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1020   def _do_call(self, fn, *args):
   1021     try:
-> 1022       return fn(*args)
   1023     except errors.OpError as e:
   1024       message = compat.as_text(e.message)

c:\users\carlos\anaconda3\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1002         return tf_session.TF_Run(session, options,
   1003                                  feed_dict, fetch_list, target_list,
-> 1004                                  status, run_metadata)
   1005 
   1006     def _prun_fn(session, handle, feed_dict, fetch_list):

SystemError: <built-in function TF_Run> returned a result with an error set

And this is the error I get when attempting to run under Debian:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN X, pci bus id: 0000:02:00.0)
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Unable to get element from the feed as bytes.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/training/saver.py", line 1439, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Unable to get element from the feed as bytes.
asked Mar 16 '17 by SerialDev


People also ask

What does TensorFlow use to save and restore model parameters on the disk?

One newer approach to saving and restoring a model in TensorFlow is to use the SavedModel builder and loader functionality. This actually wraps the Saver class in order to provide a higher-level serialization, which is more suitable for production purposes.
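
For reference, a minimal sketch of that workflow under the TensorFlow 1.x API; the export directory, toy graph, and tensor names here are made up for illustration and are not from the question:

import tensorflow as tf

export_dir = '/tmp/saved_model_example'  # hypothetical export directory

# --- Export: build a toy graph, then write it out as a SavedModel ---
with tf.Session(graph=tf.Graph()) as sess:
    x = tf.placeholder(tf.float32, [None, 10], name='x')
    w = tf.Variable(tf.zeros([10, 1]), name='w')
    y = tf.matmul(x, w, name='y')
    sess.run(tf.global_variables_initializer())
    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    builder.add_meta_graph_and_variables(sess, [tf.saved_model.tag_constants.SERVING])
    builder.save()

# --- Restore: load the graph and variables back into a fresh session ---
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    y = sess.graph.get_tensor_by_name('y:0')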

What does tf.train.Saver do?

The Saver class adds ops to save and restore variables to and from checkpoints. It also provides convenience methods to run these ops. Checkpoints are binary files in a proprietary format which map variable names to tensor values. The best way to examine the contents of a checkpoint is to load it using a Saver.
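
As an aside, the contents of a checkpoint can also be listed without rebuilding the graph; a minimal sketch, assuming the checkpoint prefix used in the question ('/tmp/saver-model'):

import tensorflow as tf

ckpt_path = '/tmp/saver-model'  # checkpoint prefix written by saver.save() in the question

# Print every variable stored in the checkpoint together with its shape.
reader = tf.train.NewCheckpointReader(ckpt_path)
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)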


1 Answer

I managed to solve it and decided to share the solution in case someone comes across this in the future:

Add all the placeholders to a collection:

tf.add_to_collection('vars', input)
tf.add_to_collection('vars', labels)
tf.add_to_collection('vars', keep_prob)

Merge the summaries, and initialize the variables independently (avoid relying on tf.global_variables_initializer()); see the sketch after the snippet below:

merged = tf.summary.merge([loss_summ, cost_summ, tloss_summ, acc_summ])
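
A minimal sketch of what "initialize variables independently" could look like; the session and variable handles are assumptions, not code from the original model:

with tf.Session() as sess:
    # Run each variable's own initializer instead of one global init op...
    for var in tf.global_variables():
        sess.run(var.initializer)
    # ...or initialize an explicit subset of variables:
    # sess.run(tf.variables_initializer([some_var, another_var]))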

Save the model periodically during training:

if i % 100 == 0:
    saver.save(sess, save_dir + 'model.ckpt', global_step=i+100)

Initialize a new graph and include the saver before importing the metagraph into the new session:

This prevents the saver.saver_def.filename_tensor_name error:

The name 'save/Const:0' refers to a Tensor which does not exist

This is because:

* The default name scope for a tf.train.Saver is "save/" and the placeholder is actually a tf.constant() whose name defaults to "Const:0", which explains why the flag defaults to "save/Const:0".



saver = tf.train.Saver()  # construct the saver in the new graph first
sess = tf.Session()
sess.run(init_op)

Get the checkpoint using tf.train.get_checkpoint_state():

sess = tf.Session()
ckpt = tf.train.get_checkpoint_state(save_dir)
saver.restore(sess, ckpt.model_checkpoint_path)
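
Putting the restore side together, here is a minimal sketch; the paths, the meta file name, and the 'vars' collection follow the question, so adjust them to your own setup:

import tensorflow as tf

save_dir = '/tmp/'  # assumed directory holding saver-model.* and the 'checkpoint' file

new_graph = tf.Graph()
with tf.Session(graph=new_graph) as sess:
    # import_meta_graph recreates the graph (including the save/Const:0 filename
    # tensor) and returns a Saver built from the stored saver_def.
    saver = tf.train.import_meta_graph(save_dir + 'saver-model.meta')

    # Look the checkpoint up in the same directory the model was saved to.
    ckpt = tf.train.get_checkpoint_state(save_dir)
    saver.restore(sess, ckpt.model_checkpoint_path)

    # The placeholders added to the 'vars' collection are available again.
    input_ph, labels_ph, keep_prob_ph = tf.get_collection('vars')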
answered Sep 27 '22 by SerialDev