
Restoring TensorFlow model

I'm trying to restore a TensorFlow model. I followed this example: http://nasdag.github.io/blog/2016/01/19/classifying-bees-with-google-tensorflow/

At the end of the code in the example I added these lines:

saver = tf.train.Saver()
save_path = saver.save(sess, "model.ckpt")
print("Model saved in file: %s" % save_path)

Two files were created: checkpoint and model.ckpt.

In a new python file (tomas_bees_predict.py), I have this code:

import tensorflow as tf

saver = tf.train.Saver()

with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "model.ckpt")
  print("Model restored.")

However when I execute the code, I get this error:

Traceback (most recent call last):
  File "tomas_bees_predict.py", line 3, in <module>
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 705, in __init__
    raise ValueError("No variables to save")

ValueError: No variables to save

Is there a way to read the model.ckpt file and see which variables are saved in it? Or could someone help with saving the model and restoring it, based on the example described above?

EDIT 1:

As far as I recall, I tried running the same code in order to recreate the model structure, and I was still getting the error. I think it could be related to the fact that the code described here doesn't use named variables: http://nasdag.github.io/blog/2016/01/19/classifying-bees-with-google-tensorflow/

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

So I did this experiment. I wrote two versions of the code (with and without named variables) to save the model and the code to restore the model.

tensor_save_named_vars.py:

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(1, name="v1")
v2 = tf.Variable(2, name="v2")

# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print "Model saved in file: ", save_path

tensor_save_not_named_vars.py:

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(1)
v2 = tf.Variable(2)

# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print "Model saved in file: ", save_path

tensor_restore.py:

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(0, name="v1")
v2 = tf.Variable(0, name="v2")

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print "Model restored."
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()

Here is what I get when I execute this code:

$ python tensor_save_named_vars.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
v1 =  1
v2 =  2
Model saved in file:  /tmp/model.ckpt

$ python tensor_restore.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
Model restored.
v1 =  1
v2 =  2

$ python tensor_save_not_named_vars.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
v1 =  1
v2 =  2
Model saved in file:  /tmp/model.ckpt

$ python tensor_restore.py 
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
W tensorflow/core/common_runtime/executor.cc:1076] 0x7ff953881e40 Compute status: Not found: Tensor name "v2" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]
W tensorflow/core/common_runtime/executor.cc:1076] 0x7ff953881e40 Compute status: Not found: Tensor name "v1" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice/tensor_name, save/restore_slice/shape_and_slice)]]
Traceback (most recent call last):
  File "tensor_restore.py", line 14, in <module>
    saver.restore(sess, "/tmp/model.ckpt")
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 891, in restore
    sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 368, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 444, in _do_run
    e.code)
tensorflow.python.framework.errors.NotFoundError: Tensor name "v2" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]
Caused by op u'save/restore_slice_1', defined at:
  File "tensor_restore.py", line 8, in <module>
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 713, in __init__
    restore_sequentially=restore_sequentially)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 432, in build
    filename_tensor, vars_to_save, restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 191, in _AddRestoreOps
    values = self.restore_op(filename_tensor, vs, preferred_shard)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 106, in restore_op
    preferred_shard=preferred_shard)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/io_ops.py", line 189, in _restore_slice
    preferred_shard, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 271, in _restore_slice
    preferred_shard=preferred_shard, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 664, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1834, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1043, in __init__
    self._traceback = _extract_stack()
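
One way to read the failure above is that restoring looks variables up by their graph names, and the checkpoint written by tensor_save_not_named_vars.py does not contain the names "v1" and "v2". The sketch below is a plain-Python stand-in, not TensorFlow code: `unique_names`, `save` and `restore` are hypothetical helpers, and the default name "Variable" with `_1`, `_2` suffixes is an assumption about how unnamed variables are labelled.

```python
def unique_names(requested):
    """Assign graph names: an unnamed variable is assumed to be called
    "Variable", and repeated names get the suffixes _1, _2, ..."""
    seen = {}
    names = []
    for name in requested:
        base = name if name is not None else "Variable"
        count = seen.get(base, 0)
        names.append(base if count == 0 else "%s_%d" % (base, count))
        seen[base] = count + 1
    return names


def save(requested, values):
    """Stand-in for saver.save: store each value under its graph name."""
    return dict(zip(unique_names(requested), values))


def restore(checkpoint, requested):
    """Stand-in for saver.restore: look each graph name up in the checkpoint."""
    restored = {}
    for name in unique_names(requested):
        if name not in checkpoint:
            raise KeyError('Tensor name "%s" not found in checkpoint' % name)
        restored[name] = checkpoint[name]
    return restored


named = save(["v1", "v2"], [1, 2])      # like tensor_save_named_vars.py
unnamed = save([None, None], [1, 2])    # like tensor_save_not_named_vars.py
print(sorted(named))
print(sorted(unnamed))

# Restoring with the same names as the saving script succeeds in both cases.
print(restore(named, ["v1", "v2"]))
print(restore(unnamed, [None, None]))

# Restoring named variables from the unnamed checkpoint fails on the lookup.
try:
    restore(unnamed, ["v1", "v2"])
except KeyError as err:
    print("restore failed: %s" % err)
```

Under these assumptions, explicit names are not strictly required; what matters is that the restoring script ends up with the same names as the saving script.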

So perhaps the original code (see the external link above) could be modified to something like this:

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  weight_var = tf.Variable(initial, name="weight_var")
  return weight_var

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  bias_var = tf.Variable(initial, name="bias_var")
  return bias_var

But then my question is: is restoring the weight_var and bias_var variables sufficient to run predictions? I did the training on a powerful machine with a GPU, and I would like to copy the model to a less powerful computer without a GPU to run predictions.

Tomas asked Jan 24 '16 22:01


2 Answers

There's a similar question here: Tensorflow: how to save/restore a model? TL;DR: you need to recreate the model structure using the same sequence of TensorFlow API calls before using the Saver object to restore the weights.

This is suboptimal; follow GitHub issue #696 for progress on making this easier.
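
The workflow described here (build the same structure, then restore values into it) can be sketched as follows. This is a plain-Python stand-in rather than TensorFlow code: `build_model`, `predict`, `save` and `restore` are hypothetical helpers, a JSON file plays the role of the checkpoint, and the "trained" values are hard-coded for illustration.

```python
import json
import os
import tempfile


def build_model():
    """Single definition of the model structure, shared by both scripts.

    Returns a dict of variable name -> initial value, standing in for
    the variables that the graph-building code would create.
    """
    return {"weight_var": 0.0, "bias_var": 0.0}


def predict(variables, x):
    """Forward pass of the stand-in model: y = w * x + b."""
    return variables["weight_var"] * x + variables["bias_var"]


def save(variables, path):
    """Write the current variable values to disk."""
    with open(path, "w") as f:
        json.dump(variables, f)


def restore(variables, path):
    """Fill an already-built structure with saved values, matching by name."""
    with open(path) as f:
        saved = json.load(f)
    for name in variables:
        if name not in saved:
            raise KeyError('Tensor name "%s" not found in checkpoint' % name)
        variables[name] = saved[name]


# "Training machine": build the model, train it, save the values.
ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")
trained = build_model()
trained["weight_var"], trained["bias_var"] = 2.0, 0.5  # pretend training result
save(trained, ckpt_path)

# "Prediction machine": rebuild the same structure, then restore into it.
model = build_model()
restore(model, ckpt_path)
print(predict(model, 3.0))
```

Keeping the structure in one function that both the training script and the prediction script call is one way to guarantee that the two sides create the same variables in the same order.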

Yaroslav Bulatov answered Nov 02 '22 04:11


If a problem like this occurs, try restarting your kernel: the variables created by the current run conflict with the ones created by the previous run, which is why NotFoundError and other issues show up.

I encountered the same type of problem, and restarting the kernel worked for me. (Caution: avoid running your kernel multiple times, as it can ruin your model file. Recreated variables overwrite the existing ones and end up changing the original values.)
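
One plausible mechanism behind this, assuming a notebook where the default graph persists between cell runs, is that re-running the cell that creates the variables adds new, renamed copies instead of replacing the old ones. The sketch below is a plain-Python stand-in, not TensorFlow code: `Graph`, `build_cell` and `missing_from` are hypothetical, and the `_1` suffix is an assumption about how duplicate names are made unique.

```python
class Graph(object):
    """Tiny stand-in for a default graph that persists across notebook cells."""

    def __init__(self):
        self.names = []

    def variable(self, name):
        # A repeated name gets a numeric suffix instead of replacing the original.
        unique, n = name, 0
        while unique in self.names:
            n += 1
            unique = "%s_%d" % (name, n)
        self.names.append(unique)
        return unique

    def reset(self):
        # Equivalent in spirit to restarting the kernel: start from an empty graph.
        self.names = []


def build_cell(graph):
    """The notebook cell that defines the model's variables."""
    graph.variable("v1")
    graph.variable("v2")


def missing_from(checkpoint, graph):
    """Names a restore would look for but that the checkpoint does not contain."""
    return [n for n in graph.names if n not in checkpoint]


checkpoint = {"v1": 1, "v2": 2}
graph = Graph()

build_cell(graph)
first_run = missing_from(checkpoint, graph)

build_cell(graph)  # the same cell executed a second time
second_run = missing_from(checkpoint, graph)

graph.reset()
build_cell(graph)
after_reset = missing_from(checkpoint, graph)

print(first_run)
print(second_run)
print(after_reset)
```

In this model the second run leaves the graph holding names the checkpoint never saw, while starting from an empty graph removes the mismatch.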

Mahesh_Tripathi answered Nov 02 '22 04:11