TensorFlow on Jupyter: Can't restore variables

Tags: python, tensorflow, jupyter

I can't restore saved variables when using TensorFlow in a Jupyter notebook. I train an ANN and run saver.save(sess, "params1.ckpt"); then I train it again and save the new result with saver.save(sess, "params2.ckpt"). But when I run saver.restore(sess, "params1.ckpt"), my model doesn't load the values saved in params1.ckpt and keeps those from params2.ckpt.

If I run the model, save it to params.ckpt, close and halt the notebook, and then try to load it again, I get the following error:

---------------------------------------------------------------------------
StatusNotOK                               Traceback (most recent call last)
StatusNotOK: Not found: Tensor name "Variable/Adam" not found in checkpoint files params.ckpt
     [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]

During handling of the above exception, another exception occurred:

SystemError                               Traceback (most recent call last)
<ipython-input-6-39ae6b7641bd> in <module>()
----> 1 saver.restore(sess, "params.ckpt")

/usr/local/lib/python3.5/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
    889       save_path: Path where parameters were previously saved.
    890     """
--> 891     sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
    892 
    893 

/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict)
    366 
    367     # Run request and get response.
--> 368     results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
    369 
    370     # User may have fetched the same tensor multiple times, but we

/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, target_list, fetch_list, feed_dict)
    426 
    427       return tf_session.TF_Run(self._session, feed_dict, fetch_list,
--> 428                                target_list)
    429 
    430     except tf_session.StatusNotOK as e:

SystemError: <built-in function delete_Status> returned a result with an error set

My code for training is:

def weight_variable(shape, name):
  initial = tf.truncated_normal(shape, stddev=1.0, name=name)
  return tf.Variable(initial)

def bias_variable(shape, name):
  initial = tf.constant(1.0, shape=shape)
  return tf.Variable(initial, name=name)

input_file = pd.read_csv('P2R0PC0.csv') 
features = #vector with 5 feature names
targets = #vector with 4 feature names
x_data = input_file.as_matrix(features)
t_data = input_file.as_matrix(targets)

x = tf.placeholder(tf.float32, [None, x_data.shape[1]])

hiddenDim = 5

b1 = bias_variable([hiddenDim], name = "b1")
W1 = weight_variable([x_data.shape[1], hiddenDim], name = "W1")

b2 = bias_variable([t_data.shape[1]], name = "b2")
W2 = weight_variable([hiddenDim, t_data.shape[1]], name = "W2")

hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
y = tf.nn.sigmoid(tf.matmul(hidden, W2) + b2)
t = tf.placeholder(tf.float32, [None, t_data.shape[1]])

lambda1 = 1
beta1 = 1
lambda2 = 1
beta2 = 1
error = -tf.reduce_sum(t * tf.log(tf.clip_by_value(y,1e-10,1.0)) + (1 - t) * tf.log(tf.clip_by_value(1 - y,1e-10,1.0)))
complexity = lambda1 * tf.nn.l2_loss(W1) + beta1 * tf.nn.l2_loss(b1) + lambda2 * tf.nn.l2_loss(W2) + beta2 * tf.nn.l2_loss(b2)
loss = error + complexity

train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
sess = tf.Session()

init = tf.initialize_all_variables()
sess.run(init)

ran = 25001
delta = 250

plot_data = np.zeros(int(ran / delta + 1))
k = 0;
for i in range(ran):
    train_step.run({x: data, t: labels}, sess)
    if i % delta == 0:
        plot_data[k] = loss.eval({x: data, t: labels}, sess)
        #plot_training[k] = loss.eval({x: x_test, t: t_test}, sess)
        print(str(plot_data[k]))
        k = k + 1

plt.plot(np.arange(start=2, stop=int(ran / delta + 1)), plot_data[2:])

saver = tf.train.Saver()
saver.save(sess, "params.ckpt")

error.eval({x:data, t: labels}, session=sess)

Am I doing anything wrong? Why can't I ever restore my variables?


asked Jan 11 '16 17:01

Pedro Carvalho

1 Answer

It looks like you are using Jupyter to build your model. One possible issue is that a tf.train.Saver constructed with the default arguments uses the (auto-generated) names of the variables as the keys in your checkpoint. Since it's easy to re-execute code cells multiple times in Jupyter, you might end up with multiple copies of the variable nodes in the session that you save, each under a different auto-generated name. See my answer to this question for an explanation of what can go wrong.
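To make the naming problem concrete, here is a toy model in plain Python. It is not TensorFlow's implementation: ToyGraph, unique_name and run_cell are hypothetical names invented for this sketch, and the checkpoint is just a dict. It only mimics the behaviour described above, where unnamed variables in one graph get names such as "Variable", "Variable_1", and so on, and a default saver keys the checkpoint by those names.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph that hands out unique node names."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, base="Variable"):
        # First use returns the bare name, later uses get a numeric suffix.
        n = self._counts.get(base, 0)
        self._counts[base] = n + 1
        return base if n == 0 else "{}_{}".format(base, n)


def run_cell(graph):
    """Simulates executing a model-building cell that creates two unnamed variables."""
    return [graph.unique_name(), graph.unique_name()]


graph = ToyGraph()
first_run = run_cell(graph)
# A default saver would write the checkpoint keyed by these names.
checkpoint = {name: 0.0 for name in first_run}

# Re-executing the cell against the same graph shifts the names.
second_run = run_cell(graph)
missing = [name for name in second_run if name not in checkpoint]

# Starting from a fresh graph makes the names line up with the checkpoint again.
fresh_run = run_cell(ToyGraph())

print(first_run, second_run, missing, fresh_run)
```

In this sketch every name from the second run is missing from the checkpoint, which is the same kind of mismatch as the "not found in checkpoint" error in the question.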

There are a few possible solutions. Here are the easiest:

- Call tf.reset_default_graph() before you build your model (and the Saver). This ensures that the variables get the names you intended, but it invalidates previously created graphs.
- Use explicit arguments to tf.train.Saver() to specify the persistent names for the variables. For your example this shouldn't be too hard (though it becomes unwieldy for larger models):
```
saver = tf.train.Saver(var_list={"b1": b1, "W1": W1, "b2": b2, "W2": W2})
```
- Create a new tf.Graph() and make it the default each time you create the model. This can be tricky in Jupyter, since it forces you to put all of the model-building code in one cell, but it works well for scripts:
```
with tf.Graph().as_default():
  # Model building and training/evaluation code goes here.
  pass
```
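The explicit-name option can be sketched end to end as follows. This is a minimal illustration under assumptions, not the asker's model: build_model is a hypothetical helper with only two variables, the shapes are made up, and the code assumes a TensorFlow installation that provides the tensorflow.compat.v1 module so that the graph-and-session API from the question is available. The model is built twice in one graph to imitate re-executing a notebook cell.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph-and-session semantics, as in the question


def build_model():
    """Hypothetical stand-in for the model-building cell (two variables only)."""
    b1 = tf.Variable(tf.constant(1.0, shape=[5]), name="b1")
    W1 = tf.Variable(tf.truncated_normal([3, 5], stddev=1.0), name="W1")
    return b1, W1


ckpt_path = os.path.join(tempfile.mkdtemp(), "params.ckpt")

with tf.Graph().as_default():
    b1_old, W1_old = build_model()  # first execution of the cell
    b1_new, W1_new = build_model()  # re-execution: graph names get a numeric suffix
    new_graph_names = [b1_new.op.name, W1_new.op.name]

    # Both savers use the same explicit checkpoint keys, whatever the graph names are.
    saver_old = tf.train.Saver(var_list={"b1": b1_old, "W1": W1_old})
    saver_new = tf.train.Saver(var_list={"b1": b1_new, "W1": W1_new})

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saved_W1 = sess.run(W1_old)
        saver_old.save(sess, ckpt_path)
        # Restore into the second copy of the variables via the explicit keys.
        saver_new.restore(sess, ckpt_path)
        restored_W1 = sess.run(W1_new)

print(new_graph_names)
print((saved_W1 == restored_W1).all())
```

Because var_list lists only the model variables, optimizer slot variables (such as the "Variable/Adam" entry named in the question's error) are neither written to nor expected from the checkpoint in this sketch. If you want to resume training with the optimizer state intact, you would need to include those variables as well.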


answered Sep 23 '22 22:09

mrry
