Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

TensorFlow on Jupyter: Can't restore variables

I can't seem to be able to restore saved variables when using TensorFlow in a Jupyter notebook. I train an ANN, then I run saver.save(sess, "params1.ckpt") then I train it again, save the new result saver.save(sess, "params2.ckpt") but when I run saver.restore(sess, "params1.ckpt") my model doesn't load the values saved on params1.ckpt and keeps those in params2.ckpt.

If I run the model, save it on params.ckpt, then close and halt, then try to load it again, I get the following error:

---------------------------------------------------------------------------
StatusNotOK                               Traceback (most recent call last)
StatusNotOK: Not found: Tensor name "Variable/Adam" not found in checkpoint files params.ckpt
     [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]

During handling of the above exception, another exception occurred:

SystemError                               Traceback (most recent call last)
<ipython-input-6-39ae6b7641bd> in <module>()
----> 1 saver.restore(sess, "params.ckpt")

/usr/local/lib/python3.5/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
    889       save_path: Path where parameters were previously saved.
    890     """
--> 891     sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
    892 
    893 

/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict)
    366 
    367     # Run request and get response.
--> 368     results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
    369 
    370     # User may have fetched the same tensor multiple times, but we

/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, target_list, fetch_list, feed_dict)
    426 
    427       return tf_session.TF_Run(self._session, feed_dict, fetch_list,
--> 428                                target_list)
    429 
    430     except tf_session.StatusNotOK as e:

SystemError: <built-in function delete_Status> returned a result with an error set

My code for training is:

def weight_variable(shape, name):
  initial = tf.truncated_normal(shape, stddev=1.0, name=name)
  return tf.Variable(initial)

def bias_variable(shape, name):
  initial = tf.constant(1.0, shape=shape)
  return tf.Variable(initial, name=name)

input_file = pd.read_csv('P2R0PC0.csv') 
features = #vector with 5 feature names
targets = #vector with 4 feature names
x_data = input_file.as_matrix(features)
t_data = input_file.as_matrix(targets)

x = tf.placeholder(tf.float32, [None, x_data.shape[1]])

hiddenDim = 5

b1 = bias_variable([hiddenDim], name = "b1")
W1 = weight_variable([x_data.shape[1], hiddenDim], name = "W1")

b2 = bias_variable([t_data.shape[1]], name = "b2")
W2 = weight_variable([hiddenDim, t_data.shape[1]], name = "W2")

hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
y = tf.nn.sigmoid(tf.matmul(hidden, W2) + b2)
t = tf.placeholder(tf.float32, [None, t_data.shape[1]])

lambda1 = 1
beta1 = 1
lambda2 = 1
beta2 = 1
error = -tf.reduce_sum(t * tf.log(tf.clip_by_value(y,1e-10,1.0)) + (1 - t) * tf.log(tf.clip_by_value(1 - y,1e-10,1.0)))
complexity = lambda1 * tf.nn.l2_loss(W1) + beta1 * tf.nn.l2_loss(b1) + lambda2 * tf.nn.l2_loss(W2) + beta2 * tf.nn.l2_loss(b2)
loss = error + complexity

train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
sess = tf.Session()

init = tf.initialize_all_variables()
sess.run(init)

ran = 25001
delta = 250

plot_data = np.zeros(int(ran / delta + 1))
k = 0;
for i in range(ran):
    train_step.run({x: data, t: labels}, sess)
    if i % delta == 0:
        plot_data[k] = loss.eval({x: data, t: labels}, sess)
        #plot_training[k] = loss.eval({x: x_test, t: t_test}, sess)
        print(str(plot_data[k]))
        k = k + 1

plt.plot(np.arange(start=2, stop=int(ran / delta + 1)), plot_data[2:])

saver = tf.train.Saver()
saver.save(sess, "params.ckpt")

error.eval({x:data, t: labels}, session=sess)

Am I doing anything wrong? Why can't I ever restore my variables?

like image 944
Pedro Carvalho Avatar asked Jan 11 '16 17:01

Pedro Carvalho


People also ask

Can we use Tensorflow in Jupyter notebook?

No install necessary—run the TensorFlow tutorials directly in the browser with Colaboratory, a Google research project created to help disseminate machine learning education and research. It's a Jupyter notebook environment that requires no setup to use and runs entirely in the cloud.


1 Answers

It looks like you are using Jupyter to build your model. One possible issue, when constructing a tf.Saver with the default arguments is that it will use the (auto-generated) names for the variables as the keys in your checkpoint. Since in Jupyter its easy to re-execute code cells multiple times, you might be ending up with multiple copies of the variable nodes in the session that you save. See my answer to this question for an explanation of what can go wrong.

There are a few possible solutions. Here are the easiest:

  • Call tf.reset_default_graph() before you build your model (and the Saver). This will ensure that the variables get the names you intended, but it will invalidate previously-created graphs.

  • Use explicit arguments to tf.train.Saver() to specify the persistent names for the variables. For your example this shouldn't be too hard (though it becomes unwieldy for larger models):

    saver = tf.train.Saver(var_list={"b1": b1, "W1": W1, "b2": b2, "W2": W2})
    
  • Create a new tf.Graph() and make it the default each time you create the model. This can be tricky in Jupyter, since it forces you to put all of the model building code in one cell, but it works well for scripts:

    with tf.Graph().as_default():
      # Model building and training/evaluation code goes here.
    
like image 120
mrry Avatar answered Sep 23 '22 22:09

mrry