I can't seem to restore saved variables when using TensorFlow in a Jupyter notebook. I train an ANN, then run saver.save(sess, "params1.ckpt"). Then I train it again and save the new result with saver.save(sess, "params2.ckpt"). But when I run saver.restore(sess, "params1.ckpt"), my model doesn't load the values saved in params1.ckpt; it keeps the ones in params2.ckpt.
If I run the model, save it to params.ckpt, then close and halt the notebook, and then try to load it again, I get the following error:
---------------------------------------------------------------------------
StatusNotOK Traceback (most recent call last)
StatusNotOK: Not found: Tensor name "Variable/Adam" not found in checkpoint files params.ckpt
[[Node: save/restore_slice_1 = RestoreSlice[dt=DT_FLOAT, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]
During handling of the above exception, another exception occurred:
SystemError Traceback (most recent call last)
<ipython-input-6-39ae6b7641bd> in <module>()
----> 1 saver.restore(sess, "params.ckpt")
/usr/local/lib/python3.5/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
889 save_path: Path where parameters were previously saved.
890 """
--> 891 sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
892
893
/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict)
366
367 # Run request and get response.
--> 368 results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
369
370 # User may have fetched the same tensor multiple times, but we
/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, target_list, fetch_list, feed_dict)
426
427 return tf_session.TF_Run(self._session, feed_dict, fetch_list,
--> 428 target_list)
429
430 except tf_session.StatusNotOK as e:
SystemError: <built-in function delete_Status> returned a result with an error set
My code for training is:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=1.0)
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):
    initial = tf.constant(1.0, shape=shape)
    return tf.Variable(initial, name=name)

input_file = pd.read_csv('P2R0PC0.csv')
features = [...]  # vector with 5 feature names
targets = [...]   # vector with 4 feature names
x_data = input_file.as_matrix(features)
t_data = input_file.as_matrix(targets)

x = tf.placeholder(tf.float32, [None, x_data.shape[1]])
hiddenDim = 5

b1 = bias_variable([hiddenDim], name="b1")
W1 = weight_variable([x_data.shape[1], hiddenDim], name="W1")
b2 = bias_variable([t_data.shape[1]], name="b2")
W2 = weight_variable([hiddenDim, t_data.shape[1]], name="W2")

hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)
y = tf.nn.sigmoid(tf.matmul(hidden, W2) + b2)
t = tf.placeholder(tf.float32, [None, t_data.shape[1]])

lambda1 = 1
beta1 = 1
lambda2 = 1
beta2 = 1

# Cross-entropy error, with clipping to avoid log(0)
error = -tf.reduce_sum(t * tf.log(tf.clip_by_value(y, 1e-10, 1.0))
                       + (1 - t) * tf.log(tf.clip_by_value(1 - y, 1e-10, 1.0)))
# L2 regularization on the weights and biases
complexity = (lambda1 * tf.nn.l2_loss(W1) + beta1 * tf.nn.l2_loss(b1)
              + lambda2 * tf.nn.l2_loss(W2) + beta2 * tf.nn.l2_loss(b2))
loss = error + complexity

train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

ran = 25001
delta = 250
plot_data = np.zeros(int(ran / delta + 1))
k = 0
for i in range(ran):
    train_step.run({x: x_data, t: t_data}, sess)
    if i % delta == 0:
        plot_data[k] = loss.eval({x: x_data, t: t_data}, sess)
        #plot_training[k] = loss.eval({x: x_test, t: t_test}, sess)
        print(str(plot_data[k]))
        k = k + 1

plt.plot(np.arange(start=2, stop=int(ran / delta + 1)), plot_data[2:])

saver = tf.train.Saver()
saver.save(sess, "params.ckpt")

error.eval({x: x_data, t: t_data}, session=sess)
Am I doing anything wrong? Why can't I ever restore my variables?
It looks like you are using Jupyter to build your model. One possible issue when constructing a tf.Saver with the default arguments is that it will use the (auto-generated) names for the variables as the keys in your checkpoint. Since it's easy to re-execute code cells multiple times in Jupyter, you might be ending up with multiple copies of the variable nodes in the session that you save. See my answer to this question for an explanation of what can go wrong; the sketch below shows the renaming in action.
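As a quick illustration, here is a minimal sketch (assuming the pre-1.0 API used in the question, where tf.all_variables() lists the variables in the default graph):

import tensorflow as tf

v = tf.Variable(0.0)  # first execution of the cell: auto-named "Variable"
v = tf.Variable(0.0)  # re-executing the cell adds a second node, "Variable_1"

# Both nodes remain in the default graph; a Saver built with default
# arguments keys the checkpoint entries by these auto-generated names.
print([var.name for var in tf.all_variables()])
# ['Variable:0', 'Variable_1:0']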
There are a few possible solutions. Here are the easiest:
1. Call tf.reset_default_graph() before you build your model (and the Saver). This will ensure that the variables get the names you intended, but it will invalidate previously-created graphs.
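A minimal sketch of that pattern in a single notebook cell (the model-building code is elided):

tf.reset_default_graph()  # start from an empty default graph

# ... rebuild the placeholders, variables, loss, and optimizer here ...

saver = tf.train.Saver()  # now keyed by the names you gave the variables
sess = tf.Session()       # the old session's graph is invalid, so make a new one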
2. Use explicit arguments to tf.train.Saver() to specify persistent names for the variables. For your example this shouldn't be too hard (though it becomes unwieldy for larger models):
saver = tf.train.Saver(var_list={"b1": b1, "W1": W1, "b2": b2, "W2": W2})
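With explicit keys, the restore is insensitive to auto-generated name suffixes. A sketch of the matching restore (params1.ckpt is the checkpoint path from your question):

# Rebuild the graph (e.g. after a kernel restart), then restore by key.
saver = tf.train.Saver(var_list={"b1": b1, "W1": W1, "b2": b2, "W2": W2})
sess = tf.Session()
saver.restore(sess, "params1.ckpt")  # loads exactly these four variables

Note that the Adam optimizer's slot variables are not in this var_list, so they are neither saved nor restored, and they would still need to be initialized before training resumes.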
3. Create a new tf.Graph() and make it the default each time you create the model. This can be tricky in Jupyter, since it forces you to put all of the model-building code in one cell, but it works well for scripts:
with tf.Graph().as_default():
    # Model building and training/evaluation code goes here.
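Filled in slightly (a sketch only; the helper functions and hyperparameters are the ones from your question):

with tf.Graph().as_default():
    # Everything, including the Saver, is built inside this block, so each
    # run of the script starts from a fresh graph with predictable names.
    x = tf.placeholder(tf.float32, [None, 5])
    b1 = bias_variable([hiddenDim], name="b1")
    W1 = weight_variable([5, hiddenDim], name="W1")
    # ... rest of the model, loss, and train_step ...

    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        # ... training loop ...
        saver.save(sess, "params1.ckpt")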