I have run the distributed mnist example: https://github.com/tensorflow/tensorflow/blob/r0.12/tensorflow/tools/dist_test/python/mnist_replica.py
Though I have set the
saver = tf.train.Saver(max_to_keep=0)
In previous release, like r11, I was able to run over each check point model and evaluate the precision of the model. This gave me a plot of the progress of the precision versus global steps (or iterations).
Prior to r12, tensorflow checkpoint models were saved in two files, model.ckpt-1234
and model-ckpt-1234.meta
. One could restore a model by passing the model.ckpt-1234
filename like so saver.restore(sess,'model.ckpt-1234')
.
However, I've noticed that in r12, there are now three output files model.ckpt-1234.data-00000-of-000001
, model.ckpt-1234.index
, and model.ckpt-1234.meta
.
I see that the the restore documentation says that a path such as /train/path/model.ckpt
should be given to restore instead of a filename. Is there any way to load one checkpoint file at a time to evaluate it? I have tried passing the model.ckpt-1234.data-00000-of-000001
, model.ckpt-1234.index
, and model.ckpt-1234.meta
files, but get errors like below:
W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
NotFoundError (see above for traceback): Tensor name "hid_b" not found in checkpoint files logdir/2016-12-08-13-54/model.ckpt-0.index
[[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]
W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I'm running on OSX Sierra with tensorflow r12 installed via pip.
Any guidance would be helpful.
Thank you.
The model restoring is done using the tf. saved_model. loader and restores the saved variables, signatures, and assets in the scope of a session.
b) Checkpoint file: This is a binary file which contains all the values of the weights, biases, gradients and all the other variables saved. This file has an extension .ckpt. However, Tensorflow has changed this from version 0.11.
You can restore the model like this:
saver = tf.train.import_meta_graph('./src/models/20170512-110547/model-20170512-110547.meta')
saver.restore(sess,'./src/models/20170512-110547/model-20170512-110547.ckpt-250000'))
Where the path '/src/models/20170512-110547/' contains three files:
model-20170512-110547.meta
model-20170512-110547.ckpt-250000.index
model-20170512-110547.ckpt-250000.data-00000-of-00001
And if in one directory there are more than one checkpoints,eg: there are checkpoint files in the path ./20170807-231648/:
checkpoint
model-20170807-231648-0.data-00000-of-00001
model-20170807-231648-0.index
model-20170807-231648-0.meta
model-20170807-231648-100000.data-00000-of-00001
model-20170807-231648-100000.index
model-20170807-231648-100000.meta
you can see that there are two checkpoints, so you can use this:
saver = tf.train.import_meta_graph('/home/tools/Tools/raoqiang/facenet/models/facenet/20170807-231648/model-20170807-231648-0.meta')
saver.restore(sess,tf.train.latest_checkpoint('/home/tools/Tools/raoqiang/facenet/models/facenet/20170807-231648/'))
use only model.ckpt-1234
at least it works for me
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With