Say that I have trained a Tensorflow Estimator:
estimator = tf.contrib.learn.Estimator(
    model_fn=model_fn,
    model_dir=MODEL_DIR,
    config=some_config)
And I fit it to some training data:
estimator.fit(input_fn=input_fn_train, steps=None)
The idea is that the model is fit and saved to MODEL_DIR. This folder contains a checkpoint and several .meta and .index files.
This works perfectly. Now I want to make some predictions using my functions:
estimator = tf.contrib.learn.Estimator(
    model_fn=model_fn,
    model_dir=MODEL_DIR,
    config=some_config)
predictions = estimator.predict(input_fn=input_fn_test)
My solution works, but it has one big disadvantage: you need to know model_fn, which is my model defined in Python. If I change the model, for example by adding a dense layer in my Python code, it no longer matches the weights saved in MODEL_DIR and restoring fails with an error:
NotFoundError (see above for traceback): Key xxxx/dense/kernel not found in checkpoint
How do I cope with this? How can I load my model / estimator such that I can make predictions on some new data? How can I load model_fn or the estimator from MODEL_DIR?
Using the save_weights() method saves only the weights of the layers contained in the model. When saving a model with TensorFlow it is advisable to use the save() method, which writes a complete H5 model, rather than save_weights(); that said, weights alone can also be written to an H5 file with save_weights().
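A minimal Keras sketch of that difference, assuming tf.keras is available; the Sequential model and file names below are only placeholders, not the question's model:

import tensorflow as tf

# A simple stand-in model just to show the two saving methods.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# save_weights() writes only the layer weights; you still need the Python
# code that builds the architecture before you can load them again.
model.save_weights('weights.h5')

# save() writes the architecture, weights and optimizer state together,
# so the model can be reloaded without the original model-building code.
model.save('full_model.h5')
restored = tf.keras.models.load_model('full_model.h5')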
Restoring a model's state from a checkpoint only works if the model and checkpoint are compatible. For example, suppose you trained a DNNClassifier Estimator containing two hidden layers, each having 10 nodes:
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='models/iris')

classifier.train(
    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
    steps=200)
After training (and, therefore, after creating checkpoints in models/iris), imagine that you changed the number of neurons in each hidden layer from 10 to 20 and then attempted to retrain the model:
classifier2 = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[20, 20],  # Change the number of neurons in the model.
    n_classes=3,
    model_dir='models/iris')

classifier2.train(
    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
    steps=200)
Since the state in the checkpoint is incompatible with the model described in classifier2, retraining fails with the following error:
...
InvalidArgumentError (see above for traceback): tensor_name =
dnn/hiddenlayer_1/bias/t_0/Adagrad; shape in shape_and_slice spec [10]
does not match the shape stored in checkpoint: [20]
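Conversely, re-creating the estimator with the original architecture and the same model_dir restores the checkpointed weights automatically, so you can predict without retraining. A minimal sketch, assuming the same my_feature_columns and an eval_input_fn / test_x comparable to the training helpers above:

# Rebuild the estimator exactly as it was trained ([10, 10] hidden units);
# because model_dir still points at 'models/iris', the latest checkpoint
# is restored automatically when predict() builds the graph.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],  # must match the checkpointed model
    n_classes=3,
    model_dir='models/iris')

predictions = classifier.predict(
    input_fn=lambda: eval_input_fn(test_x, labels=None, batch_size=100))
for pred in predictions:
    print(pred['class_ids'], pred['probabilities'])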
To run experiments in which you train and compare slightly different versions of a model, save a copy of the code that created each model_dir, possibly by creating a separate git branch for each version. This separation will keep your checkpoints recoverable.
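A lightweight complement to branching is to store the hyperparameters next to the checkpoints so the matching estimator can always be rebuilt. This is only a possible convention, not part of the Estimator API; the params.json file name and the params dict are illustrative:

import json
import os

import tensorflow as tf

MODEL_DIR = 'models/iris'
params = {'hidden_units': [10, 10], 'n_classes': 3}

# Write the architecture description next to the checkpoints...
os.makedirs(MODEL_DIR, exist_ok=True)
with open(os.path.join(MODEL_DIR, 'params.json'), 'w') as f:
    json.dump(params, f)

# ...and read it back whenever the estimator has to be re-created, so the
# layer sizes always match what the checkpoint was trained with.
with open(os.path.join(MODEL_DIR, 'params.json')) as f:
    params = json.load(f)

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,  # assumed defined as in the example above
    hidden_units=params['hidden_units'],
    n_classes=params['n_classes'],
    model_dir=MODEL_DIR)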
This is copied from the TensorFlow checkpoints documentation: https://www.tensorflow.org/get_started/checkpoints. I hope that helps.