Loading a trained TensorFlow model into an Estimator

Say that I have trained a TensorFlow Estimator:

estimator = tf.contrib.learn.Estimator(
  model_fn=model_fn,
  model_dir=MODEL_DIR,
  config=some_config)

And I fit it to some train data:

estimator.fit(input_fn=input_fn_train, steps=None)

The idea is that the model is fit and saved to MODEL_DIR. This folder contains a checkpoint file and several .meta and .index files.

This works perfectly. Afterwards, I want to make predictions with the fitted model:

estimator = tf.contrib.learn.Estimator(
  model_fn=model_fn,
  model_dir=MODEL_DIR,
  config=some_config)

predictions = estimator.predict(input_fn=input_fn_test)

My solution works, but there is one big disadvantage: you need to know model_fn, which is my model defined in Python. If I change the model, for example by adding a dense layer in my Python code, it no longer matches the saved data in MODEL_DIR, and restoring fails with an error like:

NotFoundError (see above for traceback): Key xxxx/dense/kernel not found in checkpoint

How do I cope with this? How can I load my model / estimator such that I can make predictions on some new data? How can I load model_fn or the estimator from MODEL_DIR?

Asked by Guido on Jan 10 '18


1 Answer

Avoiding a bad restoration

Restoring a model's state from a checkpoint only works if the model and checkpoint are compatible. For example, suppose you trained a DNNClassifier Estimator containing two hidden layers, each having 10 nodes:

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='models/iris')

classifier.train(
    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
    steps=200)

After training (and, therefore, after creating checkpoints in models/iris), imagine that you changed the number of neurons in each hidden layer from 10 to 20 and then attempted to retrain the model:

classifier2 = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[20, 20],  # Change the number of neurons in the model.
    n_classes=3,
    model_dir='models/iris')

classifier2.train(
    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
    steps=200)

Since the state in the checkpoint is incompatible with the model described in classifier2, retraining fails with the following error:

...
InvalidArgumentError (see above for traceback): tensor_name =
dnn/hiddenlayer_1/bias/t_0/Adagrad; shape in shape_and_slice spec [10]
does not match the shape stored in checkpoint: [20]

To run experiments in which you train and compare slightly different versions of a model, save a copy of the code that created each model_dir, possibly by creating a separate git branch for each version. This separation will keep your checkpoints recoverable.
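One lightweight way to apply this advice (a minimal sketch; the `model_dir_for` helper and directory layout are my own illustration, not from the TensorFlow docs) is to encode the architecture in each model_dir, so an incompatible model variant never points at an old checkpoint:

```python
def model_dir_for(hidden_units):
    # Encode the architecture in the checkpoint directory name so each
    # model variant trains against its own, compatible checkpoints.
    return 'models/iris_' + '-'.join(str(n) for n in hidden_units)

# hidden_units=[10, 10] -> 'models/iris_10-10'
# hidden_units=[20, 20] -> 'models/iris_20-20'
```

Passing `model_dir=model_dir_for(hidden_units)` when constructing each Estimator keeps the two sets of checkpoints from colliding.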

This is copied from the TensorFlow checkpoints documentation:

https://www.tensorflow.org/get_started/checkpoints

Hope that helps.

Answered by Colin Wang on Oct 07 '22