 

Object detection classification / A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights)

I am trying to do classification with object detection on Colab. I am using "ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config". When I start training I get this error. Training command:

!python model_main_tf2.py \
    --pipeline_config_path=training/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config \
    --model_dir=training \
    --alsologtostderr
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W1130 13:39:27.991891 140559633127296 util.py:158] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
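(For reference, the fix the warning itself suggests, expect_partial(), applies when you restore a checkpoint in your own code; a minimal sketch with a placeholder model and path, not taken from the question:)

import tensorflow as tf

# Placeholder model and checkpoint path, only to illustrate expect_partial().
net = tf.keras.Sequential([tf.keras.layers.Dense(4)])
net.build(input_shape=(None, 8))

ckpt = tf.train.Checkpoint(model=net)
ckpt.write('demo-ckpt')  # write something we can restore

# expect_partial() declares that not every saved value has to be matched/used,
# which is exactly what silences the warning above.
ckpt.restore('demo-ckpt').expect_partial()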
asked Nov 30 '20 by shine1189


People also ask

What is TF train checkpoint?

The checkpoint includes variables created by this object and any trackable objects it depends on at the time Checkpoint.write() is called. write does not number checkpoints, increment save_counter, or update the metadata used by tf.train.
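A minimal sketch of the write/save distinction (placeholder model and file prefixes, not from the question):

import tensorflow as tf

# Track a model in a checkpoint object; an optimizer could be tracked too.
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, 3))
ckpt = tf.train.Checkpoint(model=model)

ckpt.write('manual_ckpt')    # bare write: no numbering, save_counter untouched
ckpt.save('numbered_ckpt')   # save: numbers files (numbered_ckpt-1, -2, ...) and bumps save_counter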

How do you continue training in Tensorflow?

To continue training a loaded model with checkpoints, we simply rerun the model.fit function with the callback still passed. This, however, overwrites the currently saved best model, so make sure to change the checkpoint file path if this is undesired.
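A minimal sketch of that pattern (placeholder toy data and file name; the point is re-calling model.fit with the same ModelCheckpoint callback):

import tensorflow as tf

# Toy model and data, just to illustrate re-running fit with the callback.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='adam', loss='mse')

# Change the filepath for a new run if you do not want to overwrite the
# previously saved "best" weights.
ckpt_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath='run2_best.ckpt', save_weights_only=True,
    save_best_only=True, monitor='loss')

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
model.fit(x, y, epochs=2, callbacks=[ckpt_cb])  # initial training
model.fit(x, y, epochs=2, callbacks=[ckpt_cb])  # continue: just call fit again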


2 Answers

I was dealing with the same error. I assume that the training stopped when you got the error you cited above. If so, you might want to check your folder paths.

I was able to get rid of the error myself when I figured out that I was trying to create a new model, but TF was looking at a 'model_dir' folder that contained checkpoints from my previous model. Because my num_steps was not greater than the num_steps used in the previous model, TF effectively stopped running the training because the num_steps had already been completed.

By changing the model_dir to a brand new folder, I was able to overcome this error and begin training a new model. Hopefully this works for you as well.
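Concretely, that just means re-running the question's command with model_dir pointing at a fresh, empty folder ('training_v2' below is only a placeholder name):

!python model_main_tf2.py \
    --pipeline_config_path=training/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config \
    --model_dir=training_v2 \
    --alsologtostderr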

answered Oct 23 '22 by Brad G Grounds


If anyone is trying to continue their training, the solution as @GbG mentioned is to update your num_steps value in the pipeline.config:

Original:

  num_steps: 25000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 25000

Updated:

  num_steps: 50000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .04
          total_steps: 50000
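
The same edit can also be made programmatically; a minimal sketch assuming the Object Detection API's config_util helpers and the field layout shown in the excerpt above (not part of the original answer):

from object_detection.utils import config_util

PIPELINE_CONFIG = 'training/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config'
NEW_STEPS = 50000  # must be larger than the steps already completed in model_dir

configs = config_util.get_configs_from_pipeline_file(PIPELINE_CONFIG)
train_config = configs['train_config']
train_config.num_steps = NEW_STEPS

# Keep the cosine-decay schedule in sync with the new step budget
# (field path assumed from the config excerpt above).
lr = train_config.optimizer.momentum_optimizer.learning_rate.cosine_decay_learning_rate
lr.total_steps = NEW_STEPS

pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, 'training/')  # writes training/pipeline.config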
answered Oct 23 '22 by TomSelleck