I am trying to do classification with object detection in Colab. I am using "ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config". When I start training I get an error. Training:
!python model_main_tf2.py \
--pipeline_config_path=training/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config \
--model_dir=training \
--alsologtostderr
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W1130 13:39:27.991891 140559633127296 util.py:158] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
The checkpoint includes variables created by this object and any trackable objects it depends on at the time Checkpoint.write() is called. write does not number checkpoints, increment save_counter, or update the metadata used by tf.train.
To continue training a loaded model with checkpoints, we simply rerun the model.fit function with the callback still passed. This, however, overwrites the currently saved best model, so make sure to change the checkpoint file path if this is undesired.
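If the warning itself is the only concern, it can be silenced by calling expect_partial() on the load-status object, exactly as the message suggests. A minimal sketch follows; the toy Keras model and the 'training' checkpoint directory are placeholders, not the actual detection model:

import tensorflow as tf

# Placeholder model; in the Object Detection API the restored object would be
# the detection model built from the pipeline config.
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
ckpt = tf.train.Checkpoint(model=model)

# restore() returns a load-status object; expect_partial() tells TF that
# unused checkpointed values are expected, which silences the warning above.
status = ckpt.restore(tf.train.latest_checkpoint('training'))
status.expect_partial()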
I was dealing with the same error. I assume that the training stopped when you got the error you cited above. If so, you might want to check your folder paths.
I was able to get rid of the error myself when I figured out that I was trying to create a new model, but TF was looking at a 'model_dir' folder that contained checkpoints from my previous model. Because my num_steps was not greater than the num_steps used in the previous model, TF effectively stopped the training run, since that number of steps had already been completed.
By changing the model_dir to a brand new folder, I was able to overcome this error and begin training a new model. Hopefully this works for you as well.
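For example, a fresh run can be started by pointing --model_dir at an empty folder; 'training_new' below is just an assumed name, and the config path is taken from the question:

!python model_main_tf2.py \
--pipeline_config_path=training/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config \
--model_dir=training_new \
--alsologtostderr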
If anyone is trying to continue their training, the solution, as @GbG mentioned, is to update your num_steps value in the pipeline.config:
Original:
num_steps: 25000
optimizer {
  momentum_optimizer: {
    learning_rate: {
      cosine_decay_learning_rate {
        learning_rate_base: .04
        total_steps: 25000
Updated:
num_steps: 50000
optimizer {
  momentum_optimizer: {
    learning_rate: {
      cosine_decay_learning_rate {
        learning_rate_base: .04
        total_steps: 50000
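After saving the edited config, rerunning the original training command (sketched below with the same paths as in the question) should restore the latest checkpoint found in model_dir and continue training toward the new 50000-step total:

!python model_main_tf2.py \
--pipeline_config_path=training/ssd_resnet101_v1_fpn_640x640_coco17_tpu-8.config \
--model_dir=training \
--alsologtostderr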