
"tensorflow.python.framework.errors_impl.FailedPreconditionError" while running "model_main_tf2.py" for training object detection model in tensorflow

Many people have faced this issue, but it always seems to have been caused by a mistake in the command-line arguments.

This is the command I'm running:

!python "/content/drive/My Drive/Tensorflow/models/research/object_detection/model_main_tf2.py" --model_dir="/content/drive/My Drive/Tensorflow/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8" --pipeline_config_path="/content/drive/My Drive/Tensorflow/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config"

There doesn't seem to be any mistake in it.

This is the stack trace:

Traceback (most recent call last):
  File "/content/drive/My Drive/Tensorflow/models/research/object_detection/model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/content/drive/My Drive/Tensorflow/models/research/object_detection/model_main_tf2.py", line 110, in main
    record_summaries=FLAGS.record_summaries)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py", line 630, in train_loop
    manager.save()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_management.py", line 819, in save
    self._record_state()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_management.py", line 728, in _record_state
    save_relative_paths=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_management.py", line 248, in update_checkpoint_state_internal
    text_format.MessageToString(ckpt))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 570, in atomic_write_string_to_file
    rename(temp_pathname, filename, overwrite)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 529, in rename
    rename_v2(oldname, newname, overwrite)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py", line 546, in rename_v2
    compat.as_bytes(src), compat.as_bytes(dst), overwrite)

Error message:

tensorflow.python.framework.errors_impl.FailedPreconditionError: /content/drive/My Drive/Tensorflow/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint.tmp91048f3bf67645619be6603094546de1; Is a directory

The error is raised from _pywrap_file_io.RenameFile(), where _pywrap_file_io is imported from tensorflow.python. I tried to look into the source code to find the problem, but I couldn't find it anywhere.

Could the problem have arisen because I'm running this on Colab?

TensorFlow version: 2.3
Python version: 3.6

Can someone please help me with this?

Harish Babu asked Dec 17 '22 12:12



1 Answer

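The problem was that the program was trying to create a file named "checkpoint" in the model directory, but the downloaded model already contained a folder with that name. As the stack trace shows, the checkpoint state is first written to a temporary file and then renamed to "checkpoint"; the rename fails with "Is a directory" because the destination is that folder. There are two ways to overcome this issue: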

  1. Create a new, empty folder and pass its path as the --model_dir argument.
  2. Check whether the model directory contains a folder named 'checkpoint'. If it does, rename the folder. In my case, I renamed it to 'checkpoint0'.
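The second option can be sketched in a few lines of Python that you could run in a Colab cell before starting training. This is only a minimal sketch: the helper name free_checkpoint_name and its backup_name parameter are made up for illustration and are not part of TensorFlow or the Object Detection API.

```python
from pathlib import Path


def free_checkpoint_name(model_dir, backup_name="checkpoint0"):
    """Rename a 'checkpoint' folder inside model_dir, if one exists.

    Hypothetical helper (not a TensorFlow API). Returns the new path
    when a rename happened, otherwise None.
    """
    model_dir = Path(model_dir)
    clash = model_dir / "checkpoint"
    if not clash.is_dir():
        # No folder is occupying the name, so nothing needs to change.
        return None
    target = model_dir / backup_name
    if target.exists():
        # Refuse to overwrite anything that already uses the backup name.
        raise FileExistsError(f"{target} already exists; choose another backup_name")
    clash.rename(target)
    return target
```

Calling free_checkpoint_name with the same path that is passed to --model_dir should leave the name "checkpoint" free for the training script. If pipeline.config has a fine_tune_checkpoint entry that points into the old folder, that path would presumably need to be updated to the new folder name as well.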
Harish Babu answered Apr 28 '23 02:04
