
TensorFlow object detection training error with TPU

I'm following along with Google's object detection on a TPU post and have hit a wall when it comes to training.

Looking at the job logs, I can see that ml-engine runs a ton of pip installs for various packages, provisions a TPU, and then submits the following:

Running command: python -m object_detection.model_tpu_main 
--model_dir=gs://{MY_BUCKET}/train --tpu_zone us-central1 
--pipeline_config_path=gs://{MY_BUCKET}/data/pipeline.config 
--job-dir gs://{MY_BUCKET}/train

It then errors with:

message:  "Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/object_detection/model_tpu_main.py", line 30, in <module>
from object_detection import model_lib
File "/root/.local/lib/python2.7/site-packages/object_detection/model_lib.py", line 26, in <module>
from object_detection import eval_util
File "/root/.local/lib/python2.7/site-packages/object_detection/eval_util.py", line 28, in <module>
from object_detection.metrics import coco_evaluation
File "/root/.local/lib/python2.7/site-packages/object_detection/metrics/coco_evaluation.py", line 20, in <module>
from object_detection.metrics import coco_tools
File "/root/.local/lib/python2.7/site-packages/object_detection/metrics/coco_tools.py", line 47, in <module>
from pycocotools import coco
File "/root/.local/lib/python2.7/site-packages/pycocotools/coco.py", line 49
import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt
                                ^
SyntaxError: invalid syntax
"   

This is my first time using ml-engine and I'm stuck. I find it odd that the error references Python 2.7, since I submitted the job from my laptop in a Python 3.6 environment.

Any ideas on where to go from here or what to do?

Gshock asked Dec 04 '22 19:12


1 Answer

Based on the stack trace, three separate statements ended up on a single line (line 49 of pycocotools/coco.py). I believe I ran into the same problem recently while playing with the new TensorFlow object detection API. The cause was in models/research/object_detection/dataset_tools/create_pycocotools_package.sh, specifically the following line:

sed "s/import matplotlib\.pyplot as plt/import matplotlib\nmatplotlib\.use\(\'Agg\'\)\nimport matplotlib\.pyplot as plt/g" pycocotools/coco.py > coco.py.updated
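
To check whether a given coco.py is affected before packaging it, one option is to try compiling its source. This is only a sketch and not part of the original script: the helper name `first_syntax_error_line` is made up for illustration, and the two sample strings are copied from the traceback and from what the sed command is meant to produce.

```python
def first_syntax_error_line(source, filename="coco.py"):
    """Return the line number of the first SyntaxError in `source`, or None.

    compile() only parses the text; it does not run the imports,
    so matplotlib does not need to be installed for this check.
    """
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err.lineno
    return None


# The collapsed line from the traceback: each "\n" came out as a plain "n".
collapsed = "import matplotlibnmatplotlib.use('Agg')nimport matplotlib.pyplot as plt\n"

# What the sed substitution is supposed to produce instead.
intact = "import matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n"

print(first_syntax_error_line(collapsed))
print(first_syntax_error_line(intact))
```

For a real file, read it first, for example `first_syntax_error_line(open("coco.py.updated").read())`; any result other than `None` means the patched file will fail on import.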

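The problem for me was that sed didn't interpret the \n escapes in the replacement as newlines (BSD sed, as shipped with macOS, emits a literal n instead, which matches the "matplotlibnmatplotlib" in the traceback). I solved it by using literal newlines, each preceded by a backslash, like the following: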

sed "s/import matplotlib\.pyplot as plt/import matplotlib\\
matplotlib\.use\(\'Agg\'\)\\
import matplotlib\.pyplot as plt/g" pycocotools/coco.py > coco.py.updated
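
If you would rather not depend on how a particular sed handles newlines, the same substitution can be sketched in Python. This is an alternative, not what create_pycocotools_package.sh actually does; `patch_coco_source` and `patch_file` are hypothetical names chosen for this example.

```python
OLD_IMPORT = "import matplotlib.pyplot as plt"

# Same three statements the sed replacement is meant to emit,
# joined with real newline characters.
NEW_IMPORT = (
    "import matplotlib\n"
    "matplotlib.use('Agg')\n"
    "import matplotlib.pyplot as plt"
)


def patch_coco_source(source):
    """Replace every occurrence of the pyplot import, like sed's /g flag."""
    return source.replace(OLD_IMPORT, NEW_IMPORT)


def patch_file(src_path, dst_path):
    """Read src_path, apply the patch, and write the result to dst_path."""
    with open(src_path) as src:
        patched = patch_coco_source(src.read())
    with open(dst_path, "w") as dst:
        dst.write(patched)
```

Calling `patch_file("pycocotools/coco.py", "coco.py.updated")` mirrors the input and output paths of the sed command above. Like the sed version, it should be applied once to an unpatched file, since the replacement text itself contains the original import.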

Hope this helps.

K.Lee answered Jan 28 '23 07:01