I execute:
gcloud beta ml jobs submit training ${JOB_NAME} --config config.yaml
and after about 5 minutes the job errors out with this error:
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main "__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 232, in <module> tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 228, in main run_training()
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 129, in run_training data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 212, in read_data_sets with open(local_file, 'rb') as f: IOError: [Errno 2] No such file or directory: 'gs://my-bucket/mnist/train/train-images.gz'
The strange thing is, as far as I can tell, that file exists at that url.
This error usually indicates you are using a multi-region GCS bucket for your output. To avoid this error you should use a regional GCS bucket. Regional buckets provide stronger consistency guarantees which are needed to avoid these types of errors.
For more information about properly setting up GCS buckets for Cloud ML please refer to the Cloud ML Docs
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With