
'No such file or directory' error after submitting a training job

I execute:

gcloud beta ml jobs submit training ${JOB_NAME} --config config.yaml

and after about 5 minutes the job fails with this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 232, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 228, in main
    run_training()
  File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 129, in run_training
    data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 212, in read_data_sets
    with open(local_file, 'rb') as f:
IOError: [Errno 2] No such file or directory: 'gs://my-bucket/mnist/train/train-images.gz'

The strange thing is that, as far as I can tell, the file exists at that URL.

Amir Hormati asked Sep 29 '16 16:09


1 Answer

This error usually indicates that you are using a multi-region GCS bucket for your output. To avoid it, use a regional GCS bucket instead. Regional buckets provide stronger consistency guarantees, which are needed to avoid this type of error.

For more information about setting up GCS buckets for Cloud ML, please refer to the Cloud ML docs.
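
To tell which kind of bucket you have, `gsutil ls -L -b gs://my-bucket` should print a "Location constraint" field for the bucket. The snippet below is only a rough sketch of how that value could be classified. The helper name `is_multi_region` is hypothetical (it is not part of gcloud, gsutil, or Cloud ML), and it assumes that only the classic multi-region codes US, EU, and ASIA count as multi-region; anything else is treated as "regional or other".

```python
# Hypothetical helper for illustration; not part of gcloud, gsutil, or Cloud ML.
# Assumption: only the classic multi-region location codes are listed here.
MULTI_REGION_LOCATIONS = {"US", "EU", "ASIA"}


def is_multi_region(location):
    """Return True if a bucket location string names a multi-region."""
    # Location codes are compared case-insensitively, ignoring stray whitespace.
    return location.strip().upper() in MULTI_REGION_LOCATIONS


if __name__ == "__main__":
    # Example location strings; substitute the value reported for your bucket.
    for loc in ("US", "us-central1", "EU", "europe-west1"):
        kind = "multi-region" if is_multi_region(loc) else "regional or other"
        print("%s -> %s" % (loc, kind))
```

If the location turns out to be a multi-region, the advice above would suggest creating a new bucket in a single region (for example `us-central1`) and pointing the job's data and output paths at it.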

Amir Hormati answered Sep 25 '22 14:09