
"Unable to get Filesystem for path" error when training a neural network on Google Cloud

I am using Google Cloud to train a neural network on the cloud like in the following example:

https://cloud.google.com/blog/big-data/2016/12/how-to-classify-images-with-tensorflow-using-google-cloud-machine-learning-and-cloud-dataflow

To start, I set the following two environment variables:

PROJECT_ID=$(gcloud config list project --format "value(core.project)")
BUCKET_NAME=${PROJECT_ID}-mlengine

I then uploaded my training and evaluation data, two CSV files named train_set.csv and eval_set.csv, to Google Cloud Storage with the following command:

gsutil cp -r data gs://$BUCKET_NAME

I then verified that these two CSV files were in the polar-terminal-160506-mlengine/data directory in Google Cloud Storage.

I then made the following environment variable assignments:

# Assign appropriate values.
PROJECT=$(gcloud config list project --format "value(core.project)")
JOB_ID="flowers_${USER}_$(date +%Y%m%d_%H%M%S)"
GCS_PATH="${BUCKET}/${USER}/${JOB_ID}"
DICT_FILE=gs://cloud-ml-data/img/flower_photos/dict.txt

Then I tried to preprocess my evaluation data like so:

# Preprocess the eval set.
python trainer/preprocess.py \
  --input_dict "$DICT_FILE" \
  --input_path "gs://cloud-ml-data/img/flower_photos/eval_set.csv" \
  --output_path "${GCS_PATH}/preproc/eval" \
  --cloud

Sadly, this runs for a bit and then crashes with the following error:

ValueError: Unable to get the Filesystem for path gs://polar-terminal-160506-mlengine/data/eval_set.csv

This doesn't seem possible, as I have confirmed in the Google Cloud Storage console that eval_set.csv is stored at this location. Is this perhaps a permissions issue, or something else I am not seeing?

Edit:

I have traced this runtime error to a particular line in the trainer/preprocess.py file. The line is this one:

read_input_source = beam.io.ReadFromText(
      opt.input_path, strip_trailing_newlines=True)

That seems like a pretty good clue, but I am still not sure what is going on. When I google "beam.io.ReadFromText ValueError: Unable to get the Filesystem for path", nothing relevant appears, which is a bit odd. Thoughts?

sometimesiwritecode asked May 23 '17 01:05


3 Answers

Try pip install apache-beam[gcp]. This installs the extra dependencies Beam needs to work with Google Cloud.

New_Coder answered Nov 17 '22 15:11


It looks like your apache-beam library installation might be incomplete.

Try pip install apache-beam[gcp]

The gcp extra allows Apache Beam to access files stored on Google Cloud Storage.

The apache-beam package is available on PyPI.
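
To confirm whether the installation really is incomplete, something along these lines can be run in the same Python environment as the preprocessing script. It is only a rough diagnostic sketch: the function name gcs_support_status is made up for illustration, and it assumes that Beam's GCS support lives in the module apache_beam.io.gcp.gcsfilesystem and that importing that module fails when the GCP client libraries are absent.

```python
import importlib


def gcs_support_status():
    """Report whether Beam's Google Cloud Storage filesystem can be imported.

    Hypothetical helper for illustration; not part of Apache Beam.
    """
    try:
        importlib.import_module("apache_beam")
    except ImportError:
        return "apache-beam is not installed"
    try:
        # Assumption: this import raises ImportError when the GCP
        # client libraries (the "gcp" extra) are not installed.
        importlib.import_module("apache_beam.io.gcp.gcsfilesystem")
    except ImportError:
        return "apache-beam is installed without the gcp extra"
    return "GCS filesystem available"


print(gcs_support_status())
```

If this reports a missing gcp extra, Beam has no handler for gs:// paths, which would be consistent with the "Unable to get the Filesystem for path" error regardless of whether the file exists in the bucket.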

Jean-Christophe Rodrigue answered Nov 17 '22 15:11


Just as Jean-Christophe described, I believe your installation is incomplete.

The base apache-beam package doesn't include everything needed to read from and write to GCP. To get that, along with the runner that deploys your pipeline to Cloud Dataflow (the DataflowRunner), you'll need to install the Dataflow SDK via pip:

pip install google-cloud-dataflow

This is how I was able to resolve the same issue.
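
To see which of the packages mentioned in these answers is actually present, a small check like the following may help. It is a sketch using only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical, and the two distribution names are simply the ones that appear in this thread.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the answers in this thread.
for name in ("apache-beam", "google-cloud-dataflow"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

Note that this only shows whether a distribution is installed, not whether its optional GCP dependencies came with it.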

adityajones answered Nov 17 '22 17:11