Google Cloud Storage: Output path does not exist or is not writeable

I am trying to follow this simple Dataflow example from the Google Cloud site.

I have successfully installed the Dataflow pipeline plugin and the gcloud SDK (as well as Python 2.7). I have also set up a project on Google Cloud and enabled billing and all the necessary APIs, as specified in the instructions above.

However, when I go to the run configurations and change the Pipeline Arguments tab to select BlockingDataflowPipelineRunner, after creating a bucket and setting my project ID, hitting run gives me:

Caused by: java.lang.IllegalArgumentException: Output path does not exist or is not writeable: gs://my-cloud-dataflow-bucket
    at com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
    at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.verifyPathIsAccessible(DataflowPathValidator.java:79)
    at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.validateOutputFilePrefixSupported(DataflowPathValidator.java:62)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.fromOptions(DataflowPipelineRunner.java:255)
    at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.fromOptions(BlockingDataflowPipelineRunner.java:82)
    ... 9 more

I have run 'gcloud auth login' in my terminal, and the browser confirms that I am successfully logged in.

I am really not sure what I have done wrong here. Can anyone confirm whether this is a known issue with the Dataflow pipeline and Google Cloud Storage buckets?

Thanks!

asked Mar 19 '16 by RoshP

3 Answers

I had a similar issue with GCS bucket permissions: I certainly had write permission, and I could upload files into the bucket. What solved the problem for me was acquiring the roles/dataflow.admin role on the project I was submitting the pipeline to.
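
If it helps, the role can also be granted from the command line. A minimal sketch, where my-project-id and you@example.com are hypothetical placeholders for your own project and account:

    # Grant the Dataflow admin role to your user on the target project
    gcloud projects add-iam-policy-binding my-project-id \
        --member="user:you@example.com" \
        --role="roles/dataflow.admin"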

answered Oct 21 '22 by deadmoto


When you submit a pipeline to the Google Cloud Dataflow service, the pipeline runner on your local machine uploads the files necessary for execution in the cloud to a "staging location" in Google Cloud Storage.
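
For reference, the staging location is normally passed as a pipeline argument alongside the project and runner. A minimal sketch, where my-project-id is a hypothetical placeholder and the bucket name is taken from the question:

    --project=my-project-id
    --stagingLocation=gs://my-cloud-dataflow-bucket/staging
    --runner=BlockingDataflowPipelineRunner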

The pipeline runner on your local machine seems unable to write the required files to the staging location provided (gs://my-cloud-dataflow-bucket). It could be that the location doesn't exist, that it belongs to a different GCP project than the one you authenticated against, or that more restrictive permissions are set on that bucket, etc.

You can start debugging the issue with the gsutil command-line tool. For example, try running gsutil ls gs://my-cloud-dataflow-bucket to list the contents of the bucket. Then, try uploading a file via the gsutil cp command. This should produce enough information to root-cause the issue you are facing.
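
A minimal sequence of such checks, assuming gsutil is installed and test.txt is a throwaway local file:

    # Does the bucket exist, and can this account list it?
    gsutil ls gs://my-cloud-dataflow-bucket

    # Can this account write to it?
    echo "hello" > test.txt
    gsutil cp test.txt gs://my-cloud-dataflow-bucket/

    # If the bucket doesn't exist at all, create it first
    gsutil mb gs://my-cloud-dataflow-bucket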

answered Oct 22 '22 by Davor Bonaci


Try providing the zone parameter; that fixed a similar error in my case. And of course, export the GOOGLE_APPLICATION_CREDENTIALS environment variable before running your app.

    ...
    -Dexec.args="--runner=DataflowRunner \
    --gcpTempLocation=gs://bucket/tmp \
    --zone=bucket-zone \
    ...
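
Exporting the credentials might look like this, with the key path a hypothetical placeholder for your own service-account key file:

    # Point the SDK at a service-account key before launching the pipeline
    export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
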
answered Oct 22 '22 by Eugene