I am trying to follow this simple Dataflow example from the Google Cloud site.
I have successfully installed the Dataflow pipeline plugin and the gcloud SDK (as well as Python 2.7). I have also set up a project on Google Cloud and enabled billing and all the necessary APIs, as specified in the instructions.
However, when I go to the run configurations, select BlockingDataflowPipelineRunner on the Pipeline Arguments tab, and hit run after creating a bucket and setting my project ID, I get:
Caused by: java.lang.IllegalArgumentException: Output path does not exist or is not writeable: gs://my-cloud-dataflow-bucket
at com.google.cloud.dataflow.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.verifyPathIsAccessible(DataflowPathValidator.java:79)
at com.google.cloud.dataflow.sdk.util.DataflowPathValidator.validateOutputFilePrefixSupported(DataflowPathValidator.java:62)
at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.fromOptions(DataflowPipelineRunner.java:255)
at com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner.fromOptions(BlockingDataflowPipelineRunner.java:82)
... 9 more
I have run 'gcloud auth login' in my terminal, and the browser shows that I am successfully logged in.
I am really not sure what I have done wrong here. Can anyone confirm whether this is a known issue when using the Dataflow pipeline with Google Cloud Storage buckets?
Thanks!
I had a similar issue with GCS bucket permissions, even though I definitely had write permissions and could upload files into the bucket. What solved the problem for me was being granted the roles/dataflow.admin role on the project I was submitting the pipeline to.
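For reference, a project owner can grant that role with gcloud projects add-iam-policy-binding. The snippet below is a minimal sketch, not a definitive recipe: the project ID and email address are hypothetical placeholders, and the helper names iam_member and grant_dataflow_admin are made up for illustration.

```shell
#!/bin/sh
# Sketch only: PROJECT_ID and USER_EMAIL are hypothetical placeholders.
PROJECT_ID="${PROJECT_ID:-my-project-id}"
USER_EMAIL="${USER_EMAIL:-me@example.com}"

# Format the IAM member string that gcloud expects for a user account.
iam_member() {
  printf 'user:%s\n' "$1"
}

# Grant roles/dataflow.admin on the given project to the given user.
# The caller needs permission to change the project's IAM policy.
grant_dataflow_admin() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="$(iam_member "$2")" \
    --role="roles/dataflow.admin"
}

# Nothing is executed until you call the function, for example:
#   grant_dataflow_admin "$PROJECT_ID" "$USER_EMAIL"
```

The function is only defined here, so sourcing the script does not change any IAM policy by itself.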
When submitting pipelines to the Google Cloud Dataflow Service, the pipeline runner on your local machine uploads files, which are necessary for execution in the cloud, to a "staging location" in Google Cloud Storage.
The pipeline runner on your local machine seems to be unable to write the required files to the staging location provided (gs://my-cloud-dataflow-bucket). It could be that the location doesn't exist, or that it belongs to a different GCP project than the one you authenticated against, or that there are more specific permissions set on that bucket, etc.
You can start debugging the issue with the gsutil command-line tool. For example, try running gsutil ls gs://my-cloud-dataflow-bucket to attempt to list the contents of the bucket. Then, try to upload a file with the gsutil cp command. This will perhaps produce enough information to root-cause the issue you are facing.
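The two checks above can be sketched as a small script. This is only an illustration under assumptions: the bucket name is the one from the error message, and the probe object name (dataflow-write-probe.txt) and the helper names probe_path and check_bucket are hypothetical.

```shell
#!/bin/sh
# Bucket from the error message; replace it with your own staging location.
BUCKET="${BUCKET:-gs://my-cloud-dataflow-bucket}"

# Build the path of a throwaway probe object inside the bucket.
# The object name is a hypothetical placeholder.
probe_path() {
  printf '%s/dataflow-write-probe.txt\n' "${1%/}"
}

# List the bucket, then try to upload and delete a small probe file.
# Returns non-zero if listing or uploading fails.
check_bucket() {
  bucket="${1%/}"
  probe="$(probe_path "$bucket")"
  tmp="$(mktemp)" || return 1
  echo "write probe" > "$tmp"
  status=0
  if ! gsutil ls "$bucket"; then
    echo "cannot list $bucket" >&2
    status=1
  elif ! gsutil cp "$tmp" "$probe"; then
    echo "cannot write to $bucket" >&2
    status=1
  else
    # Clean up the probe object after a successful upload.
    gsutil rm "$probe"
  fi
  rm -f "$tmp"
  return "$status"
}

# Nothing is executed until you call the function, for example:
#   check_bucket "$BUCKET"
```

If either step fails, the error that gsutil itself prints is the thing to look at; the script only reports which step failed.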
Try providing the zone parameter; that fixed a similar error in my case. And of course, export the GOOGLE_APPLICATION_CREDENTIALS environment variable before running your app.
...
-Dexec.args="--runner=DataflowRunner \
--gcpTempLocation=gs://bucket/tmp \
--zone=bucket-zone \
...
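Exporting the credentials variable could look like the sketch below. The key file path is a hypothetical placeholder for a service account JSON key you have downloaded, and credentials_ok is a made-up helper that only checks that the file is readable, not that the key is valid.

```shell
#!/bin/sh
# Hypothetical path to a service account JSON key; replace with your own.
KEY_FILE="${KEY_FILE:-$HOME/keys/dataflow-sa.json}"

# Succeed only if a path was given and the file is readable.
credentials_ok() {
  [ -n "$1" ] && [ -r "$1" ]
}

if credentials_ok "$KEY_FILE"; then
  # Google Cloud client libraries read this variable to locate the key.
  export GOOGLE_APPLICATION_CREDENTIALS="$KEY_FILE"
else
  echo "key file not readable: $KEY_FILE" >&2
fi
```

Run this in the same shell session that launches the pipeline, since an exported variable is only visible to that shell and its child processes.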