
Missing object or bucket in path when running on Dataflow

When trying to run a pipeline on the Dataflow service, I specify the staging and temp buckets (in GCS) on the command line. When the program executes, I get a RuntimeException before my pipeline runs, where the root cause is that I'm missing something in the path.

Caused by: java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions) ... Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path: 'gs://df-staging-bucket-57763/', did you mean: 'gs://some-bucket/df-staging-bucket-57763'?

gs://df-staging-bucket-57763/ already exists in my project, and I have access to it. What do I need to add to make this work?

Asked May 31 '17 by Thomas Groh

1 Answer

The DataflowRunner requires that the staging location and temp location point to a path within a bucket rather than the top level of a bucket. Adding a directory to each of the stagingLocation and gcpTempLocation arguments (for example, --stagingLocation=gs://df-staging-bucket-57763/staging and --tempLocation=gs://df-staging-bucket-57763/temp) is sufficient to run the pipeline.
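As a sketch, a corrected invocation might look like the following (the jar name, main class, and project ID are placeholders, not taken from the question):

```shell
# Hypothetical invocation: my-pipeline.jar, com.example.MyPipeline, and
# my-project are illustrative placeholders. The key point is that both
# GCS paths include an object prefix after the bucket name.
java -cp my-pipeline.jar com.example.MyPipeline \
  --runner=DataflowRunner \
  --project=my-project \
  --stagingLocation=gs://df-staging-bucket-57763/staging \
  --tempLocation=gs://df-staging-bucket-57763/temp
```

Dataflow creates these directories in the bucket if they do not already exist, so no extra setup is needed beyond the bucket itself.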

Answered Oct 14 '22 by Thomas Groh