When trying to run a pipeline on the Dataflow service, I specify the staging and temp buckets (in GCS) on the command line. When the program executes, I get a RuntimeException before my pipeline runs, where the root cause is that I'm missing something in the path.
Caused by: java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
...
Caused by: java.lang.IllegalArgumentException: Missing object or bucket in path: 'gs://df-staging-bucket-57763/', did you mean: 'gs://some-bucket/df-staging-bucket-57763'?
gs://df-staging-bucket-57763/
already exists in my project, and I have access to it. What do I need to add to make this work?
The DataflowRunner requires that the staging location and temp locations be a location within a bucket rather than the top-level of a bucket. Adding a directory (such as --stagingLocation=gs://df-staging-bucket-57763/staging
or --tempLocation=gs://df-staging-bucket-57763/temp
) to your arguments (for each of the stagingLocation
and gcpTempLocation
arguments) will be sufficient to run the pipeline.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With