 

DataflowRunner requires gcpTempLocation, but failed to retrieve a value from PipelineOptions

I am creating a demo pipeline to load a CSV file into BigQuery with Dataflow, using my free Google account. Here is what I am facing.

When I read from a GCS file and just log the data, everything works perfectly. Below is my sample code.

This code runs okay

DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setProject("project12345");
options.setStagingLocation("gs://mybucket/staging");
options.setRunner(DataflowRunner.class);
DataflowRunner.fromOptions(options);
Pipeline p = Pipeline.create(options);
p.apply(TextIO.read().from("gs://mybucket/charges.csv"))
        .apply(ParDo.of(new DoFn<String, Void>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                LOG.info(c.element());
            }
        }));

However, when I add a temp folder location with a path to a bucket I created, I get an error. Below is my code.


LOG.debug("Starting Pipeline");
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setProject("project12345");
options.setStagingLocation("gs://mybucket/staging");
options.setTempLocation("gs://project12345/temp");
options.setJobName("csvtobq");
options.setRunner(DataflowRunner.class);

DataflowRunner.fromOptions(options);
Pipeline p = Pipeline.create(options);

boolean isStreaming = false;
TableReference tableRef = new TableReference();
tableRef.setProjectId("project12345");
tableRef.setDatasetId("charges_data");
tableRef.setTableId("charges_data_id");

p.apply("Loading Data from GCS", TextIO.read().from("gs://mybucket/charges.csv"))
        .apply("Convert to BigQuery Table Row", ParDo.of(new FormatForBigquery()))
        .apply("Write Data into BigQuery",
                BigQueryIO.writeTableRows().to(tableRef).withSchema(FormatForBigquery.getSchema())
                        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                        .withWriteDisposition(isStreaming ? BigQueryIO.Write.WriteDisposition.WRITE_APPEND
                                : BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE));

p.run().waitUntilFinish();

When I run this, I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: DataflowRunner requires gcpTempLocation, but failed to retrieve a value from PipelineOptions
    at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:242)
    at demobigquery.StarterPipeline.main(StarterPipeline.java:74)
Caused by: java.lang.IllegalArgumentException: Error constructing default value for gcpTempLocation: tempLocation is not a valid GCS path, gs://project12345/temp. 
    at org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:247)
    at org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:228)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
    at org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:155)
    at com.sun.proxy.$Proxy15.getGcpTempLocation(Unknown Source)
    at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:240)

Is this an authentication issue? I am using JSON credentials as the project owner from GCP via the Eclipse Dataflow plugin.

Any help would be highly appreciated.

asked Dec 14 '25 by IsaacK

2 Answers

It looks like the error message is thrown from [1], and the default GCS path validator is implemented in [2]. As you can see, the Beam code also attaches a cause exception to the IllegalArgumentException, so check further down the stack trace for the underlying exception thrown in GcsPathValidator.

[1] https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcpOptions.java#L278

[2] https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsPathValidator.java#L29
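As a rough illustration of the first-level check (a standalone sketch, not Beam's actual GcsPathValidator code), a tempLocation must at least look like gs://bucket/path with a legal bucket name. Note that the asker's gs://project12345/temp passes this syntactic check, which suggests the real failure is in the nested cause (for example, the bucket not existing or not being accessible):

```java
// A standalone sketch (not Beam's actual GcsPathValidator) of the syntactic
// check a tempLocation must pass: gs://bucket/path with a legal bucket name.
public class GcsPathCheck {

    // Rough approximation of GCS bucket-name rules: 3-63 characters of
    // lowercase letters, digits, '-', '_' and '.', starting and ending
    // with a letter or digit.
    static boolean looksLikeGcsPath(String path) {
        if (path == null || !path.startsWith("gs://")) {
            return false;
        }
        String rest = path.substring("gs://".length());
        int slash = rest.indexOf('/');
        String bucket = slash < 0 ? rest : rest.substring(0, slash);
        return bucket.matches("[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]");
    }

    public static void main(String[] args) {
        // The asker's path is syntactically fine, so the failure is deeper:
        System.out.println(looksLikeGcsPath("gs://project12345/temp")); // true
        System.out.println(looksLikeGcsPath("project12345/temp"));      // false (no scheme)
    }
}
```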

answered Dec 19 '25 by Rui Wang

There could be multiple reasons for this:

  1. You are not logged in with the right GCP credentials - either the wrong user (or no user) is logged in, or the wrong project is selected.

    Ensure that the GOOGLE_APPLICATION_CREDENTIALS environment variable points to a key file for the right user and project. If not, obtain the right credentials using:

    gcloud auth application-default login

    Download the JSON key, point GOOGLE_APPLICATION_CREDENTIALS at the downloaded file, restart your system, and try again.

  2. You could be logging into the right project with the right user ID, but the requisite permissions on the bucket might be absent. Ensure that you have the following roles:

    • Storage Admin
    • Storage Legacy Bucket Owner
    • Storage Legacy Object Owner (optional)
  3. The URL you are trying does not exist or is misspelt.
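A quick way to rule out point 1 is to confirm the environment variable actually points at a readable key file. The sketch below is ours (class and method names are illustrative, not part of any GCP library), and it does not validate the key's contents, only its presence:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative sanity check: is GOOGLE_APPLICATION_CREDENTIALS set, and does
// it point at a readable file? (Does not validate the key itself.)
public class CredentialsCheck {

    static String describeCredentials(String credPath) {
        if (credPath == null || credPath.isEmpty()) {
            return "GOOGLE_APPLICATION_CREDENTIALS is not set";
        }
        if (!Files.isReadable(Paths.get(credPath))) {
            return "credentials file is missing or unreadable: " + credPath;
        }
        return "credentials file found: " + credPath;
    }

    public static void main(String[] args) {
        System.out.println(describeCredentials(System.getenv("GOOGLE_APPLICATION_CREDENTIALS")));
    }
}
```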

answered Dec 19 '25 by Nishant

