
Dataflow job run failing when templateLocation argument is set


My Dataflow job fails with the exception below when I pass the staging, temp, and output GCS bucket locations as parameters.

Java code:

final String[] used = Arrays.copyOf(args, args.length + 1);
used[used.length - 1] = "--project=OVERWRITTEN";
final T options = PipelineOptionsFactory.fromArgs(used).withValidation().as(clazz);
options.setProject(PROJECT_ID); 
options.setStagingLocation("gs://abc/staging/"); 
options.setTempLocation("gs://abc/temp"); 
options.setRunner(DataflowRunner.class); 
options.setGcpTempLocation("gs://abc");

The error:

INFO: Staging pipeline description to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/staging/
May 10, 2018 11:56:35 AM org.apache.beam.runners.dataflow.util.PackageUtil tryStagePackage
INFO: Uploading <42088 bytes, hash E7urYrjAOjwy6_5H-UoUxA> to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/staging/pipeline-E7urYrjAOjwy6_5H-UoUxA.pb
Dataflow SDK version: 2.4.0
May 10, 2018 11:56:38 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Printed job specification to gs://ups-heat-dev-tmp/mniazstaging_ingest_validation/templates/DataValidationPipeline
May 10, 2018 11:56:40 AM org.apache.beam.runners.dataflow.DataflowRunner run
INFO: Template successfully created.
Exception in thread "main" java.lang.NullPointerException
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.getJobWithRetries(DataflowPipelineJob.java:501)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.getStateWithRetries(DataflowPipelineJob.java:477)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:312)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:248)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:202)
    at org.apache.beam.runners.dataflow.DataflowPipelineJob.waitUntilFinish(DataflowPipelineJob.java:195)
    at com.example.DataValidationPipeline.main(DataValidationPipeline.java:66)
Mohammed Niaz asked May 10 '18 06:05


1 Answer

I faced the same issue; the error was thrown at p.run().waitUntilFinish();. I then tried the following code:

   PipelineResult result = p.run();
   System.out.println(result.getState().hasReplacementJob());
   result.waitUntilFinish();

This threw the following exception:

    java.lang.UnsupportedOperationException: The result of template creation should not be used.
    at org.apache.beam.runners.dataflow.util.DataflowTemplateJob.getState (DataflowTemplateJob.java:67)

To fix the issue, I used the following code:

    PipelineResult result = pipeline.run();
    try {
        result.getState();
        result.waitUntilFinish();
    } catch (UnsupportedOperationException e) {
        // Expected when only a template was created: there is no job result to wait on
    } catch (Exception e) {
        e.printStackTrace();
    }
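
An alternative to swallowing the exception is to skip the wait entirely when a template is being created. The following is only a minimal sketch of that idea, not tested against Dataflow. In real Beam code the check would presumably be made against options.getTemplateLocation() on DataflowPipelineOptions. The ResultLike interface, both result classes, the finish helper, and the example bucket path are hypothetical stand-ins so the snippet runs without the Beam SDK.

```java
// Sketch: decide whether to wait on a pipeline result based on templateLocation.
// All types here are hypothetical stand-ins, not Beam classes.
class TemplateGuardSketch {

    /** Stand-in for the two PipelineResult calls used in this answer. */
    interface ResultLike {
        String getState();
        String waitUntilFinish();
    }

    /** Mimics the template-creation result, which rejects any use (message quoted from the exception above). */
    static final class TemplateResult implements ResultLike {
        public String getState() {
            throw new UnsupportedOperationException(
                "The result of template creation should not be used.");
        }
        public String waitUntilFinish() {
            throw new UnsupportedOperationException(
                "The result of template creation should not be used.");
        }
    }

    /** Mimics a normally submitted job that can be waited on. */
    static final class RunningJobResult implements ResultLike {
        public String getState() {
            return "RUNNING";
        }
        public String waitUntilFinish() {
            return "DONE";
        }
    }

    /**
     * Waits for the job only when no template location was supplied.
     * Returns the final state, or "TEMPLATE_ONLY" if a template was created.
     */
    static String finish(ResultLike result, String templateLocation) {
        boolean templateRequested =
            templateLocation != null && !templateLocation.isEmpty();
        if (templateRequested) {
            // Nothing is running yet, so the result is never touched.
            return "TEMPLATE_ONLY";
        }
        return result.waitUntilFinish();
    }

    public static void main(String[] args) {
        // Example path only; any non-empty template location takes this branch.
        System.out.println(finish(new TemplateResult(), "gs://abc/templates/example"));
        // No template location: behaves like a regular blocking run.
        System.out.println(finish(new RunningJobResult(), null));
    }
}
```

With this shape the template run never calls getState() or waitUntilFinish(), so there is no exception to catch, and a normal run still blocks until the job finishes.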
SANN3 answered Sep 28 '22 18:09