BigQuery unable to insert job. Workflow failed

I need to run a batch job from GCS to BigQuery via Dataflow and Beam. All my files are Avro with the same schema. I've created a Dataflow Java application that succeeds on a smaller set of data (~1 GB, about 5 files), but when I try to run it on a bigger set (>500 GB, >1000 files), I receive this error message:

java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to create load job with id prefix 1b83679a4f5d48c5b45ff20b2b822728_6e48345728d4da6cb51353f0dc550c1b_00001_00000, reached max retries: 3, last failed load job: ...

After 3 retries it terminates with:

Workflow failed. Causes: S57....... A work item was attempted 4 times without success....

This step is the load to BigQuery.

Stackdriver says the processing is stuck in step ... for 10m00s, and:

Request failed with code 409, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes.....

I looked up the 409 error code, which indicates that I might have an existing job, dataset, or table with the same name. I removed all the tables and re-ran the application, but it still shows the same error message.

I am currently limited to 65 workers, and they use n1-standard-4 machines.

I know there are other ways to move the data from GCS to BigQuery, but I need to demonstrate Dataflow.
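For context, here is a minimal sketch of what such a load step typically looks like with Beam's Java SDK. The bucket, table, schema file, and the `toTableRow` converter are placeholders for illustration, not taken from the question; it assumes the destination table already exists:

```java
// Sketch: batch-load Avro files from GCS into BigQuery using Beam's FILE_LOADS
// method (the path that produces the "Failed to create load job" error above).
// All names below are placeholders; the destination table is assumed to exist.
import java.io.File;
import com.google.api.services.bigquery.model.TableRow;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class AvroToBigQuery {
  public static void main(String[] args) throws Exception {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // The shared Avro schema used by every input file.
    Schema avroSchema = new Schema.Parser().parse(new File("schema.avsc"));

    p.apply("ReadAvro", AvroIO.readGenericRecords(avroSchema)
            .from("gs://my-bucket/input/*.avro"))
     .apply("WriteToBQ", BigQueryIO.<GenericRecord>write()
            .to("my-project:my_dataset.my_table")
            .withFormatFunction(AvroToBigQuery::toTableRow)
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)  // batch load jobs
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
  }

  // Simplified converter: copies each Avro field into a TableRow by name.
  // Real code would handle logical types, nested records, and Utf8 strings.
  private static TableRow toTableRow(GenericRecord r) {
    TableRow row = new TableRow();
    r.getSchema().getFields().forEach(f -> row.set(f.name(), r.get(f.name())));
    return row;
  }
}
```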

asked Nov 18 '22 by andrew


1 Answer

"java.lang.RuntimeException: Failed to create job with prefix beam_load_csvtobigqueryxxxxxxxxxxxxxx, reached max retries: 3, last failed job: null. at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:198)..... "

  • One possible cause is a privilege issue. Ensure that the account interacting with BigQuery has the bigquery.jobs.create permission, which is included in the predefined role BigQuery User.
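To check this, you can inspect (and if needed, grant) the role on the service account the Dataflow workers run as. The project ID and service account email below are placeholders:

```shell
# List the roles currently bound to the worker service account
# (placeholder project and service-account names).
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:my-worker-sa@my-project.iam.gserviceaccount.com" \
  --format="table(bindings.role)"

# Grant the BigQuery User role if it is missing.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:my-worker-sa@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"
```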
answered Dec 19 '22 by Muthu