How to recover from a Cloud Dataflow job that failed with com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone

Question

My Cloud Dataflow job failed after running for 4 hours because a worker threw this exception four times within an hour. The stack trace looks like this:

java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone { "code" : 500, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }

at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriter.close(FileBasedSink.java:516)
at com.google.cloud.dataflow.sdk.io.FileBasedSink$FileBasedWriter.close(FileBasedSink.java:419)
at com.google.cloud.dataflow.sdk.io.Write$Bound$2.finishBundle(Write.java:201)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone { "code" : 500, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }
at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:357)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

None of the classes in the stack trace are from my job directly, so I cannot catch the exception and recover.

I checked my region, Cloud Storage (owned by the same project), and so on; they are all OK. Other workers were also running fine. This looks like some kind of bug in Dataflow. If nothing else, I would really like to know how to recover from this: the job spent 30+ hours in total and has now produced a bunch of temp files, and I don't know how complete they are. If I re-run the job, I am concerned that it will fail again.

The job ID is 2016-08-25_21_50_44-3818926540093331568, for the Google folks. Thanks!

Sam McVeety · Accepted Answer

The solution was to specify withNumShards() on the output with a fixed value < 10000. This is a limitation that we hope to remove in the future.
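
As a rough sketch of what that could look like: the helper below picks a fixed shard count from an estimated output size and caps it at 9999, so the value stays under the limit mentioned above. The class name ShardCount, the method chooseNumShards, the size figures, and the bucket path in the comment are all hypothetical and only for illustration; the comment also assumes the output is written with TextIO.Write from the Dataflow SDK 1.x, which may not match the sink used in the original job.

```java
// Hypothetical helper (not part of the Dataflow SDK): chooses a fixed shard
// count that stays below the 10000 limit mentioned in the answer.
final class ShardCount {
    // The answer asks for a fixed value < 10000, so 9999 is the largest allowed.
    static final int MAX_SHARDS = 9_999;

    static int chooseNumShards(long estimatedOutputBytes, long targetBytesPerShard) {
        if (estimatedOutputBytes < 0 || targetBytesPerShard <= 0) {
            throw new IllegalArgumentException(
                "estimated size must be >= 0 and target shard size must be > 0");
        }
        // Ceiling division, written so it cannot overflow for large inputs.
        long shards = estimatedOutputBytes / targetBytesPerShard
            + (estimatedOutputBytes % targetBytesPerShard == 0 ? 0 : 1);
        // Clamp to [1, MAX_SHARDS] so the result is always a usable shard count.
        return (int) Math.max(1, Math.min(MAX_SHARDS, shards));
    }

    public static void main(String[] args) {
        // Assumed figures: about 500 GiB of output, about 256 MiB per shard.
        int numShards = chooseNumShards(500L * 1024 * 1024 * 1024, 256L * 1024 * 1024);
        System.out.println(numShards); // prints 2000

        // With the Dataflow SDK 1.x on the classpath, the value would be
        // passed to the sink, for example (bucket path is a placeholder):
        //   TextIO.Write.to("gs://my-bucket/output").withNumShards(numShards)
    }
}
```

Any constant below 10000 would satisfy the answer equally well; deriving it from an estimated output size is just one way to avoid a large number of tiny files or a small number of very large ones.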

Tags: google-cloud-platform, google-cloud-dataflow

Eric Xu
