 

Are failed tasks resubmitted in Apache Spark?

Tags:

apache-spark

Are failed tasks automatically resubmitted in Apache Spark to the same or another executor?

poiuytrez asked Oct 08 '14 14:10

People also ask

What happens when a Spark job fails?

If an executor runs into memory issues, it will fail the task and restart where the last task left off. If that task still fails after 3 retries (4 attempts total by default), the stage fails and causes the Spark job as a whole to fail.

How do I manage failed tasks in Spark?

Stages are divided into tasks. When you have failed tasks, you need to find the stage the tasks belong to. To do this, click on Stages in the Spark UI and look for the Failed Stages section at the bottom of the page. If an executor runs into memory issues, it will fail the task and restart where the last task left off.

How do I find a failed stage in Spark?

Click on Stages in the Spark UI and look for the Failed Stages section at the bottom of the page.

Why can't I replicate a Spark job failure on one node?

The reasons Spark resubmits a task include an executor not responding (no heartbeat signal) or corrupted data, conditions you cannot easily replicate on a single node. What you can try is to submit a job on a Hadoop cluster and restart one of the nodes, which makes its executor unresponsive.
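
A minimal sketch of how a resubmission can be observed on a cluster (the app name, partition count, and the simulated failure below are made up for illustration): the task for partition 0 throws on its first attempt and succeeds when Spark resubmits it, using TaskContext.attemptNumber() to tell attempts apart. Note that plain local mode typically does not retry failed tasks unless you use a master such as local[4,4], so this is best run against a real cluster.

import org.apache.spark.{SparkConf, SparkContext, TaskContext}

object RetryDemo {
  def main(args: Array[String]): Unit = {
    // "retry-demo" is a hypothetical app name; the master comes from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("retry-demo"))

    val total = sc.parallelize(1 to 100, 4).map { x =>
      val ctx = TaskContext.get()
      // attemptNumber() is 0 on the first try and increments on each resubmission.
      if (ctx.attemptNumber() == 0 && ctx.partitionId() == 0)
        throw new RuntimeException("simulated failure on first attempt")
      x
    }.sum()

    // The resubmitted task succeeds on its second attempt, so the job completes
    // and the retry shows up in the Web UI's task list for that stage.
    println(s"total = $total")
    sc.stop()
  }
}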


2 Answers

I believe failed tasks are resubmitted, because I have seen the same failed task submitted multiple times in the Web UI. However, if the same task fails multiple times, the full job fails:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 120 in stage 91.0 failed 4 times, most recent failure: Lost task 120.3 in stage 91.0
poiuytrez answered Oct 19 '22 14:10


Yes, but there is a parameter that sets the maximum number of failures:

spark.task.maxFailures (default: 4): Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
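
A minimal sketch, with a hypothetical app name, of raising that limit programmatically when building the SparkConf; the same setting can also be passed on the command line with spark-submit --conf spark.task.maxFailures=8.

import org.apache.spark.{SparkConf, SparkContext}

object MaxFailuresExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("max-failures-example") // hypothetical app name
      // Allow each task up to 8 attempts (7 retries) before its stage, and
      // therefore the whole job, is aborted. The default is 4.
      .set("spark.task.maxFailures", "8")

    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}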
chanllen answered Oct 19 '22 16:10