I have the following stages in the Spark web UI (running on YARN):
What surprises me is that Stage 0 has retry 1 and retry 2. What can cause such a thing?
I tried to reproduce it myself by killing all executor processes (CoarseGrainedExecutorBackend) on one of my cluster machines, but all I got was some failed tasks with the description Resubmitted (resubmitted due to lost executor).
What is the reason for the whole stage being retried? And what I'm also curious about is that the number of Records read by each stage attempt was different:
Notice the 3011506 in Attempt 1 and the 195907736 in Attempt 0. Does a stage retry cause Spark to re-read some records twice?
Typically it means that the data has been fetched from cache and there was no need to re-execute the given stage. This is consistent with your DAG, which shows that the next stage requires shuffling (reduceByKey).
A Spark job also consists of stages, and there is lineage between stages, so if one stage still fails after the executor has exhausted its retry attempts, your complete job will fail.
Spark jobs might fail due to out-of-memory exceptions at the driver or executor end. When troubleshooting out-of-memory exceptions, you should understand how much memory and how many cores the application requires; these are the essential parameters for optimizing a Spark application.
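For example, memory and cores are usually set at submit time. A hedged sketch (the flag names are real spark-submit options, but the values here are placeholders you would tune for your own workload and cluster):

```shell
# Illustrative sizing only -- adjust per workload and cluster capacity.
spark-submit \
  --master yarn \
  --driver-memory 4g \
  --executor-memory 8g \
  --executor-cores 4 \
  --num-executors 10 \
  my_job.py
```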
Stages and number of tasks per stage: even if your dataset is very small, you might see that Spark still creates 2 tasks. This is because Spark looks at the defaultMinPartitions property, and this property decides the minimum number of partitions (and hence tasks) Spark will create.
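In core Spark, SparkContext.defaultMinPartitions is defined as min(defaultParallelism, 2), which is why even a one-split file is usually read with two tasks. A toy Python sketch of that rule (the function names here are illustrative, not Spark's API):

```python
def default_min_partitions(default_parallelism: int) -> int:
    # Mirrors SparkContext.defaultMinPartitions: min(defaultParallelism, 2).
    return min(default_parallelism, 2)

def planned_tasks(input_splits: int, min_partitions: int) -> int:
    # Even a tiny input with a single split is still read with at least
    # min_partitions tasks.
    return max(input_splits, min_partitions)

print(planned_tasks(1, default_min_partitions(8)))  # 2
```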
Spark takes care of this: when an executor dies, it will request a new one the next time it asks for "resource containers" for executors.
Also, the boundary of a stage in Spark is marked by shuffle dependencies. Submitting a Spark stage ultimately triggers the execution of a series of dependent parent stages. Every stage also carries a first Job Id, the id of the job that submitted the stage.
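To make the shuffle-boundary idea concrete, here is an illustrative (non-Spark) Python sketch that splits a linear chain of RDD operations into stages at shuffle dependencies, roughly the way the DAGScheduler does conceptually:

```python
# Illustrative only: a few wide (shuffle-producing) transformations.
SHUFFLE_OPS = {"reduceByKey", "groupByKey", "repartition", "sortByKey"}

def split_into_stages(ops):
    """Cut a linear op chain into stages at shuffle boundaries."""
    stages, current = [], []
    for op in ops:
        current.append(op)
        if op in SHUFFLE_OPS:
            # A shuffle dependency closes the current stage.
            stages.append(current)
            current = []
    if current:
        stages.append(current)
    return stages

print(split_into_stages(["map", "filter", "reduceByKey", "map", "collect"]))
# [['map', 'filter', 'reduceByKey'], ['map', 'collect']]
```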
A stage failure might be due to a FetchFailure in Spark.
Fetch failure: a reduce task is not able to perform its shuffle read, i.e. it cannot locate the shuffle file on disk that was written by a shuffle map task.
Spark will retry the stage if stageFailureCount < maxStageFailures; otherwise it aborts the stage and the corresponding job.
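That decision can be sketched in plain Python. The threshold corresponds to spark.stage.maxConsecutiveAttempts (4 by default in recent Spark versions); the function and variable names here are illustrative, not Spark internals:

```python
# Default for spark.stage.maxConsecutiveAttempts (assumption: recent Spark).
MAX_STAGE_FAILURES = 4

def handle_fetch_failure(stage_failure_count: int) -> str:
    """Decide what the scheduler does after a FetchFailed stage attempt."""
    if stage_failure_count < MAX_STAGE_FAILURES:
        return "retry stage"
    return "abort stage and job"

print(handle_fetch_failure(1))  # retry stage
print(handle_fetch_failure(4))  # abort stage and job
```

This is why, in the web UI, you can see several attempts of the same stage before the job finally fails.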
https://youtu.be/rpKjcMoega0?t=1309