
Spark-shell: meaning of the numbers displayed in the stage progress bar

Tags:

apache-spark

[Stage 5:=====>  (26372 + 264) / 27840] 

[Stage 6:=========> (0 + 200) / 200]

Hi, I'm using Spark 1.6.1.

I use spark-shell to look at the data, and I want to know the meaning of each number in the progress bar:

(A + B) / C
reapasisow asked May 16 '17 04:05



1 Answer

The meaning of [Stage 5:=====> (26372 + 264) / 27840] is:

(numCompletedTasks + numActiveTasks) / totalNumOfTasksInThisStage
  • Number of completed tasks = 26372
  • Number of active (currently running) tasks = 264
  • Total number of tasks in this stage = 27840
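
To make the arithmetic concrete, here is a minimal Python sketch that pulls the three numbers out of a progress line and derives how far along the stage is. The `parse_progress` helper and its regular expression are hypothetical illustrations, not part of Spark. The "pending" count assumes that every task not yet completed or active is still waiting to be scheduled.

```python
import re

# Hypothetical helper, not a Spark API: matches a console progress line such as
# "[Stage 5:=====>  (26372 + 264) / 27840]". Whitespace is matched loosely so
# hand-copied lines with extra spaces still parse.
PROGRESS_RE = re.compile(
    r"\[Stage\s+(?P<stage>\d+):[=>\s]*"
    r"\(\s*(?P<completed>\d+)\s*\+\s*(?P<active>\d+)\s*\)"
    r"\s*/\s*(?P<total>\d+)\s*\]",
    re.IGNORECASE,
)


def parse_progress(line):
    """Return the numbers from one progress line as a dict, or None if no match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    stage = int(match.group("stage"))
    completed = int(match.group("completed"))
    active = int(match.group("active"))
    total = int(match.group("total"))
    return {
        "stage": stage,
        "completed": completed,
        "active": active,
        "total": total,
        # Assumption: tasks that are neither completed nor active are still waiting.
        "pending": total - completed - active,
        "percent_complete": 100.0 * completed / total if total else 0.0,
    }


if __name__ == "__main__":
    info = parse_progress("[Stage 5:=====>  (26372 + 264) / 27840]")
    print(info)
```

For the Stage 5 line in the question this gives 1204 tasks not yet started (27840 - 26372 - 264), with the stage a little under 95% complete. For the Stage 6 line, all 200 tasks are active and none have finished yet.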
koiralo answered Oct 02 '22 20:10