
Understanding Spark terminal output during stages [duplicate]

Tags:

apache-spark

I'm new to Spark and am trying to understand the log output of its stages on my terminal. I'm working with a very large data set on my local machine and during actions, I'll see something like:

[Stage: 4 ==>           (10 + 4) / 200]

I understand that stages are all the operations that happen to the RDD, but what about the numbers at the end? Do they represent tasks?

In (10 + 4) / 200, is:
  • 10 the number of tasks completed?
  • 4 the number of tasks running concurrently (i.e. the number of cores on my machine)?
  • 200 the total number of tasks for this stage?
asked Oct 22 '16 21:10 by SVT


1 Answer

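It's called the console progress bar. For the stage you mention, the numbers mean:

(numCompletedTasks + numActiveTasks) / totalNumOfTasksInThisStage

So your reading is right: 10 tasks have completed, 4 are currently running, and the stage has 200 tasks in total. The number of running tasks is bounded by the cores available to Spark, which in local mode is usually the number of cores on your machine.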
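To make the mapping concrete, here is a minimal Python sketch that pulls the three numbers out of a progress line like the one quoted in the question. It is not part of Spark: the names StageProgress, parse_progress and pending are hypothetical helpers made up for this illustration, and the regular expression only assumes the "(completed + active) / total]" tail shown above.

```python
import re
from typing import NamedTuple, Optional


class StageProgress(NamedTuple):
    """Hypothetical container for the three numbers in the progress bar."""
    completed: int  # numCompletedTasks
    active: int     # numActiveTasks
    total: int      # totalNumOfTasksInThisStage

    @property
    def pending(self) -> int:
        # Tasks that are neither completed nor currently running.
        return self.total - self.completed - self.active


# Matches only the "(completed + active) / total]" tail of the line,
# so it does not depend on how the "[Stage ..." header is written.
_TAIL = re.compile(r"\((\d+)\s*\+\s*(\d+)\)\s*/\s*(\d+)\s*\]")


def parse_progress(line: str) -> Optional[StageProgress]:
    """Return the task counts from a progress line, or None if absent."""
    match = _TAIL.search(line)
    if match is None:
        return None
    completed, active, total = map(int, match.groups())
    return StageProgress(completed, active, total)


progress = parse_progress("[Stage: 4 ==>           (10 + 4) / 200]")
print(progress)          # completed=10, active=4, total=200
print(progress.pending)  # 200 - 10 - 4 = 186 tasks not yet started
```

With the line from the question, 186 of the 200 tasks are still waiting to be scheduled.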

Hope this helps, Cheers.

answered Oct 20 '22 05:10 by Chitral Verma