I'm new to Spark and am trying to understand the stage log output on my terminal. I'm working with a very large data set on my local machine, and during actions I see something like:
[Stage: 4 ==> (10 + 4) / 200]
I understand that stages are all the operations that happen to the RDD, but what about the numbers at the end? Do they represent tasks?
In (10 + 4) / 200, is:
10 = the number of tasks completed?
4 = the number of tasks running concurrently (i.e. the number of cores on my machine?)
200 = the total number of tasks for this stage?

It's called the Console Progress Bar. For the stage mentioned, here's what the numbers mean:
[(numCompletedTasks + numActiveTasks) / totalNumOfTasksInThisStage]
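To make the mapping concrete, here is a minimal Python sketch that pulls the three counts out of a progress-bar line and derives how many tasks are still waiting to start (total minus completed minus active). The names parse_progress_bar and StageProgress are hypothetical, and the regular expression is an assumption about the line layout: it accepts both "[Stage: 4 ==> ...]" as quoted in the question and "[Stage 4:==> ...]", and returns one entry per stage bracket in case more than one appears on a line. In local mode the active count is capped by the number of task slots (with the default of one CPU per task, local[4] runs at most four tasks at once), which is why the guess about cores is close.

```python
import re
from typing import List, NamedTuple


class StageProgress(NamedTuple):
    stage_id: int
    completed: int  # numCompletedTasks
    active: int     # numActiveTasks
    total: int      # total number of tasks in this stage

    @property
    def pending(self) -> int:
        # Tasks that have neither finished nor started running yet.
        return self.total - self.completed - self.active


# Assumed layout: "[Stage", optional colon, stage id, optional colon,
# any bar characters, then "(completed + active) / total]".
_PATTERN = re.compile(
    r"\[Stage:?\s*(\d+):?[^(]*\((\d+)\s*\+\s*(\d+)\)\s*/\s*(\d+)\]"
)


def parse_progress_bar(line: str) -> List[StageProgress]:
    """Return one StageProgress per stage bracket found in the line."""
    return [
        StageProgress(*(int(g) for g in m.groups()))
        for m in _PATTERN.finditer(line)
    ]


if __name__ == "__main__":
    for stage in parse_progress_bar("[Stage: 4 ==> (10 + 4) / 200]"):
        print(stage, "pending:", stage.pending)
```

For the line in the question this reports 10 completed, 4 running and 186 tasks not yet started out of 200.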
Hope this helps. Cheers.