
Spark Scheduler vs Standalone Scheduler in the Spark Stack

What is the difference between the scheduler in Spark Core and the Standalone Scheduler in the following Spark stack (from the book Learning Spark: Lightning-Fast Big Data Analysis)?

[Image: the Spark stack diagram from Learning Spark]

OmG asked Oct 14 '25


1 Answer

The difference between the two is highlighted in the Spark documentation's job scheduling overview:

Spark has several facilities for scheduling resources between computations. First, recall that, as described in the cluster mode overview, each Spark application (instance of SparkContext) runs an independent set of executor processes. The cluster managers that Spark runs on provide facilities for scheduling across applications. Second, within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext.
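To make that second point concrete, here is a minimal sketch of within-application scheduling: two actions submitted from separate threads become two concurrent jobs, and setting spark.scheduler.mode to FAIR tells the in-application (Spark Core) scheduler to share executors between them. The app name and the toy jobs are my own illustration, not from the question or the book:

```scala
import org.apache.spark.sql.SparkSession

object FairSchedulingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fair-scheduling-sketch")       // illustrative name
      .config("spark.scheduler.mode", "FAIR")  // in-application scheduling policy
      .getOrCreate()
    val sc = spark.sparkContext

    // Each action triggers a separate Spark job; submitting them from
    // different threads lets them run concurrently within one SparkContext.
    val t1 = new Thread(() => println(sc.parallelize(1 to 1000000).sum()))
    val t2 = new Thread(() => println(sc.parallelize(1 to 1000000).count()))
    t1.start(); t2.start()
    t1.join(); t2.join()

    spark.stop()
  }
}
```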

In short: what you referred to as the Standalone Scheduler operates at the cluster-manager level and allocates resources across multiple Spark applications, while what you referred to as the Spark Core scheduler manages the multiple jobs running within a single application.
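At the cluster-manager level, the standalone scheduler decides how many executors each application gets. A sketch of the knobs involved when connecting to it from a driver program; the master URL and resource figures are placeholders, not values from the question:

```scala
import org.apache.spark.sql.SparkSession

// Running against the standalone cluster manager and capping this
// application's share so other Spark applications can get executors too.
// "spark://master-host:7077" and the numbers below are placeholders.
val spark = SparkSession.builder()
  .appName("cluster-sharing-sketch")
  .master("spark://master-host:7077")     // standalone scheduler endpoint
  .config("spark.cores.max", "4")         // total cores this app may claim
  .config("spark.executor.memory", "2g")  // memory per executor
  .getOrCreate()
```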

ernest_k answered Oct 16 '25


