What is the difference between the scheduler in Spark core and the Standalone Scheduler in the following Spark stack (from the book Learning Spark: Lightning-Fast Big Data Analysis)?
The difference between the two is highlighted in the Spark documentation's overview of job scheduling:
Spark has several facilities for scheduling resources between computations. First, recall that, as described in the cluster mode overview, each Spark application (instance of SparkContext) runs an independent set of executor processes. The cluster managers that Spark runs on provide facilities for scheduling across applications. Second, within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext.
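To make the second point concrete, here is a minimal sketch (in Scala, with assumed names like FairSchedulingSketch and the pool names poolA/poolB) of a single application submitting two jobs from separate threads, with the fair scheduler enabled so the in-application scheduler shares the application's executors between them:

import org.apache.spark.{SparkConf, SparkContext}

object FairSchedulingSketch {
  def main(args: Array[String]): Unit = {
    // Enable the fair scheduler so concurrent jobs within this SparkContext
    // share executors instead of queueing strictly FIFO.
    val conf = new SparkConf()
      .setAppName("fair-scheduling-sketch")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Two jobs (Spark actions) submitted from different threads of the same
    // application; the scheduler inside Spark core divides this application's
    // executors between them.
    val threads = Seq("poolA", "poolB").map { pool =>
      new Thread(() => {
        sc.setLocalProperty("spark.scheduler.pool", pool)
        val total = sc.parallelize(1 to 1000000).map(_ * 2L).reduce(_ + _)
        println(s"$pool => $total")
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())

    sc.stop()
  }
}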
In short: what you refer to as the Standalone Scheduler operates at the cluster-manager level and allocates resources (cores and memory) across multiple Spark applications, whereas the scheduler in Spark core runs inside each application and schedules the jobs, stages, and tasks submitted within that one SparkContext.
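For the cluster-manager side, a rough sketch of how one application leaves room for others on a standalone cluster: capping its total cores lets the standalone scheduler place other applications' executors on the remaining cores. The master URL and the specific limits below are placeholder values, not recommendations.

import org.apache.spark.{SparkConf, SparkContext}

// Capping this application's share of a standalone cluster so the cluster
// manager can schedule other applications alongside it.
val conf = new SparkConf()
  .setAppName("capped-app")
  .setMaster("spark://master-host:7077") // assumed standalone master URL
  .set("spark.cores.max", "8")           // at most 8 cores cluster-wide for this app
  .set("spark.executor.memory", "2g")    // memory per executor

val sc = new SparkContext(conf)
// ... run this application's jobs; unclaimed cores remain available
// for other applications managed by the standalone scheduler.
sc.stop()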