What is the difference between the scheduler in Spark core and the Standalone Scheduler in the following Spark stack (from the book Learning Spark: Lightning-Fast Big Data Analysis)?
The difference between the two is highlighted in the Spark documentation's overview of job scheduling:
Spark has several facilities for scheduling resources between computations. First, recall that, as described in the cluster mode overview, each Spark application (instance of SparkContext) runs an independent set of executor processes. The cluster managers that Spark runs on provide facilities for scheduling across applications. Second, within each Spark application, multiple “jobs” (Spark actions) may be running concurrently if they were submitted by different threads. This is common if your application is serving requests over the network. Spark includes a fair scheduler to schedule resources within each SparkContext.
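To make the second point concrete, here is a minimal sketch (in Scala, with assumed names like FairSchedulingSketch and the pool names poolA/poolB) of a single application submitting two jobs from separate threads, with the fair scheduler enabled so the in-application scheduler shares the application's executors between them:

import org.apache.spark.{SparkConf, SparkContext}

object FairSchedulingSketch {
  def main(args: Array[String]): Unit = {
    // Enable the fair scheduler so concurrent jobs within this SparkContext
    // share executors instead of queueing strictly FIFO.
    val conf = new SparkConf()
      .setAppName("fair-scheduling-sketch")
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    // Two jobs (Spark actions) submitted from different threads of the same
    // application; the scheduler inside Spark core divides this application's
    // executors between them.
    val threads = Seq("poolA", "poolB").map { pool =>
      new Thread(() => {
        sc.setLocalProperty("spark.scheduler.pool", pool)
        val total = sc.parallelize(1 to 1000000).map(_ * 2L).reduce(_ + _)
        println(s"$pool => $total")
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())

    sc.stop()
  }
}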
In short: what you refer to as the Standalone Scheduler operates at the cluster-manager level and allocates resources (cores and memory) across multiple Spark applications, whereas the scheduler in Spark core runs inside each application and schedules the jobs, stages, and tasks submitted within that one SparkContext.
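For the cluster-manager side, a rough sketch of how one application leaves room for others on a standalone cluster: capping its total cores lets the standalone scheduler place other applications' executors on the remaining cores. The master URL and the specific limits below are placeholder values, not recommendations.

import org.apache.spark.{SparkConf, SparkContext}

// Capping this application's share of a standalone cluster so the cluster
// manager can schedule other applications alongside it.
val conf = new SparkConf()
  .setAppName("capped-app")
  .setMaster("spark://master-host:7077") // assumed standalone master URL
  .set("spark.cores.max", "8")           // at most 8 cores cluster-wide for this app
  .set("spark.executor.memory", "2g")    // memory per executor

val sc = new SparkContext(conf)
// ... run this application's jobs; unclaimed cores remain available
// for other applications managed by the standalone scheduler.
sc.stop()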