I am running Spark programs in a YARN environment with the option --master yarn-cluster.
When I open a Spark application's Application Master UI, I see a lot of Scheduler Delay in a stage; some values are even more than 10 minutes. What is scheduler delay, and why does it take so long?
Update: operations like aggregateByKey usually spend much more time (i.e. scheduler delay) before the executors really start doing the tasks. Why is that?
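For context, here is a stripped-down sketch of the kind of job I mean (the app name, data sizes, and key function are made-up placeholders, not my real job):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SchedulerDelayDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("scheduler-delay-demo"))

    // Some (Int, Long) pairs; the real data comes from HDFS in my case.
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 100, 1L))

    // aggregateByKey introduces a shuffle, so a new stage of tasks has to be
    // scheduled; the wait before those tasks actually start is what shows up
    // in the UI as scheduler delay.
    val sums = pairs.aggregateByKey(0L)(_ + _, _ + _)

    sums.count()
    sc.stop()
  }
}
```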
Scheduler Delay
Spark relies on data locality and tries to execute tasks as close to the data as possible to minimize data transfer. A task location can be either a host or a pair of a host and an executor. If no available executor satisfies a task's data locality, the task keeps waiting until a timeout is reached.
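That wait is bounded by the spark.locality.wait family of settings; the values below are just Spark's defaults, and lowering them trades locality for faster task launch:

```scala
import org.apache.spark.SparkConf

// How long the scheduler waits for a better locality level before falling back
// to the next one (PROCESS_LOCAL -> NODE_LOCAL -> RACK_LOCAL -> ANY).
val conf = new SparkConf()
  .set("spark.locality.wait", "3s")          // default used by the levels below
  .set("spark.locality.wait.process", "3s")  // wait for a PROCESS_LOCAL slot
  .set("spark.locality.wait.node", "3s")     // wait for a NODE_LOCAL slot
  .set("spark.locality.wait.rack", "3s")     // wait for a RACK_LOCAL slot
```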
Summary metrics for all tasks are presented in a table and in a timeline. They include task deserialization time, task duration, GC time (the total JVM garbage collection time), and result serialization time (the time spent serializing the task result on an executor before sending it back to the driver).
As a core component of the data processing platform, the scheduler is responsible for scheduling tasks on compute units. Built on a Directed Acyclic Graph (DAG) compute model, the Spark scheduler works together with the Block Manager and the cluster backend to use cluster resources efficiently across a variety of workloads.
When you click on a job on the summary page, you see the details page for that job, which shows the event timeline, the DAG visualization, and all stages of the job.
Spark includes a fair scheduler to schedule resources within each SparkContext. When running on a cluster, each Spark application gets an independent set of executor JVMs that only run tasks and store data for that application.
Spark’s scheduler is fully thread-safe and supports submitting jobs from multiple threads concurrently, which enables applications that serve multiple requests (e.g. queries for multiple users). By default, Spark’s scheduler runs jobs in FIFO fashion.
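As a rough sketch of those two options (the pool name is just an example; pool weights and limits would normally be defined in a fairscheduler.xml file):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Switch the in-application scheduler from the default FIFO mode to FAIR.
val conf = new SparkConf()
  .setAppName("fair-scheduling-demo")
  .set("spark.scheduler.mode", "FAIR")

val sc = new SparkContext(conf)

// Jobs submitted from this thread are placed in the named pool.
sc.setLocalProperty("spark.scheduler.pool", "interactive")
// ... run jobs for this request ...
sc.setLocalProperty("spark.scheduler.pool", null)  // revert to the default pool
```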
Result serialization time is the time spent serializing the task result on an executor before sending it back to the driver. Getting result time is the time that the driver spends fetching task results from workers. Scheduler delay is the time the task waits to be scheduled for execution.
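If you want those numbers outside the UI, a SparkListener can approximate the same per-task breakdown. The formula below mirrors roughly how the UI derives scheduler delay (total duration minus the executor-side components); treat it as an approximation:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

class SchedulerDelayListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val info = taskEnd.taskInfo
    val metrics = taskEnd.taskMetrics
    if (info != null && metrics != null) {
      // Whatever part of the task's wall-clock duration is not accounted for
      // by executor-side work is, roughly, the scheduler delay.
      val schedulerDelay = math.max(0L,
        info.duration -
          metrics.executorRunTime -
          metrics.executorDeserializeTime -
          metrics.resultSerializationTime -
          info.gettingResultTime)
      println(s"task ${info.taskId}: ~$schedulerDelay ms scheduler delay")
    }
  }
}

// Register it on an existing SparkContext `sc`:
// sc.addSparkListener(new SchedulerDelayListener)
```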
Open the "Show Additional Metrics" (click the right-pointing triangle so it points down) and mouse over the check box for "Scheduler Delay". It shows this tooltip:
Scheduler delay includes time to ship the task from the scheduler to the executor, and time to send the task result from the executor to the scheduler. If scheduler delay is large, consider decreasing the size of tasks or decreasing the size of task results.
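One way to read that advice, sketched below (the input path and partition count are made up): use more, smaller partitions so each task carries less work, and aggregate on the executors instead of collecting whole partitions back to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("small-tasks-demo"))

val data = sc.textFile("hdfs:///path/to/input")  // placeholder input path

// Smaller tasks: more, smaller partitions mean each task does less work.
val repartitioned = data.repartition(400)

// Smaller task results: reduce on the executors so each task only returns a
// single number to the driver...
val totalChars = repartitioned.map(_.length.toLong).reduce(_ + _)
// ...instead of shipping every record back:
// val everything = repartitioned.collect()
```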
The scheduler is part of the driver: it divides the job into stages of tasks and works with the underlying cluster infrastructure to distribute them around the cluster.
Have a look at TaskSetManager's class comment:
"Schedules the tasks within a single TaskSet in the TaskSchedulerImpl. This class keeps track of each task, retries tasks if they fail (up to a limited number of times), and handles locality-aware scheduling for this TaskSet via delay scheduling..."
I assume this is the result of the delay scheduling paper that Matei Zaharia (co-founder and Chief Technologist of Databricks, which develops Spark) worked on: https://cs.stanford.edu/~matei/
Thus, Spark checks the locality of a pending task's partition. If the locality level is low (e.g. the data is not in the local JVM), the task is not killed or ignored outright; instead, its launch is delayed, which is fair.