
What is scheduler delay in spark UI's event timeline

Tags:

apache-spark

I am using YARN environment to run spark programs, with option --master yarn-cluster.

When I open a Spark application's application master, I see a lot of Scheduler Delay in a stage. Some of them are even more than 10 minutes. I wonder what they are and why they take so long?

Update: Operations like aggregateByKey usually take much more time (i.e. scheduler delay) before the executors really start doing tasks. Why is that?
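For reference, aggregateByKey combines values per key in two phases: a sequence op folds values within each partition, then a combine op merges the per-partition results across the shuffle boundary (which is where the next stage's tasks wait to be scheduled). A pure-Python sketch of those semantics (not Spark's actual implementation):

```python
from collections import defaultdict

def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Mimic RDD.aggregateByKey on a list of partitions of (key, value) pairs."""
    # Phase 1: fold values within each partition (runs on executors in Spark).
    per_partition = []
    for part in partitions:
        acc = defaultdict(lambda: zero)
        for key, value in part:
            acc[key] = seq_op(acc[key], value)
        per_partition.append(acc)
    # Phase 2: merge per-partition results across the shuffle boundary.
    merged = {}
    for acc in per_partition:
        for key, value in acc.items():
            merged[key] = comb_op(merged[key], value) if key in merged else value
    return merged

# Sum values per key across two partitions.
parts = [[("a", 1), ("b", 2)], [("a", 3), ("b", 4), ("a", 5)]]
result = aggregate_by_key(parts, 0, lambda acc, v: acc + v, lambda x, y: x + y)
# result == {"a": 9, "b": 6}
```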

asked Jul 23 '15 by worldterminator


People also ask

What is scheduler delay Spark?

Spark relies on data locality and tries to execute tasks as close to the data as possible to minimize data transfer. A task location can be either a host or a pair of a host and an executor. If an available executor does not satisfy the task's data locality, the task keeps waiting until a timeout is reached.

What is task Deserialization time in Spark?

Summary metrics for all tasks are represented in a table and in a timeline: task deserialization time, duration of tasks, GC time (the total JVM garbage collection time), and result serialization time (the time spent serializing the task result on an executor before sending it back to the driver).

What are schedulers in Spark?

As a core component of a data processing platform, the scheduler is responsible for scheduling tasks on compute units. Built on a Directed Acyclic Graph (DAG) compute model, the Spark Scheduler works together with the Block Manager and Cluster Backend to efficiently utilize cluster resources for high performance across various workloads.

Where is the DAG in Spark UI?

When you click on a job on the summary page, you see the details page for that job. The details page further shows the event timeline, DAG visualization, and all stages of the job. When you click on a specific job, you can see the detailed information of this job.

What does scheduler delay include?

It shows this tooltip: Scheduler delay includes time to ship the task from the scheduler to the executor, and time to send the task result from the executor to the scheduler. If scheduler delay is large, consider decreasing the size of tasks or decreasing the size of task results.

What is a fair scheduler in spark?

Spark includes a fair scheduler to schedule resources within each SparkContext. When running on a cluster, each Spark application gets an independent set of executor JVMs that only run tasks and store data for that application.

Is spark scheduler thread safe?

Spark’s scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users). By default, Spark’s scheduler runs jobs in FIFO fashion.
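As a toy illustration of the difference (not Spark's implementation): a FIFO scheduler drains jobs in submission order, while a fair scheduler interleaves tasks across concurrent jobs:

```python
from collections import deque

def fifo_order(jobs):
    """Run each job's tasks to completion in submission order."""
    return [task for job in jobs for task in job]

def fair_order(jobs):
    """Round-robin one task from each still-running job."""
    queues, order = [deque(job) for job in jobs], []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
    return order

jobs = [["a1", "a2", "a3"], ["b1", "b2"]]
print(fifo_order(jobs))  # ['a1', 'a2', 'a3', 'b1', 'b2']
print(fair_order(jobs))  # ['a1', 'b1', 'a2', 'b2', 'a3']
```

Under FIFO a short second job waits behind the whole first job; under fair scheduling it makes progress immediately.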

What is result serialization time and scheduler delay?

Result serialization time is the time spent serializing the task result on an executor before sending it back to the driver. Getting result time is the time that the driver spends fetching task results from workers. Scheduler delay is the time the task waits to be scheduled for execution.
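Putting these metrics together, the UI effectively reports scheduler delay as the slice of a task's wall-clock duration not covered by the other measured phases. A rough sketch of that bookkeeping (field names and the exact formula are illustrative, not Spark's internals):

```python
def scheduler_delay_ms(duration_ms, deserialize_ms, run_ms,
                       result_serialize_ms, getting_result_ms):
    """Approximate the Spark UI's scheduler-delay metric: total task time
    minus the time attributed to deserialization, execution, result
    serialization, and fetching the result back to the driver."""
    accounted = deserialize_ms + run_ms + result_serialize_ms + getting_result_ms
    return max(duration_ms - accounted, 0)

# A 1000 ms task that spent 900 ms in the known phases leaves 100 ms of delay.
print(scheduler_delay_ms(1000, 50, 800, 20, 30))  # 100
```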


2 Answers

Open the "Show Additional Metrics" panel (click the right-pointing triangle so that it points down) and mouse over the checkbox for "Scheduler Delay". It shows this tooltip:

Scheduler delay includes time to ship the task from the scheduler to the executor, and time to send the task result from the executor to the scheduler. If scheduler delay is large, consider decreasing the size of tasks or decreasing the size of task results.

The scheduler is part of the master that divides the job into stages of tasks and works with the underlying cluster infrastructure to distribute them around the cluster.

answered Sep 19 '22 by Dean Wampler


Have a look at TaskSetManager's class comment:

..Schedules the tasks within a single TaskSet in the TaskSchedulerImpl. This class keeps track of each task, retries tasks if they fail (up to a limited number of times), and handles locality-aware scheduling for this TaskSet via delay scheduling...

I assume this comes from the delay-scheduling paper that Matei Zaharia (co-founder and Chief Technologist of Databricks, which develops Spark) worked on: https://cs.stanford.edu/~matei/

Thus, Spark checks the locality of a pending task's partition. If the locality level is low (e.g. the data is not in the local JVM), the task is not killed or ignored directly; instead it gets a launch delay, which is fair.
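A minimal sketch of that delay-scheduling idea, with an illustrative timeout and locality levels (Spark's real logic lives in TaskSetManager and the wait is configured via spark.locality.wait):

```python
# Locality levels from best to worst, as Spark names them.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]

def allowed_level(wait_elapsed_s, locality_wait_s=3.0):
    """Relax the allowed locality level by one step each time the task has
    waited another `locality_wait_s` seconds without a preferred slot."""
    steps = int(wait_elapsed_s // locality_wait_s)
    return LEVELS[min(steps, len(LEVELS) - 1)]

def try_launch(offer_level, wait_elapsed_s):
    """Launch only if the offered slot's locality is at least as good as the
    currently allowed level; otherwise keep waiting (the 'launch delay')."""
    return LEVELS.index(offer_level) <= LEVELS.index(allowed_level(wait_elapsed_s))

# Right away, a rack-local offer is rejected in favor of waiting...
print(try_launch("RACK_LOCAL", 0))  # False
# ...but after 6 s of waiting the task settles for the rack-local slot.
print(try_launch("RACK_LOCAL", 6))  # True
```

That waiting time, accumulated before the task actually launches, is what shows up as delay in the timeline.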

answered Sep 18 '22 by whopp3r