How are jobs assigned to executors in Spark Streaming?

1 Answer

Actually, in the current implementation of Spark Streaming and under the default configuration, only one job is active (i.e. under execution) at any point in time. So if one batch's processing takes longer than 10 seconds, then the next batch's jobs will stay queued.

This can be changed with an experimental Spark property, "spark.streaming.concurrentJobs", which is set to 1 by default. It's not currently documented (maybe I should add it).

The reason it is set to 1 is that concurrent jobs can potentially lead to weird sharing of resources, which can make it hard to debug whether there are sufficient resources in the system to process the ingested data fast enough. With only 1 job running at a time, it is easy to see that if batch processing time < batch interval, then the system will be stable. Granted, this may not be the most efficient use of resources under certain conditions. We definitely hope to improve this in the future.
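To make the stability condition concrete, here is a small toy model in plain Python. It is not Spark code and does not reflect Spark's actual scheduler internals; the function name `simulate_scheduling_delays` and the assumption that batch i becomes ready exactly at the end of its interval are made up for illustration. It only sketches what happens when a single job slot processes batches strictly one after another.

```python
def simulate_scheduling_delays(batch_interval, processing_times):
    """Toy model: one job runs at a time, batches are processed in order.

    Returns, for each batch, how long it waited in the queue before
    its processing could start (its "scheduling delay").
    """
    delays = []
    free_at = 0  # time at which the single job slot becomes free again
    for i, processing_time in enumerate(processing_times):
        # Assumption: batch i is ready at the end of its batch interval.
        ready_at = (i + 1) * batch_interval
        # The batch starts once it is ready AND the previous job is done.
        start = max(ready_at, free_at)
        delays.append(start - ready_at)
        free_at = start + processing_time
    return delays


# Processing time (8s) < batch interval (10s): nothing ever queues.
print(simulate_scheduling_delays(10, [8] * 5))   # [0, 0, 0, 0, 0]

# Processing time (12s) > batch interval (10s): delay grows every batch.
print(simulate_scheduling_delays(10, [12] * 5))  # [0, 2, 4, 6, 8]
```

In the second run each batch falls 2 seconds further behind than the previous one, which is the "next batch's jobs stay queued" situation described above.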

There is a little bit of material regarding the internals of Spark Streaming in these meetup slides (sorry about the shameless self-advertising :) ). That may be useful to you.

176

answered Jan 01 '23 19:01

Tathagata Das



Tags:

apache-spark

executor

job-scheduling

gprivitera
