 

Distributed Celery scheduler

Tags:

python

celery

I'm looking for a distributed cron-like framework for Python and found Celery. However, the docs say: "You have to ensure only a single scheduler is running for a schedule at a time, otherwise you would end up with duplicate tasks." By default Celery uses celery.beat.PersistentScheduler, which stores the schedule in a local file.

So, my question: is there an implementation other than the default that can put the schedule "into the cluster" and coordinate task execution so that each task is run only once? My goal is to be able to run celerybeat with identical schedules on all hosts in the cluster.
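For reference, a schedule for the default scheduler is declared in the app configuration. A minimal sketch, where the app name, broker URL, and task name are placeholders of my own, not from the question:

```python
from celery import Celery
from celery.schedules import crontab

# Placeholder app and broker; substitute your own.
app = Celery("myapp", broker="amqp://localhost")

app.conf.beat_schedule = {
    "nightly-cleanup": {
        "task": "myapp.tasks.cleanup",          # hypothetical task
        "schedule": crontab(hour=0, minute=0),  # every night at midnight
    },
}
```

Running `celery -A myapp beat` starts the PersistentScheduler, which keeps its state in a local shelve file (celerybeat-schedule by default, configurable with --schedule). Each beat process keeps its own local state and publishes independently, which is why a second beat process on another host sends every task again.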

Thanks

asked Aug 10 '11 by Jonas Bergström

People also ask

Is celery distributed?

Celery is a distributed task queue written in Python, which works using distributed messages. Each execution unit in celery is called a task. A task can be executed concurrently on one or more servers using processes called workers.
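As a minimal illustration (the module, broker, and task names here are hypothetical):

```python
from celery import Celery

# Hypothetical broker URL; RabbitMQ, Redis, etc. all work.
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def add(x, y):
    return x + y
```

Start a worker on each machine with `celery -A tasks worker`; a call such as `add.delay(2, 3)` publishes a message to the broker, and whichever worker is free picks it up.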

How does celery distribute work?

Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system. It's a task queue with focus on real-time processing, while also supporting task scheduling.

How do you schedule celery tasks?

Some Celery Terminology: A task is just a Python function. You can think of scheduling a task as a time-delayed call to the function. For example, you might ask Celery to call your function task1 with arguments (1, 3, 3) after five minutes. Or you could have your function batchjob called every night at midnight.
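Expressed in code (assuming `task1` is a task registered on an app, as in the sketch above), the time-delayed example is a single `apply_async` call; the midnight example is a `beat_schedule` entry like the one shown earlier:

```python
# Call task1(1, 3, 3) five minutes (300 seconds) from now.
task1.apply_async(args=(1, 3, 3), countdown=300)
```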


1 Answer

tl;dr: No, celerybeat is not suitable for your use case. You have to run exactly one celerybeat process, otherwise your tasks will be duplicated.

I know this is a very old question. I'll try to give a short summary because I had the same problem/question (in 2018).

Some background: we're running a Django application (with Celery) in a Kubernetes cluster. Both the cluster (EC2 instances) and the pods (~containers) are autoscaled: simply put, I never know when or how many instances of the application are running.

It's your responsibility to run only one celerybeat process, otherwise your tasks will be duplicated. [1] There was a feature request for this in the Celery repository: [2]

Requiring the user to ensure that only one instance of celerybeat exists across their cluster creates a substantial implementation burden (either creating a single point-of-failure or encouraging users to roll their own distributed mutex).

celerybeat should either provide a mechanism to prevent inadvertent concurrency, or the documentation should suggest a best-practice approach.

After some time, this feature request was rejected by the author of Celery for lack of resources. [3] I highly recommend reading the entire thread on GitHub. People there recommend these projects/solutions:

  • https://github.com/ybrs/single-beat
  • https://github.com/sibson/redbeat
  • Use a locking mechanism (http://docs.celeryproject.org/en/latest/tutorials/task-cookbook.html#ensuring-a-task-is-only-executed-one-at-a-time); see the sketch after this list
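The locking approach boils down to wrapping the task body in a distributed mutex, so concurrent duplicates become no-ops. A minimal sketch using a Redis lock (the client, key name, and 300-second expiry are my assumptions; the linked cookbook uses the Django cache instead):

```python
import redis
from celery import Celery

app = Celery("myapp", broker="redis://localhost:6379/0")
client = redis.Redis()

@app.task
def singleton_job():
    # The timeout auto-expires the lock so a crashed worker cannot
    # hold it forever; choose it longer than the task's runtime.
    lock = client.lock("lock:singleton_job", timeout=300)
    if not lock.acquire(blocking=False):
        return  # another instance is already running; skip this duplicate
    try:
        ...  # the actual work goes here
    finally:
        lock.release()
```

For completeness: redbeat takes the other route and replaces the scheduler itself; per its README you start beat with `celery beat -S redbeat.RedBeatScheduler` and it stores the schedule (and a lock) in Redis.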

I did not try any of the above (I don't want another dependency in my app, and I don't like locking tasks: you need to deal with fail-over, etc.).

I ended up using a CronJob in Kubernetes (https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/).

[1] celerybeat - multiple instances & monitoring

[2] https://github.com/celery/celery/issues/251

[3] https://github.com/celery/celery/issues/251#issuecomment-228214951

answered Oct 19 '22 by illagrenan