
Distributed task queues (Ex. Celery) vs crontab scripts

I'm having trouble understanding the purpose of 'distributed task queues', for example Python's Celery library.

I know that Celery, the Python framework, lets you schedule functions to run at set times. However, that can also be done easily with a Linux crontab entry pointing at a Python script.

As far as I know, and as my own django-celery web apps have shown, Celery consumes much more RAM than a raw crontab: a difference of a few hundred MB for a relatively small app.

Can someone please help me with this distinction? A high-level explanation of how task queues and crontabs work in general would also be nice.

Thank you.

Lucas Ou-Yang asked Apr 26 '13 09:04



1 Answer

It depends on what you want your tasks to do, whether you need to distribute them, and how you want to manage them.

A crontab can execute a script at a fixed interval. The script runs and then returns, so you essentially get a single execution per interval. You could simply point a crontab entry at a Django management command and get access to the entire Django environment, so Celery doesn't really help you there.

What Celery brings to the table, with the help of a message queue, is distributed tasks. Many servers can join the pool of workers, and each receives work items without fear of double handling. It's also possible to execute a task as soon as it is ready, whereas cron cannot run a job more often than once per minute.

As an example, imagine you've just launched a new web application and you're receiving hundreds of sign ups that require an email to be sent to each user. Sending an email may take a long time (comparatively) so you decide that you'll handle activation emails via tasks.

If you were using cron, you'd need to ensure that every minute cron is able to process all of the emails that need to be sent. If you have several servers you now need to make sure that you aren't sending multiple activation emails to the same user - you need some kind of synchronization.

With celery, you add a task to the queue. You may have several workers per server so you've already scaled ahead of a cronjob. You may also have several servers allowing you to scale even more. Synchronization is handled as part of the 'queue'.
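The idea of workers pulling from a shared queue can be sketched with nothing but the standard library. This is not Celery and not how Celery is implemented; it is a single-process analogy in which threads play the role of workers and `queue.Queue` plays the role of the message broker. `send_activation_email` is a hypothetical stand-in for the slow email call.

```python
import queue
import threading


def send_activation_email(address):
    # Hypothetical stand-in for a slow call to a mail server.
    return f"sent to {address}"


def worker(tasks, results, lock):
    while True:
        address = tasks.get()  # each queued item is delivered to exactly one worker
        if address is None:  # sentinel value: no more work for this worker
            tasks.task_done()
            break
        outcome = send_activation_email(address)
        with lock:
            results.append(outcome)
        tasks.task_done()


def run(addresses, worker_count=4):
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for address in addresses:
        tasks.put(address)  # work is picked up as soon as a worker is free
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for thread in threads:
        thread.join()
    return results


signups = [f"user{i}@example.com" for i in range(10)]
handled = run(signups)
```

The producer never decides which worker handles which email; the queue does. Adding capacity means starting more workers, which in Celery's case can live on other servers because the queue is a separate broker process.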

You can use Celery as a cron replacement, but that's not really its primary use. Its main purpose is farming out asynchronous tasks across a distributed cluster.
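For reference, the cron-replacement side of Celery looks roughly like the configuration fragment below. It is a sketch under assumptions, not a recommended setup: the module name `tasks`, the Redis broker URL, the task name, and the five-minute schedule are all made up for illustration, and it needs the `celery` package plus a running broker to do anything.

```python
# tasks.py (hypothetical module name)
from celery import Celery
from celery.schedules import crontab

# Assumed broker URL; any broker Celery supports would do.
app = Celery("tasks", broker="redis://localhost:6379/0")


@app.task(name="send_pending_emails")
def send_pending_emails():
    # Placeholder body: a real task would look up and send queued emails.
    return "done"


# Periodic schedule, read by the separate `celery beat` process.
app.conf.beat_schedule = {
    "send-pending-emails-every-5-minutes": {
        "task": "send_pending_emails",
        "schedule": crontab(minute="*/5"),
    },
}
```

With this layout you would start a worker with `celery -A tasks worker` and the scheduler with `celery -A tasks beat`. The same task can also be queued on demand from application code with `send_pending_emails.delay()`, which is the part cron has no equivalent for.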

And of course, Celery has a long list of features that cron does not have.

Josh Smeaton answered Sep 20 '22 21:09
