I am using django as a web framework. I need a workflow engine that can do synchronous as well as asynchronous(batch tasks) chain of tasks. I found celery and luigi as batch processing workflow. My first question is what is the difference between these two modules.
Luigi allows us to rerun failed chain of task and only failed sub-tasks get re-executed. What about celery: if we rerun the chain (after fixing failed sub-task code), will it rerun the already succeed sub-tasks?
Suppose I have two sub-tasks. The first one creates some files and the second one reads those files. When I put these into chain in celery, the whole chain fails due to buggy code in second task. What happens when I rerun the chain after fixing the code in second task? Will the first task try to recreate those files?
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operations but supports scheduling as well. The execution units, called tasks, are executed concurrently on one or more worker servers using multiprocessing, Eventlet, or gevent.
Luigi is a Python package that manages long-running batch processing, which is the automated running of data processing jobs on batches of items. Luigi allows you to define a data processing job as a set of dependent tasks. For example, task B depends on the output of task A.
Introduction. Celery is a task queue/job queue based on asynchronous message passing. It can be used as a background task processor for your application in which you dump your tasks to execute in the background or at any given moment. It can be configured to execute your tasks synchronously or asynchronously.
But unlike Airflow, Luigi doesn't use DAGs. Instead, Luigi refers to “tasks” and “targets.” Targets are both the results of a task and the input for the next task. Luigi has 3 steps to construct a pipeline: requires() defines the dependencies between the tasks.
(I'm the author of Luigi)
Luigi is not meant for synchronous low-latency framework. It's meant for large batch processes that run for hours or days. So I think for your use case, Celery might actually be slightly better
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With