
Python-based asynchronous workflow modules: what is the difference between a Celery workflow and a Luigi workflow?

I am using Django as a web framework. I need a workflow engine that can run chains of tasks both synchronously and asynchronously (batch tasks). I found Celery and Luigi for batch-processing workflows. My first question is: what is the difference between these two modules?

Luigi allows us to rerun a failed chain of tasks, and only the failed sub-tasks get re-executed. What about Celery: if we rerun the chain (after fixing the failed sub-task's code), will it rerun the sub-tasks that already succeeded?

Suppose I have two sub-tasks. The first one creates some files and the second one reads those files. When I put these into a chain in Celery, the whole chain fails due to buggy code in the second task. What happens when I rerun the chain after fixing the code in the second task? Will the first task try to recreate those files?
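To make the scenario concrete, here is a minimal plain-Python sketch of it. It uses neither Celery nor Luigi; the function names, the `buggy` flag, and the file name are all hypothetical. The first step skips its work when its output file already exists, which is roughly the kind of check Luigi performs through a task's output target. A Celery chain does not do this on its own, so a rerun would call the first task again unless the task code includes a guard like this one.

```python
import os
import tempfile


def create_files(path):
    """First step: write the file unless a previous run already produced it."""
    if os.path.exists(path):
        return False  # output already present, nothing to redo
    with open(path, "w") as f:
        f.write("data\n")
    return True


def read_files(path, buggy=False):
    """Second step: read the file; `buggy` simulates the broken code."""
    if buggy:
        raise RuntimeError("bug in second task")
    with open(path) as f:
        return f.read()


def run_chain(path, buggy=False):
    """Run both steps in order, as a chain would."""
    created = create_files(path)
    content = read_files(path, buggy=buggy)
    return created, content


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "step1.txt")

# First run: step one succeeds, step two fails because of the bug.
try:
    run_chain(path, buggy=True)
except RuntimeError:
    first_run_failed = True
else:
    first_run_failed = False

# Rerun after the "fix": step one sees its output and does not recreate it.
rerun_created, rerun_content = run_chain(path, buggy=False)
print(first_run_failed, rerun_created, rerun_content.strip())
```

The existence check assumes the first step either finishes writing its file or leaves nothing behind; a step that can crash halfway through a write would need to write to a temporary name and rename it on success.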

user3343061 asked Feb 23 '14 11:02




1 Answer

(I'm the author of Luigi)

Luigi is not meant to be a synchronous, low-latency framework. It's meant for large batch processes that run for hours or days. So I think for your use case, Celery might actually be slightly better.

Erik Bernhardsson answered Sep 21 '22 06:09
