Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Celery - running a set of tasks with complex dependencies

In the application I'm working on, a user can perform a "transition" which consists of "steps". A step can have an arbitrary number of dependencies on other steps. I'd like to be able to call a transition and have the steps execute in parallel as separate Celery tasks.

Ideally, I'd like something along the lines of celery-tasktree, except for directed acyclic graphs in general, rather than only trees, but it doesn't appear that such a library exists as yet.

The first solution that comes to mind is a parallel adaptation of a standard topological sort - rather than determining a linear ordering of steps which satisfy the dependency relation, we determine the entire set of steps that can be executed in parallel at the beginning, followed by the entire set of steps that can be executed in round 2, and so on.

However, this is not optimal when tasks take a variable amount of time and workers have to idle waiting for a longer running task while there are tasks that are now ready to run. (For my specific application, this solution is probably fine for now, but I'd still like to figure out how to optimise this.)

As noted in https://cs.stackexchange.com/questions/2524/getting-parallel-items-in-dependency-resolution, a better way is operate directly off the DAG - after each task finishes, check whether any of its dependent tasks are now able to run, and if so, run them.

What would be the best way to go about implementing something like this? It's not clear to me that there's an easy way to do this.

From what I can tell, Celery's group/chain/chord primitives aren't flexible enough to allow me to express a full DAG - though I might be wrong here?

I'm thinking I could create a wrapper for tasks which notifies dependent tasks once the current task finishes - I'm not sure what the best way to handle such a notification would be though. Accessing the application's Django database isn't particularly neat, and would make it hard to spin this out into a generic library, but Celery itself doesn't provide obvious mechanisms for this.

like image 622
ajd Avatar asked Oct 31 '22 06:10

ajd


1 Answers

I also faced this problem but i couldn't really find a better solution or library except for one library, For anyone still interested, you can checkout https://github.com/selinon/selinon. Although its only for python 3, It seems to be the only thing that does exactly what you want.

Airflow is another option but airflow is used in a more static environment just like other dag libraries.

like image 200
daniels Avatar answered Nov 13 '22 04:11

daniels