
Task Scheduling with complex dependencies

I'm looking for a way of scheduling tasks where a task starts once several previous tasks have completed.

I have several hundred "collector" processes which collect data from a variety of sources and dump it to a database. Once these have finished collecting (anywhere from 1 second to a few minutes) I want to immediately kick off a bunch of "data-processing" processes to analyse and make sense of the data in the database. When all of these have finished I want a final task to start and send me an email of the summary data.

I'm currently using a Gearman queue and starting the data-processing tasks on timers, set for when I expect the "collector" processes to have completed. This means the processing step always starts after 10 minutes, even if the collector processes finished after 3 (or, worse, have not finished yet).

Ideally I'd be able to specify specific rules like "start process X when process A and (B or C) complete", or "start process Y when 95% of the specified processes have completed or 10 minutes have elapsed".
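To make the second rule concrete, here is a rough Python sketch of a "quorum or timeout" wait. It uses the standard library's `concurrent.futures` purely as a stand-in for the real queue, and `wait_for_quorum`, `fraction` and `timeout` are hypothetical names chosen for illustration, not part of Gearman:

```python
import math
import time
from concurrent.futures import FIRST_COMPLETED, wait


def wait_for_quorum(futures, fraction=0.95, timeout=600.0):
    """Block until `fraction` of `futures` are done or `timeout` seconds pass.

    Returns a (done, pending) pair of sets. Hypothetical helper: the futures
    stand in for whatever handle the real job queue gives back.
    """
    futures = list(futures)
    needed = math.ceil(fraction * len(futures))
    deadline = time.monotonic() + timeout
    done, pending = set(), set(futures)

    while len(done) < needed and pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # time budget exhausted: proceed with what we have
        # Wake up as soon as at least one more future finishes.
        newly_done, pending = wait(
            pending, timeout=remaining, return_when=FIRST_COMPLETED
        )
        done |= newly_done

    return done, pending
```

The dependent task would be started as soon as this returns, with `pending` telling it which collectors were skipped.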

The processes and dependencies need to be created automatically, as the job will be run with different parameters each time (i.e. I'm not doing an identical calculation each time).

I could write some kind of graph-dependency framework myself using queues and monitors, but it seems like the sort of thing that must already have been solved, so I'm looking for anyone who has used something like what I describe.

Crashthatch asked Jul 26 '11 19:07



1 Answer

"start process X when process A and (B or C) complete"

Why not let worker X launch subworkers A, B and C and wait for them to complete before proceeding? You can have a process X that is both a Gearman worker and a client at the same time.
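As a minimal sketch of that idea, the following uses Python's `concurrent.futures` thread pool in place of real Gearman client calls, so the structure is visible without depending on any particular Gearman binding. The names `run_x`, `task_a`, `task_b`, `task_c` and `then` are hypothetical; in a real setup each `submit` would be a job handed to the queue by X acting as a client.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_x(task_a, task_b, task_c, then):
    """Worker X: launch A, B and C, then continue once A and (B or C) are done.

    `then` receives A's result and the result of whichever of B or C
    finished first. A failure in A or in the winning task propagates
    as an exception from `.result()`.
    """
    pool = ThreadPoolExecutor(max_workers=3)
    try:
        a = pool.submit(task_a)
        b = pool.submit(task_b)
        c = pool.submit(task_c)

        # "(B or C)": return as soon as either one has completed.
        finished, _ = wait([b, c], return_when=FIRST_COMPLETED)
        b_or_c = next(iter(finished))

        # "A and ...": .result() blocks until A has completed as well.
        return then(a.result(), b_or_c.result())
    finally:
        # Do not block on the slower of B and C; let it finish in the background.
        pool.shutdown(wait=False)
```

For example, `run_x(collect_a, collect_b, collect_c, process)` would call `process` as soon as `collect_a` and one of the other two collectors have returned, where those four functions are placeholders for your own jobs.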

Goran Rakic answered Nov 15 '22 17:11
