I have n processes (typically n < 10, but it should scale) running on different machines and communicating over AMQP using RabbitMQ. The processes are typically long-running and may be implemented in any language (though most are Java or Python).
Each process requires a number of inputs (numbers or strings) and produces a number of outputs (also just numbers or strings). A process is executed asynchronously: a message is sent to its input queue, and a callback is triggered when the result arrives on its output queue.
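To make the request/callback pattern concrete, here is a minimal in-memory sketch. It is only a stand-in: `queue.Queue` and a thread replace the RabbitMQ queues and the remote process, and the message layout (`id`, `args`, `result`) and the names `run_worker` and `listen` are assumptions made for illustration, not part of my actual setup.

```python
import queue
import threading

def run_worker(in_q, out_q, func):
    """Stand-in for a remote process: consume requests, publish results."""
    while True:
        msg = in_q.get()
        if msg is None:  # sentinel value: shut the worker down
            break
        out_q.put({"id": msg["id"], "result": func(*msg["args"])})

def listen(out_q, callbacks, expected):
    """Hand each result to the callback registered under its correlation id."""
    for _ in range(expected):
        msg = out_q.get()
        callbacks.pop(msg["id"])(msg["result"])

in_q, out_q = queue.Queue(), queue.Queue()
worker = threading.Thread(target=run_worker,
                          args=(in_q, out_q, lambda a, b: a + b))
worker.start()

results = {}
callbacks = {"job-1": lambda r: results.update(total=r)}

in_q.put({"id": "job-1", "args": (2, 3)})  # "send a message on the input queue"
listen(out_q, callbacks, expected=1)       # "callback triggered by the output queue"

in_q.put(None)
worker.join()
print(results)
```

The correlation id is what lets one output queue serve several outstanding requests at once.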
Ideally the user specifies some inputs and desired outputs and the system should:
A node should fire if its input is ready, allowing parallelism per branch. I can assume no cycles for now, but eventually there will be cycles (e.g., two processes may need to iterate until the output no longer changes).
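The firing rule for the acyclic case can be sketched in a few lines. This is a rough illustration under assumptions, not an existing library: `run_graph` and the graph layout (node name mapped to a function and a list of input names) are hypothetical, and a local thread pool stands in for the remote processes.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_graph(nodes, initial):
    """nodes: name -> (func, [input names]); initial: name -> value.
    Each node's output is stored under the node's own name."""
    values = dict(initial)
    pending = dict(nodes)
    running = {}  # future -> node name
    with ThreadPoolExecutor() as pool:
        while pending or running:
            # Fire every node whose inputs are all available.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in values for d in deps)]
            for name in ready:
                func, deps = pending.pop(name)
                future = pool.submit(func, *(values[d] for d in deps))
                running[future] = name
            if not running:
                # Nothing is running and nothing can fire: missing input or cycle.
                raise ValueError(f"unsatisfiable inputs for: {sorted(pending)}")
            # Block until at least one node finishes, then record its output.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                values[running.pop(future)] = future.result()
    return values

graph = {
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "double": (lambda a: 2 * a, ["a"]),
    "out": (lambda s, d: s * d, ["sum", "double"]),
}
result = run_graph(graph, {"a": 2, "b": 3})
print(result["out"])
```

Here `sum` and `double` can run in parallel because neither depends on the other, while `out` waits for both. A cycle would surface as the `ValueError`, since none of the nodes on it could ever become ready.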
This should be a known problem from (data)flow programming (discussed here before) and I want to avoid re-inventing the wheel. I would prefer a Python solution, and a search leads to Trellis and Pypes. Trellis is no longer developed but seems to support cycles, while Pypes does not. I am also not sure how actively Pypes is developed.
Further searches reveal a whole list of event-based programming frameworks, none of which I am particularly knowledgeable about. There are of course workflow environments like Taverna and KNIME, but those seem like overkill.
Does anybody have any experience tackling this type of problem or with the libraries mentioned?
Edit: Other libraries I found are:
python.org has a Wiki page on "Flow Based Programming" -- http://wiki.python.org/moin/FlowBasedProgramming
The bottom line is that if you can reinvent the wheel in a small number of lines of code (a few hundred) which you completely understand and can document, then do it.
This is an area where the abstractions used are not that hard to implement given some basic foundation tools. RabbitMQ is such a tool. Node.js is another. There are lots of libraries around that implement useful ways to manage dataflows, workflows, finite state machines, etc., but they have a lot of overlap and they tend to be incomplete. Probably the original developer built just enough to get over their initial problem, and since this type of programming was not that popular, there was not the critical mass to keep development going.
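As a rough indication of how small these abstractions can be, the "iterate until the output no longer changes" case from the question might look like the sketch below. The function name, the tolerance, and the two toy steps are assumptions for illustration; in the real system each step would be a round trip to a remote process.

```python
def iterate_until_stable(step_a, step_b, x0, tol=1e-9, max_rounds=100):
    """Feed the output of step_b back into step_a until it stops changing."""
    x = x0
    for rounds in range(1, max_rounds + 1):
        new_x = step_b(step_a(x))
        if abs(new_x - x) <= tol:
            return new_x, rounds
        x = new_x
    raise RuntimeError("did not converge within max_rounds")

# Toy steps: together they compute x -> x / 2 + 1, whose fixed point is 2.
value, rounds = iterate_until_stable(lambda x: x / 2, lambda y: y + 1, x0=0.0)
print(value, rounds)
```

The `max_rounds` guard matters: a pair of processes that never settles would otherwise loop forever.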
There is a lot to be said for ranking all the possible solutions by popularity, picking the most popular one, and putting your effort into making it work (while sharing your work, of course).