I have been searching for a reusable execution pipeline library in C++ (a job scheduler library?). I could not find anything within Boost, but I eventually found two candidates:
Am I missing any other candidates? Has anyone used them? How good are they with regard to parallel I/O and multithreading? Those libraries still seem to be missing dependency handling. For instance, it is not clear to me how one would write something like:
$ cat /dev/urandom | tr P Q | head -3
In this very simple case, the pipeline is walked bottom-up: the first command, cat, stops executing when the head process stops pulling.
However, I do not see how I can benefit from multithreading and/or parallel I/O in a case such as:
$ cat /raid1/file1 /raid2/file2 | tr P Q > /tmp/file3
There is no way for me to say: execute tr on 7 threads when 8 processors are available.
What you are looking for is a dataflow framework. A pipeline is a specialized form of dataflow, where every component has one consumer and one producer.
Boost supports dataflow, but unfortunately I'm not familiar with it. Here's the link: http://dancinghacker.com/code/dataflow/dataflow/introduction/dataflow.html
Anyway, you should write your components as separate programs and use Unix pipes, especially if your data is (or can easily be transformed into) lines of text.
Another option is to write your own dataflow mechanism. It's not too hard, especially given the pipe restriction (one consumer, one producer per component): you don't need to implement a full dataflow framework. Piping is just about binding functions together, passing one's result as the next one's argument, whereas a dataflow framework is about a component interface/pattern and a binding technique. (It's fun; I've written one.)
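To make the binding idea concrete, here is a minimal sketch in plain C++14. Every name in it (compose, source, translate, head3) is made up for illustration; the three stages roughly mirror cat | tr P Q | head -3. Note that this toy version is eager rather than demand-driven, so it does not model head shutting the pipeline down early:

#include <iostream>
#include <string>
#include <utility>
#include <vector>

/** a stage is just a callable; "binding" two stages means composing
    them so that f's result becomes g's argument **/
template <typename F, typename G>
auto compose(F f, G g)
{
    return [f, g](auto&& x) { return g(f(std::forward<decltype(x)>(x))); };
}

int main()
{
    /** stage 1: read all lines from an input stream (the "cat" end) **/
    auto source = [](std::istream& in) {
        std::vector<std::string> lines;
        for (std::string line; std::getline(in, line); )
            lines.push_back(line);
        return lines;
    };
    /** stage 2: the "tr P Q" step **/
    auto translate = [](std::vector<std::string> lines) {
        for (auto& line : lines)
            for (auto& c : line)
                if (c == 'P') c = 'Q';
        return lines;
    };
    /** stage 3: the "head -3" step **/
    auto head3 = [](std::vector<std::string> lines) {
        if (lines.size() > 3) lines.resize(3);
        return lines;
    };

    auto pipeline = compose(compose(source, translate), head3);
    for (const auto& line : pipeline(std::cin))
        std::cout << line << '\n';
    return 0;
}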
I would give Threading Building Blocks http://threadingbuildingblocks.org/ a try. It is open source and cross-platform. The Wikipedia article is pretty good: http://en.wikipedia.org/wiki/Intel_Threading_Building_Blocks
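TBB also ships a pipeline facility that speaks directly to the "tr on 7 threads" question: you declare which stages are serial and which may run in parallel, and the scheduler fans the parallel stages out across the available cores. Below is a sketch along the lines of cat | tr P Q, using the classic tbb::parallel_pipeline API; header and enum names differ slightly in newer oneTBB releases, so treat the details as approximate:

#include <algorithm>
#include <iostream>
#include <string>
#include "tbb/pipeline.h"

int main()
{
    /** at most 8 items in flight at once; this bounds how many copies
        of the parallel stage can run concurrently **/
    tbb::parallel_pipeline( 8,
        /** stage 1 (serial, in order): read lines from stdin, i.e. the
            "cat" end; fc.stop() ends the pipeline at end of input **/
        tbb::make_filter< void, std::string >(
            tbb::filter::serial_in_order,
            []( tbb::flow_control& fc ) -> std::string {
                std::string line;
                if( !std::getline( std::cin, line ) ) {
                    fc.stop();
                    return std::string();
                }
                return line;
            } )
        /** stage 2 (parallel): the "tr P Q" step, spread across cores **/
        & tbb::make_filter< std::string, std::string >(
            tbb::filter::parallel,
            []( std::string line ) {
                std::replace( line.begin(), line.end(), 'P', 'Q' );
                return line;
            } )
        /** stage 3 (serial, in order): write results out in input order **/
        & tbb::make_filter< std::string, void >(
            tbb::filter::serial_in_order,
            []( const std::string& line ) { std::cout << line << '\n'; } ) );
    return 0;
}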
I just read today about RaftLib, which uses templates and classes to create pipeline elements called "kernels". It allows for a serial pipeline like the Bash example you've shown, in addition to parallel dataflows. From the Hello world example on the front page:
#include <raft>
#include <raftio>
#include <cstdlib>
#include <string>

/** kernel with a single output port that emits one string **/
class hi : public raft::kernel
{
public:
    hi() : raft::kernel()
    {
        /** declare an output port named "0" carrying std::string **/
        output.addPort< std::string >( "0" );
    }
    virtual raft::kstatus run()
    {
        /** push one message downstream, then tell the runtime to stop **/
        output[ "0" ].push( std::string( "Hello World\n" ) );
        return( raft::stop );
    }
};

int
main( int argc, char **argv )
{
    /** instantiate print kernel **/
    raft::print< std::string > p;
    /** instantiate hello world kernel **/
    hi hello;
    /** make a map object **/
    raft::map m;
    /** add kernels to map, both hello and p are executed concurrently **/
    m += hello >> p;
    /** execute the map **/
    m.exe();
    return( EXIT_SUCCESS );
}
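As far as I can tell from this example, the >> operator links hello's output port to p's input, and m.exe() runs both kernels (concurrently, per the comment) until hi returns raft::stop.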