I have been searching for a reusable execution pipeline library in C++ (a job scheduler library?). I could not find anything within Boost, but I eventually found two candidates:
Am I missing any other candidates? Has anyone used them? How good are they with regard to parallel I/O and multithreading? Those libraries still seem to be missing dependency handling. For instance, it is not clear to me how one would write something like:
$ cat /dev/urandom | tr P Q | head -3
In this very simple case, the pipeline is walked bottom-up: the first command, cat, stops executing when the head process stops pulling.
However, I do not see how I can benefit from multithreading and/or parallel I/O in a case such as:
$ cat /raid1/file1 /raid2/file2 | tr P Q > /tmp/file3
There is no way for me to say: execute tr on 7 threads when 8 processors are available.
What you are looking for is a dataflow framework. A pipeline is a specialized form of dataflow, where every component has one consumer and one producer.
Boost supports dataflow, but unfortunately I'm not familiar with it. Here's the link: http://dancinghacker.com/code/dataflow/dataflow/introduction/dataflow.html
Anyway, you should write your components as separate programs and use Unix pipes, especially if your data is (or can easily be transformed into) lines of text.
Another option is to write your own dataflow mechanism. It's not too hard, especially given the pipe restriction (one consumer, one producer per component): you don't need to implement a full dataflow framework. Piping is just about binding functions together, passing one's result as the next one's argument, whereas a dataflow framework is about a component interface/pattern and a binding technique. (It's fun; I've written one.)
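To make the binding idea concrete, here is a minimal sketch in plain C++14. Every name in it (compose, source, translate, head3) is made up for illustration; the three stages roughly mirror cat | tr P Q | head -3. Note that this toy version is eager rather than demand-driven, so it does not model head shutting the pipeline down early:

#include <iostream>
#include <string>
#include <utility>
#include <vector>

/** a stage is just a callable; "binding" two stages means composing
    them so that f's result becomes g's argument **/
template <typename F, typename G>
auto compose(F f, G g)
{
    return [f, g](auto&& x) { return g(f(std::forward<decltype(x)>(x))); };
}

int main()
{
    /** stage 1: read all lines from an input stream (the "cat" end) **/
    auto source = [](std::istream& in) {
        std::vector<std::string> lines;
        for (std::string line; std::getline(in, line); )
            lines.push_back(line);
        return lines;
    };
    /** stage 2: the "tr P Q" step **/
    auto translate = [](std::vector<std::string> lines) {
        for (auto& line : lines)
            for (auto& c : line)
                if (c == 'P') c = 'Q';
        return lines;
    };
    /** stage 3: the "head -3" step **/
    auto head3 = [](std::vector<std::string> lines) {
        if (lines.size() > 3) lines.resize(3);
        return lines;
    };

    auto pipeline = compose(compose(source, translate), head3);
    for (const auto& line : pipeline(std::cin))
        std::cout << line << '\n';
    return 0;
}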
I would give Threading Building Blocks http://threadingbuildingblocks.org/ a try. It is open source and cross-platform. The Wikipedia article is pretty good: http://en.wikipedia.org/wiki/Intel_Threading_Building_Blocks
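TBB also ships a pipeline facility that speaks directly to the "tr on 7 threads" question: you declare which stages are serial and which may run in parallel, and the scheduler fans the parallel stages out across the available cores. Below is a sketch along the lines of cat | tr P Q, using the classic tbb::parallel_pipeline API; header and enum names differ slightly in newer oneTBB releases, so treat the details as approximate:

#include <algorithm>
#include <iostream>
#include <string>
#include "tbb/pipeline.h"

int main()
{
    /** at most 8 items in flight at once; this bounds how many copies
        of the parallel stage can run concurrently **/
    tbb::parallel_pipeline( 8,
        /** stage 1 (serial, in order): read lines from stdin, i.e. the
            "cat" end; fc.stop() ends the pipeline at end of input **/
        tbb::make_filter< void, std::string >(
            tbb::filter::serial_in_order,
            []( tbb::flow_control& fc ) -> std::string {
                std::string line;
                if( !std::getline( std::cin, line ) ) {
                    fc.stop();
                    return std::string();
                }
                return line;
            } )
        /** stage 2 (parallel): the "tr P Q" step, spread across cores **/
        & tbb::make_filter< std::string, std::string >(
            tbb::filter::parallel,
            []( std::string line ) {
                std::replace( line.begin(), line.end(), 'P', 'Q' );
                return line;
            } )
        /** stage 3 (serial, in order): write results out in input order **/
        & tbb::make_filter< std::string, void >(
            tbb::filter::serial_in_order,
            []( const std::string& line ) { std::cout << line << '\n'; } ) );
    return 0;
}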
I just read today about RaftLib, which uses templates and classes to create pipeline elements called "kernels". It allows for a serial pipeline like the Bash example you've shown, in addition to parallel dataflows. From the Hello world example on the front page:
#include <raft>
#include <raftio>
#include <cstdlib>
#include <string>

/** kernel with a single output port that emits one string **/
class hi : public raft::kernel
{
public:
    hi() : raft::kernel()
    {
        /** declare an output port named "0" carrying std::string **/
        output.addPort< std::string >( "0" );
    }
    virtual raft::kstatus run()
    {
        /** push one message downstream, then tell the runtime to stop **/
        output[ "0" ].push( std::string( "Hello World\n" ) );
        return( raft::stop );
    }
};

int
main( int argc, char **argv )
{
    /** instantiate print kernel **/
    raft::print< std::string > p;
    /** instantiate hello world kernel **/
    hi hello;
    /** make a map object **/
    raft::map m;
    /** add kernels to map, both hello and p are executed concurrently **/
    m += hello >> p;
    /** execute the map **/
    m.exe();
    return( EXIT_SUCCESS );
}
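As far as I can tell from this example, the >> operator links hello's output port to p's input, and m.exe() runs both kernels (concurrently, per the comment) until hi returns raft::stop.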