
C++ library to build up execution pipeline

Tags: c++, pipeline

I have been searching for a reusable execution pipeline library in C++ (a job scheduler library?). I could not find anything in Boost, so I ended up with two candidates:

  • google-concurrency-library
  • libpipeline

Am I missing any other candidates? Has anyone used them? How good are they with regard to parallel I/O and multithreading? Those libraries still seem to be missing dependency handling. For instance, it is not clear to me how one would write something like:

$ cat /dev/urandom | tr P Q | head -3

In this very simple case, the pipeline is walked bottom-up, and the first cat stops executing when the head process stops pulling.

However, I do not see how I can benefit from multithreading and/or parallel I/O in a case such as:

$ cat /raid1/file1 /raid2/file2 | tr P Q > /tmp/file3

There is no way for me to say: execute tr on 7 threads when 8 processors are available.

asked Mar 08 '13 15:03 by malat


3 Answers

What you are looking for is a dataflow framework. A pipeline is a specialized form of dataflow in which every component has one consumer and one producer.

There is a dataflow library written for Boost (as far as I know it is not part of the official Boost distribution), but unfortunately I'm not familiar with it. Here's the link: http://dancinghacker.com/code/dataflow/dataflow/introduction/dataflow.html

Anyway, you should write your components as separate programs and use Unix pipes, especially if your data is (or can easily be transformed into) lines of text.

Another option is to write your own dataflow code. It's not too hard, especially when you have restrictions (I mean a pipe: one consumer, one producer); you do not need to implement a full dataflow framework. Piping is just about binding functions together, passing one function's result as the next one's argument. A dataflow framework is about a component interface/pattern and a binding technique. (It's fun, I've written one.)

answered Nov 09 '22 13:11 by ern0


I would give Threading Building Blocks (http://threadingbuildingblocks.org/) a try. It is open source and cross-platform. The Wikipedia article is pretty good: http://en.wikipedia.org/wiki/Intel_Threading_Building_Blocks

answered Nov 09 '22 13:11 by Markus Schumann


I just read today about RaftLib, which uses templates and classes to create pipeline elements called "kernels". It allows for a serial pipeline like the Bash example you've shown, in addition to parallel dataflows. From the Hello world example on the front page:

#include <raft>
#include <raftio>
#include <cstdlib>
#include <string>

class hi : public raft::kernel
{
public:
    hi() : raft::kernel()
    {
       output.addPort< std::string >( "0" ); 
    }

    virtual raft::kstatus run()
    {
        output[ "0" ].push( std::string( "Hello World\n" ) );
        return( raft::stop ); 
    }
};


int
main( int argc, char **argv )
{
    /** instantiate print kernel **/
    raft::print< std::string > p;
    /** instantiate hello world kernel **/
    hi hello;
    /** make a map object **/
    raft::map m;
    /** add kernels to map, both hello and p are executed concurrently **/
    m += hello >> p;
    /** execute the map **/
    m.exe();
    return( EXIT_SUCCESS );
}

answered Nov 09 '22 11:11 by Erich Gubler