Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to multithread "tail call" recursion using TBB

I am trying to use tbb to multi-thread an existing recursive algorithm. The single-thread version uses tail-call recursion, structurally it looks something like this:

void my_func() {
    my_recusive_func (0);
}

bool doSomeWork (int i, int& a, int& b, int& c) {
    // do some work
}

void my_recusive_func (int i) {
    int a, b, c;
    bool notDone = doSomeWork (i, a, b, c);
    if (notDone) {
        my_recusive_func (a);
        my_recusive_func (b);
        my_recusive_func (c);
    }
}

I am a tbb novice so my first attempt used the parallel_invoke function:

void my_recusive_func (int i) {
    int a, b, c;
    bool notDone = doSomeWork (i, a, b, c);
    if (notDone) {
        tbb::parallel_invoke (
                [a]{my_recusive_func (a);},
                [b]{my_recusive_func (b);},
                [c]{my_recusive_func (c);});
    }
}

This does work and it runs faster than the single-threaded version but it doesn't seem to scale well with number of cores. The machine I'm targeting has 16 cores (32 hyper-threads) so scalability is very important for this project, but this version only gets about 8 times speedup at best on that machine and many cores seem idle while the algorithm is running.

My theory is that tbb is waiting for the child tasks to complete after the parallel_invoke so there may be many tasks sitting around idle waiting unnecessarily? Would this explain the idle cores? Is there any way to get the parent task to return without waiting for the children? I was thinking perhaps something like this but I don't know enough about the scheduler yet to know if this is OK or not:

void my_func()
{
    tbb::task_group g;
    my_recusive_func (0, g);
    g.wait();
}

void my_recusive_func (int i, tbb::task_group& g) {
    int a, b, c;
    bool notDone = doSomeWork (i, a, b, c);
    if (notDone) {
        g.run([a,&g]{my_recusive_func(a, g);});
        g.run([b,&g]{my_recusive_func(b, g);});
        my_recusive_func (c, g);
    }
}

My first question is is tbb::task_group::run() thread-safe? I couldn't figure that out from the documentation. Also, is there better way to go about this? Perhaps I should be using the low-level scheduler calls instead?

(I typed this code without compiling so please forgive typos.)

like image 814
atb Avatar asked May 24 '14 13:05

atb


2 Answers

There are really two questions here:

  1. Is the TBB implementation of task_group::run thread-safe? Yes. (We should document this more clearly).
  2. Is having many threads invoke method run() on the same task_group scalable? No. (I believe the Microsoft documentation mentions this somewhere.) The reason is that the task_group becomes a centralized point of contention. It's just a fetch-and-add in the implementation, but that's still ultimately unscalable since the affected cache line has to bounce around.

It's generally best to spawn a small number of tasks from a task_group. If using recursive parallelism, give each level its own task_group. Though the performance will likely not be any better than using parallel_invoke.

The low-level tbb::task interfaces is the best bet. You can even code the tail-recursion in that, using the trick where tasK::execute returns a pointer to the tail-call task.

But I'm a bit concerned about the idling threads. I'm wondering if there is enough work to keep the threads busy. Consider doing work-span analysis first. If you are using the Intel compiler (or gcc 4.9) you might try experimenting with a Cilk version first. If that won't speed up, then even the low-level tbb::task interface is unlikely to help, and higher-level issues (work and span) need to be examined.

like image 154
Arch D. Robison Avatar answered Oct 23 '22 17:10

Arch D. Robison


I'm fairly sure tbb::task_group::run() is thread-safe. I can't find a mention in the documentation, which is quite surprising.

However,

  • This 2008 blog post contains a primitive implementation of task_group, whose run() method is clearly noted to be thread-safe. The current implementation is pretty similar.
  • The testing code for tbb::task_group (in src/test/test_task_group.cpp) comes with a test designed to test the thread-safety of task_group (it spawns a bunch of threads, each of which calls run() a thousand times or more on the same task_group object).
  • The sudoku example code (in examples/task_group/sudoku/sudoku.cpp) that comes with TBB also calls task_group::run from multiple threads in a recursive function, essentially the same way your proposed code is doing.
  • task_group is one of the features shared between TBB and Microsoft's PPL, whose task_group is thread-safe. While the TBB documentation cautions that the behavior can still differ between the TBB and the PPL versions, it would be quite surprising if something as fundamental as thread-safety (and hence the need for external synchronization) is different.
  • tbb::structured_task_group (described as "like a task_group, but has only a subset of the functionality") has an explicit restriction that "Methods run, run_and_wait, cancel, and wait should be called only by the thread that created the structured_task_group".
like image 41
T.C. Avatar answered Oct 23 '22 18:10

T.C.