How to parallelize a divide-and-conquer algorithm efficiently?

I have been refreshing my memory of sorting algorithms for the past few days, and I've come across a situation where I can't figure out what the best solution is.

I wrote a basic implementation of quicksort, and I wanted to boost its performance by parallelizing its execution.

Here is what I've got:

template <typename IteratorType>
void quicksort(IteratorType begin, IteratorType end)
{
  if (distance(begin, end) > 1)
  {
    // partition() places the pivot and returns an iterator to it (not std::partition).
    const IteratorType pivot = partition(begin, end);

    if (distance(begin, end) > 10000)
    {
      // Large range: sort each half in its own thread (iterators captured by value).
      thread t1([=](){ quicksort(begin, pivot); });
      thread t2([=](){ quicksort(pivot + 1, end); });

      t1.join();
      t2.join();
    }
    else
    {
      // Small range: recurse sequentially.
      quicksort(begin, pivot);
      quicksort(pivot + 1, end);
    }
  }
}

While this works better than the naive non-threaded implementation, it has serious limitations, namely:

  • If the array to sort is too big or the recursion goes too deep, the system can run out of threads and the execution fails miserably.
  • The cost of creating threads in each recursive call could probably be avoided, especially given that threads are not an infinite resource.

I wanted to use a thread pool to avoid creating threads on the fly, but then I face another problem:

  • Most of the threads I create do all their work up front, then sit idle waiting for their sub-calls to complete. Having many threads blocked on joins seems rather sub-optimal.

Is there a technique/entity I could use to avoid wasting threads (allow their reuse)?

I can use boost or any C++11 facilities.

asked Apr 28 '13 by ereOn

1 Answer

If the array to sort is too big or the recursion goes too deep, the system can run out of threads and the execution fails miserably.

So go sequential after a maximum depth...

template <typename IteratorType>
void quicksort(IteratorType begin, IteratorType end, int depth = 0)
{
  if (distance(begin, end) > 1)
  {
    const IteratorType pivot = partition(begin, end);

    if (distance(begin, end) > 10000 && depth < 5) // <--- HERE: cap the parallel depth
    { // PARALLEL
      thread t1([=](){ quicksort(begin, pivot, depth+1); });
      thread t2([=](){ quicksort(pivot + 1, end, depth+1); });

      t1.join();
      t2.join();
    }
    else
    { // SEQUENTIAL (small range or maximum depth reached)
      quicksort(begin, pivot, depth+1);
      quicksort(pivot + 1, end, depth+1);
    }
  }
}

With depth < 5 it will create at most 2 + 4 + 8 + 16 + 32 = 62 threads, which will easily saturate most multi-core CPUs; further parallelism would yield no benefit.
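If you would rather not hard-code the depth of 5, a possible variant (the helper name max_parallel_depth is illustrative, not part of the answer) derives the cut-off from std::thread::hardware_concurrency():

#include <cmath>
#include <thread>

// Illustrative helper: stop spawning once there is roughly one branch per
// hardware thread. hardware_concurrency() may return 0, so assume 4 cores then.
inline int max_parallel_depth()
{
  const unsigned hw = std::thread::hardware_concurrency();
  const unsigned cores = hw ? hw : 4;
  return static_cast<int>(std::ceil(std::log2(cores)));
}

The quicksort above would then test depth < max_parallel_depth() instead of depth < 5.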

The cost of creating threads in each recursive call could probably be avoided, especially given that threads are not an infinite resource.

Sleeping threads don't really cost as much as people think, but there is no point in creating two new threads at each branch; you may as well keep the current thread working on one half rather than putting it to sleep while it waits...

template <typename IteratorType>
void quicksort(IteratorType begin, IteratorType end, int depth = 0)
{
  if (distance(begin, end) > 1)
  {
    const IteratorType pivot = partition(begin, end);

    if (distance(begin, end) > 10000 && depth < 5)
    {
      thread t1([=](){ quicksort(begin, pivot, depth+1); });
      quicksort(pivot + 1, end, depth+1);   // <--- HERE: reuse the current thread

      t1.join();
    }
    else
    {
      quicksort(begin, pivot, depth+1);
      quicksort(pivot + 1, end, depth+1);
    }
  }
}

As an alternative to using depth, you can set a global thread limit and only create a new thread if the limit hasn't been reached; if it has, recurse sequentially. The limit can be process-wide, so concurrent calls to quicksort will co-operatively back off from creating too many threads.
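A minimal sketch of that idea, assuming a process-wide std::atomic counter; the names g_thread_budget and reserve_thread are illustrative and not part of the answer, and partition is the same pivot routine used in the question:

#include <atomic>
#include <iterator>
#include <thread>

// Process-wide budget of extra threads quicksort may create (illustrative policy:
// one per hardware thread; may be 0 on some platforms, in which case the sort
// simply stays sequential).
static std::atomic<int> g_thread_budget(static_cast<int>(std::thread::hardware_concurrency()));

// Try to reserve one thread from the budget; returns false when none is left.
inline bool reserve_thread()
{
  if (g_thread_budget.fetch_sub(1) > 0)
    return true;
  g_thread_budget.fetch_add(1); // nothing left, undo the reservation
  return false;
}

template <typename IteratorType>
void quicksort(IteratorType begin, IteratorType end)
{
  if (std::distance(begin, end) > 1)
  {
    const IteratorType pivot = partition(begin, end);

    if (std::distance(begin, end) > 10000 && reserve_thread())
    {
      std::thread t1([=](){ quicksort(begin, pivot); });
      quicksort(pivot + 1, end);    // reuse the current thread
      t1.join();
      g_thread_budget.fetch_add(1); // give the thread back to the budget
    }
    else
    {
      quicksort(begin, pivot);
      quicksort(pivot + 1, end);
    }
  }
}

Unlike the depth cut-off, this bounds the total number of threads across all concurrent quicksort calls in the process.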

answered Oct 05 '22 by Andrew Tomazos