How to parallelize a divide-and-conquer algorithm efficiently?

I have been refreshing my memory of sorting algorithms for the past few days, and I've come across a situation where I can't figure out what the best solution is.

I wrote a basic implementation of quicksort, and I wanted to boost its performance by parallelizing its execution.

Here is what I've got:

template <typename IteratorType>
void quicksort(IteratorType begin, IteratorType end)
{
  if (distance(begin, end) > 1)
  {
    // partition() places the pivot and returns an iterator to it (not std::partition).
    const IteratorType pivot = partition(begin, end);

    if (distance(begin, end) > 10000)
    {
      // Large range: sort each half in its own thread (iterators captured by value).
      thread t1([=](){ quicksort(begin, pivot); });
      thread t2([=](){ quicksort(pivot + 1, end); });

      t1.join();
      t2.join();
    }
    else
    {
      // Small range: recurse sequentially.
      quicksort(begin, pivot);
      quicksort(pivot + 1, end);
    }
  }
}

While this works better than the naive non-threaded implementation, it has serious limitations, namely:

  • If the array to sort is too big or the recursion goes too deep, the system can run out of threads and the execution fails miserably.
  • The cost of creating threads in each recursive call could probably be avoided, especially given that threads are not an infinite resource.

I wanted to use a thread pool to avoid creating threads on the fly, but then I face another problem:

  • Most of the threads I create do all their work up front, then sit idle waiting for their sub-calls to complete. Having many threads blocked on joins seems rather sub-optimal.

Is there a technique/entity I could use to avoid wasting threads (allow their reuse)?

I can use boost or any C++11 facilities.

asked Apr 28 '13 by ereOn

1 Answer

If the array to sort is too big or the recursion goes too deep, the system can run out of threads and the execution fails miserably.

So go sequential after a maximum depth...

template <typename IteratorType>
void quicksort(IteratorType begin, IteratorType end, int depth = 0)
{
  if (distance(begin, end) > 1)
  {
    const IteratorType pivot = partition(begin, end);

    if (distance(begin, end) > 10000 && depth < 5) // <--- HERE: cap the parallel depth
    { // PARALLEL
      thread t1([=](){ quicksort(begin, pivot, depth+1); });
      thread t2([=](){ quicksort(pivot + 1, end, depth+1); });

      t1.join();
      t2.join();
    }
    else
    { // SEQUENTIAL (small range or maximum depth reached)
      quicksort(begin, pivot, depth+1);
      quicksort(pivot + 1, end, depth+1);
    }
  }
}

With depth < 5 it will create at most 2 + 4 + 8 + 16 + 32 = 62 threads, which will easily saturate most multi-core CPUs; further parallelism would yield no benefit.
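If you would rather not hard-code the depth of 5, a possible variant (the helper name max_parallel_depth is illustrative, not part of the answer) derives the cut-off from std::thread::hardware_concurrency():

#include <cmath>
#include <thread>

// Illustrative helper: stop spawning once there is roughly one branch per
// hardware thread. hardware_concurrency() may return 0, so assume 4 cores then.
inline int max_parallel_depth()
{
  const unsigned hw = std::thread::hardware_concurrency();
  const unsigned cores = hw ? hw : 4;
  return static_cast<int>(std::ceil(std::log2(cores)));
}

The quicksort above would then test depth < max_parallel_depth() instead of depth < 5.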

The cost of creating threads in each recursive call could probably be avoided, especially given that threads are not an infinite resource.

Sleeping threads don't really cost as much as people think, but there is no point in creating two new threads at each branch; you may as well keep the current thread working on one half rather than putting it to sleep while it waits...

template <typename IteratorType>
void quicksort(IteratorType begin, IteratorType end, int depth = 0)
{
  if (distance(begin, end) > 1)
  {
    const IteratorType pivot = partition(begin, end);

    if (distance(begin, end) > 10000 && depth < 5)
    {
      thread t1([=](){ quicksort(begin, pivot, depth+1); });
      quicksort(pivot + 1, end, depth+1);   // <--- HERE: reuse the current thread

      t1.join();
    }
    else
    {
      quicksort(begin, pivot, depth+1);
      quicksort(pivot + 1, end, depth+1);
    }
  }
}

As an alternative to using depth, you can set a global thread limit and only create a new thread if the limit hasn't been reached; if it has, recurse sequentially. The limit can be process-wide, so concurrent calls to quicksort will co-operatively back off from creating too many threads.
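A minimal sketch of that idea, assuming a process-wide std::atomic counter; the names g_thread_budget and reserve_thread are illustrative and not part of the answer, and partition is the same pivot routine used in the question:

#include <atomic>
#include <iterator>
#include <thread>

// Process-wide budget of extra threads quicksort may create (illustrative policy:
// one per hardware thread; may be 0 on some platforms, in which case the sort
// simply stays sequential).
static std::atomic<int> g_thread_budget(static_cast<int>(std::thread::hardware_concurrency()));

// Try to reserve one thread from the budget; returns false when none is left.
inline bool reserve_thread()
{
  if (g_thread_budget.fetch_sub(1) > 0)
    return true;
  g_thread_budget.fetch_add(1); // nothing left, undo the reservation
  return false;
}

template <typename IteratorType>
void quicksort(IteratorType begin, IteratorType end)
{
  if (std::distance(begin, end) > 1)
  {
    const IteratorType pivot = partition(begin, end);

    if (std::distance(begin, end) > 10000 && reserve_thread())
    {
      std::thread t1([=](){ quicksort(begin, pivot); });
      quicksort(pivot + 1, end);    // reuse the current thread
      t1.join();
      g_thread_budget.fetch_add(1); // give the thread back to the budget
    }
    else
    {
      quicksort(begin, pivot);
      quicksort(pivot + 1, end);
    }
  }
}

Unlike the depth cut-off, this bounds the total number of threads across all concurrent quicksort calls in the process.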

answered Oct 05 '22 by Andrew Tomazos