Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ 2011 : std::thread : simple example to parallelize a loop?

C++ 2011 includes very cool new features, but I can't find a lot of example to parallelize a for-loop. So my very naive question is : how do you parallelize a simple for loop (like using "omp parallel for") with std::thread ? (I search for an example).

Thank you very much.

like image 850
Vincent Avatar asked May 29 '12 01:05

Vincent


3 Answers

std::thread is not necessarily meant to parallize loops. It is meant to be the lowlevel abstraction to build constructs like a parallel_for algorithm. If you want to parallize your loops, you should either wirte a parallel_for algorithm yourself or use existing libraires which offer task based parallism.

The following example shows how you could parallize a simple loop but on the other side also shows the disadvantages, like the missing load-balancing and the complexity for a simple loop.

  typedef std::vector<int> container;
  typedef container::iterator iter;

  container v(100, 1);

  auto worker = [] (iter begin, iter end) {
    for(auto it = begin; it != end; ++it) {
      *it *= 2;
    }
  };


  // serial
  worker(std::begin(v), std::end(v));

  std::cout << std::accumulate(std::begin(v), std::end(v), 0) << std::endl; // 200

  // parallel
  std::vector<std::thread> threads(8);
  const int grainsize = v.size() / 8;

  auto work_iter = std::begin(v);
  for(auto it = std::begin(threads); it != std::end(threads) - 1; ++it) {
    *it = std::thread(worker, work_iter, work_iter + grainsize);
    work_iter += grainsize;
  }
  threads.back() = std::thread(worker, work_iter, std::end(v));

  for(auto&& i : threads) {
    i.join();
  }

  std::cout << std::accumulate(std::begin(v), std::end(v), 0) << std::endl; // 400

Using a library which offers a parallel_for template, it can be simplified to

parallel_for(std::begin(v), std::end(v), worker);
like image 68
Stephan Dollberg Avatar answered Oct 16 '22 17:10

Stephan Dollberg


Well obviously it depends on what your loop does, how you choose to parallellize, and how you manage the threads lifetime.

I'm reading the book from the std C++11 threading library (that is also one of the boost.thread maintainer and wrote Just Thread ) and I can see that "it depends".

Now to give you an idea of basics using the new standard threading, I would recommend to read the book as it gives plenty of examples. Also, take a look at http://www.justsoftwaresolutions.co.uk/threading/ and https://stackoverflow.com/questions/415994/boost-thread-tutorials

like image 44
Klaim Avatar answered Oct 16 '22 17:10

Klaim


Can't provide a C++11 specific answer since we're still mostly using pthreads. But, as a language-agnostic answer, you parallelise something by setting it up to run in a separate function (the thread function).

In other words, you have a function like:

def processArraySegment (threadData):
    arrayAddr = threadData->arrayAddr
    startIdx  = threadData->startIdx
    endIdx    = threadData->endIdx

    for i = startIdx to endIdx:
        doSomethingWith (arrayAddr[i])

    exitThread()

and, in your main code, you can process the array in two chunks:

int xyzzy[100]

threadData->arrayAddr = xyzzy
threadData->startIdx  = 0
threadData->endIdx    = 49
threadData->done      = false
tid1 = startThread (processArraySegment, threadData)

// caveat coder: see below.
threadData->arrayAddr = xyzzy
threadData->startIdx  = 50
threadData->endIdx    = 99
threadData->done      = false
tid2 = startThread (processArraySegment, threadData)

waitForThreadExit (tid1)
waitForThreadExit (tid2)

(keeping in mind the caveat that you should ensure thread 1 has loaded the data into its local storage before the main thread starts modifying it for thread 2, possibly with a mutex or by using an array of structures, one per thread).

In other words, it's rarely a simple matter of just modifying a for loop so that it runs in parallel, though that would be nice, something like:

for {threads=10} ({i} = 0; {i} < ARR_SZ; {i}++)
    array[{i}] = array[{i}] + 1;

Instead, it requires a bit of rearranging your code to take advantage of threads.

And, of course, you have to ensure that it makes sense for the data to be processed in parallel. If you're setting each array element to the previous one plus 1, no amount of parallel processing will help, simply because you have to wait for the previous element to be modified first.

This particular example above simply uses an argument passed to the thread function to specify which part of the array it should process. The thread function itself contains the loop to do the work.

like image 3
paxdiablo Avatar answered Oct 16 '22 17:10

paxdiablo