Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallel.For vs regular threads

I'm trying to understand why Parallel.For is able to outperform a number of threads in the following scenario: consider a batch of jobs that can be processed in parallel. While processing these jobs, new work may be added, which then needs to be processed as well. The Parallel.For solution would look as follows:

var jobs = new List<Job> { firstJob };
int startIdx = 0, endIdx = jobs.Count;
while (startIdx < endIdx) {
  Parallel.For(startIdx, endIdx, i => WorkJob(jobs[i]));
  startIdx = endIdx; endIdx = jobs.Count;
}

This means that there are multiple times where the Parallel.For needs to synchronize. Consider a bread-first graph algorithm algorithm; the number of synchronizations would be quite large. Waste of time, no?

Trying the same in the old-fashioned threading approach:

var queue = new ConcurrentQueue<Job> { firstJob };
var threads = new List<Thread>();
var waitHandle = new AutoResetEvent(false);
int numBusy = 0;
for (int i = 0; i < maxThreads; i++) 
  threads.Add(new Thread(new ThreadStart(delegate {
    while (!queue.IsEmpty || numBusy > 0) {
      if (queue.IsEmpty)
        // numbusy > 0 implies more data may arrive
        waitHandle.WaitOne();

      Job job;
      if (queue.TryDequeue(out job)) {
        Interlocked.Increment(ref numBusy);
        WorkJob(job); // WorkJob does a waitHandle.Set() when more work was found
        Interlocked.Decrement(ref numBusy);
      }
    }
    // others are possibly waiting for us to enable more work which won't happen
    waitHandle.Set(); 
})));
threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());

The Parallel.For code is of course much cleaner, but what I cannot comprehend, it's even faster as well! Is the task scheduler just that good? The synchronizations were eleminated, there's no busy waiting, yet the threaded approach is consistently slower (for me). What's going on? Can the threading approach be made faster?

Edit: thanks for all the answers, I wish I could pick multiple ones. I chose to go with the one that also shows an actual possible improvement.

like image 575
Frank Razenberg Avatar asked Oct 25 '12 14:10

Frank Razenberg


2 Answers

The two code samples are not really the same.

The Parallel.ForEach() will use a limited amount of threads and re-use them. The 2nd sample is already starting way behind by having to create a number of threads. That takes time.

And what is the value of maxThreads ? Very critical, in Parallel.ForEach() it is dynamic.

Is the task scheduler just that good?

It is pretty good. And TPL uses work-stealing and other adaptive technologies. You'll have a hard time to do any better.

like image 98
Henk Holterman Avatar answered Oct 11 '22 14:10

Henk Holterman


Parallel.For doesn't actually break up the items into single units of work. It breaks up all the work (early on) based on the number of threads it plans to use and the number of iterations to be executed. Then has each thread synchronously process that batch (possibly using work stealing or saving some extra items to load-balance near the end). By using this approach the worker threads are virtually never waiting on each other, while your threads are constantly waiting on each other due to the heavy synchronization you're using before/after every single iteration.

On top of that since it's using thread pool threads many of the threads it needs are likely already created, which is another advantage in its favor.

As for synchronization, the entire point of a Parallel.For is that all of the iterations can be done in parallel, so there is almost no synchronization that needs to take place (at least in their code).

Then of course there is the issue of the number of threads. The threadpool has a lot of very good algorithms and heuristics to help it determine how many threads are need at that instant in time, based on the current hardware, the load from other applications, etc. It's possible that you're using too many, or not enough threads.

Also, since the number of items that you have isn't known before you start I would suggest using Parallel.ForEach rather than several Parallel.For loops. It is simply designed for the situation that you're in, so it's heuristics will apply better. (It also makes for even cleaner code.)

BlockingCollection<Job> queue = new BlockingCollection<Job>();

//add jobs to queue, possibly in another thread
//call queue.CompleteAdding() when there are no more jobs to run

Parallel.ForEach(queue.GetConsumingEnumerable(),
    job => job.DoWork());
like image 31
Servy Avatar answered Oct 11 '22 12:10

Servy