Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Different summation results with Parallel.ForEach

Tags:

I have a foreach loop that I am parallelizing and I noticed something odd. The code looks like

double sum = 0.0;  Parallel.ForEach(myCollection, arg => {      sum += ComplicatedFunction(arg); });  // Use sum variable below 

When I use a regular foreach loop I get different results. There may be something deeper down inside the ComplicatedFunction but it is possible that the sum variable is being unexpectantly affected by the parallelization?

like image 504
Brian Triplett Avatar asked Jul 29 '10 21:07

Brian Triplett


2 Answers

it is possible that the sum variable is being unexpectantly affected by the parallelization?

Yes.
Access to a double is not atomic and the sum += ... operation is never thread-safe, not even for types that are atomic. So you have multiple race conditions and the result is unpredictable.

You could use something like:

double sum = myCollection.AsParallel().Sum(arg => ComplicatedFunction(arg)); 

or, in a shorter notation

double sum = myCollection.AsParallel().Sum(ComplicatedFunction); 
like image 73
Henk Holterman Avatar answered Oct 21 '22 15:10

Henk Holterman


Like the others answers mentioned, updating the sum variable from multiple threads (which is what Parallel.ForEach does) is not a thread-safe operation. The trivial fix of acquiring a lock before doing the update will fix that problem.

double sum = 0.0; Parallel.ForEach(myCollection, arg =>  {    lock (myCollection)   {     sum += ComplicatedFunction(arg);   } }); 

However, that introduces yet another problem. Since the lock is acquired on each iteration then that means the execution of each iteration will be effectively serialized. In other words, it would have been better to just use a plain old foreach loop.

Now, the trick in getting this right is to partition the problem in separate and independent chucks. Fortunately that is super easy to do when all you want to do is sum the result of the iterations because the sum operation is commutative and associative and because the intermediate results of the iterations are independent.

So here is how you do it.

double sum = 0.0; Parallel.ForEach(myCollection,     () => // Initializer     {         return 0D;     },     (item, state, subtotal) => // Loop body     {         return subtotal += ComplicatedFunction(item);     },     (subtotal) => // Accumulator     {         lock (myCollection)         {           sum += subtotal;         }     }); 
like image 21
Brian Gideon Avatar answered Oct 21 '22 16:10

Brian Gideon