Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallel.For and Break() misunderstanding?

I'm investigating the Parallelism Break in a For loop.

After reading this and this I still have a question:

I'd expect this code :

 Parallel.For(0, 10, (i,state) =>  
     { 
                Console.WriteLine(i); if (i == 5) state.Break(); 
     }

To yield at most 6 numbers (0..6). not only he is not doing it but have different result length :

02351486
013542
0135642

Very annoying. (where the hell is Break() {after 5} here ??)

So I looked at msdn

Break may be used to communicate to the loop that no other iterations after the current iteration need be run. If Break is called from the 100th iteration of a for loop iterating in parallel from 0 to 1000, all iterations less than 100 should still be run, but the iterations from 101 through to 1000 are not necessary.

Quesion #1 :

Which iterations ? the overall iteration counter ? or per thread ? I'm pretty sure it is per thread. please approve.

Question #2 :

Lets assume we are using Parallel + range partition (due to no cpu cost change between elements) so it divides the data among threads . So if we have 4 cores (and perfect divisions among them):

core #1 got 0..250
core #2 got 251..500
core #3 got 501..750
core #4 got 751..1000

so the thread in core #1 will meet value=100 sometime and will break. this will be his iteration number 100 . But the thread in core #4 got more quanta and he is on 900 now. he is way beyond his 100'th iteration. He doesnt have index less 100 to be stopped !! - so he will show them all.

Am I right ? is that is the reason why I get more than 5 elements in my example ?

Question #3 :

How cn I truly break when (i == 5) ?

p.s.

I mean , come on ! when I do Break() , I want things the loop to stop. excactly as I do in regular For loop.

like image 342
Royi Namir Avatar asked Oct 08 '12 15:10

Royi Namir


3 Answers

To yield at most 6 numbers (0..6).

The problem is that this won't yield at most 6 numbers.

What happens is, when you hit a loop with an index of 5, you send the "break" request. Break() will cause the loop to no longer process any values >5, but process all values <5.

However, any values greater than 5 which were already started will still get processed. Since the various indices are running in parallel, they're no longer ordered, so you get various runs where some values >5 (such as 8 in your example) are still being executed.

Which iterations ? the overall iteration counter ? or per thread ? I'm pretty sure it is per thread. please approve.

This is the index being passed into Parallel.For. Break() won't prevent items from being processed, but provides a guarantee that all items up to 100 get processed, but items above 100 may or may not get processed.

Am I right ? is that is the reason why I get more than 5 elements in my example ?

Yes. If you use a partitioner like you've shown, as soon as you call Break(), items beyond the one where you break will no longer get scheduled. However, items (which is the entire partition) already scheduled will get processed fully. In your example, this means you're likely to always process all 1000 items.

How can I truly break when (i == 5) ?

You are - but when you run in Parallel, things change. What is the actual goal here? If you only want to process the first 6 items (0-5), you should restrict the items before you loop through them via a LINQ query or similar. You can then process the 6 items in Parallel.For or Parallel.ForEach without a Break() and without worry.

I mean , come on ! when I do Break() , I want things the loop to stop. excactly as I do in regular For loop.

You should use Stop() instead of Break() if you want things to stop as quickly as possible. This will not prevent items already running from stopping, but will no longer schedule any items (including ones at lower indices or earlier in the enumeration than your current position).

like image 86
Reed Copsey Avatar answered Oct 13 '22 08:10

Reed Copsey


If Break is called from the 100th iteration of a for loop iterating in parallel from 0 to 1000

The 100th iteration of the loop is not necessarily (in fact probably not) the one with the index 99.

Your threads can and will run in an indeterminent order. When the .Break() instruction is encountered, no further loop iterations will be started. Exactly when that happens depends on the specifics of thread scheduling for a particular run.

I strongly recommend reading

Patterns of Parallel Programming

(free PDF from Microsoft)

to understand the design decisions and design tradeoffs that went into the TPL.

like image 25
Eric J. Avatar answered Oct 13 '22 09:10

Eric J.


Which iterations ? the overall iteration counter ? or per thread ?

Off all the iterations scheduled (or yet to be scheduled).

Remember the delegate may be run out of order, there is no guarantee that iteration i == 5 will be the sixth to execute, rather this is unlikely to be the case except in rare cases.

Q2: Am I right ?

No, the scheduling is not so simplistic. Rather all the tasks are queued up and then the queue is processed. But the threads each use their own queue until it is empty when they steal from other the threads. This leads no way to predict which thread will process what delegate.

If the delegates are sufficiently trivial it might all be processed on the original calling thread (no other thread gets a chance to steal work).

Q3: How cn I truly break when (i == 5) ?

Don't use concurrently if you want linear (in specific) processing.

The Break method is there to support speculative execution: try various ways and stop as soon as any one completes.

like image 25
Richard Avatar answered Oct 13 '22 10:10

Richard