Why is Parallel.ForEach much faster than AsParallel().ForAll() even though MSDN suggests otherwise?

I've been doing some investigation to see how we can create a multithreaded application that runs through a tree.

To find how this can be implemented in the best way I've created a test application that runs through my C:\ disk and opens all directories.

    class Program
    {
        static void Main(string[] args)
        {
            //var startDirectory = @"C:\The folder\RecursiveFolder";
            var startDirectory = @"C:\";

            var w = Stopwatch.StartNew();

            ThisIsARecursiveFunction(startDirectory);

            Console.WriteLine("Elapsed seconds: " + w.Elapsed.TotalSeconds);

            Console.ReadKey();
        }

        public static void ThisIsARecursiveFunction(String currentDirectory)
        {
            var lastBit = Path.GetFileName(currentDirectory);
            var depth = currentDirectory.Count(t => t == '\\');
            //Console.WriteLine(depth + ": " + currentDirectory);

            try
            {
                var children = Directory.GetDirectories(currentDirectory);

                //Edit this mode to switch what way of parallelization it should use
                int mode = 3;

                switch (mode)
                {
                    case 1:
                        foreach (var child in children)
                        {
                            ThisIsARecursiveFunction(child);
                        }
                        break;
                    case 2:
                        children.AsParallel().ForAll(t =>
                        {
                            ThisIsARecursiveFunction(t);
                        });
                        break;
                    case 3:
                        Parallel.ForEach(children, t =>
                        {
                            ThisIsARecursiveFunction(t);
                        });
                        break;
                    default:
                        break;
                }
            }
            catch (Exception eee)
            {
                //Exception might occur for directories that can't be accessed.
            }
        }
    }

What I have encountered, however, is that when running this in mode 3 (Parallel.ForEach) the code completes in around 2.5 seconds (yes, I have an SSD ;) ). Running the code without parallelization, it completes in around 8 seconds. And running the code in mode 2 (AsParallel().ForAll()), it takes a near infinite amount of time.

When checking in process explorer I also encounter a few strange facts:

    Mode1 (No Parallelization):
    Cpu:     ~25%
    Threads: 3
    Time to complete: ~8 seconds

    Mode2 (AsParallel().ForAll()):
    Cpu:     ~0%
    Threads: Increasing by one per second (I find this strange since it seems to be waiting on the other threads to complete or a second timeout.)
    Time to complete: 1 second per node so about 3 days???

    Mode3 (Parallel.ForEach()):
    Cpu:     100%
    Threads: At most 29-30
    Time to complete: ~2.5 seconds

What I find especially strange is that Parallel.ForEach seems to ignore any parent threads/tasks that are still running, while AsParallel().ForAll() seems to wait for the previous Task to complete (which won't happen soon, since all parent Tasks are still waiting on their child tasks to complete).

Also what I read on MSDN was: "Prefer ForAll to ForEach When It Is Possible"

Source: http://msdn.microsoft.com/en-us/library/dd997403(v=vs.110).aspx

Does anyone have a clue why this could be?

Edit 1:

As requested by Matthew Watson I've first loaded the tree in memory before looping through it. Now the loading of the tree is done sequentially.

The results however are the same. Unparallelized and Parallel.ForEach now complete the whole tree in about 0.05 seconds while AsParallel().ForAll still only goes around 1 step per second.

Code:

    class Program
    {
        private static DirWithSubDirs RootDir;

        static void Main(string[] args)
        {
            //var startDirectory = @"C:\The folder\RecursiveFolder";
            var startDirectory = @"C:\";

            Console.WriteLine("Loading file system into memory...");
            RootDir = new DirWithSubDirs(startDirectory);
            Console.WriteLine("Done");

            var w = Stopwatch.StartNew();

            ThisIsARecursiveFunctionInMemory(RootDir);

            Console.WriteLine("Elapsed seconds: " + w.Elapsed.TotalSeconds);

            Console.ReadKey();
        }

        public static void ThisIsARecursiveFunctionInMemory(DirWithSubDirs currentDirectory)
        {
            var depth = currentDirectory.Path.Count(t => t == '\\');
            Console.WriteLine(depth + ": " + currentDirectory.Path);

            var children = currentDirectory.SubDirs;

            //Edit this mode to switch what way of parallelization it should use
            int mode = 2;

            switch (mode)
            {
                case 1:
                    foreach (var child in children)
                    {
                        ThisIsARecursiveFunctionInMemory(child);
                    }
                    break;
                case 2:
                    children.AsParallel().ForAll(t =>
                    {
                        ThisIsARecursiveFunctionInMemory(t);
                    });
                    break;
                case 3:
                    Parallel.ForEach(children, t =>
                    {
                        ThisIsARecursiveFunctionInMemory(t);
                    });
                    break;
                default:
                    break;
            }
        }
    }

    class DirWithSubDirs
    {
        public List<DirWithSubDirs> SubDirs = new List<DirWithSubDirs>();
        public String Path { get; private set; }

        public DirWithSubDirs(String path)
        {
            this.Path = path;
            try
            {
                SubDirs = Directory.GetDirectories(path).Select(t => new DirWithSubDirs(t)).ToList();
            }
            catch (Exception eee)
            {
                //Ignore directories that can't be accessed
            }
        }
    }

Edit 2:

After reading the update on Matthew's comment I've tried to add the following code to the program:

    ThreadPool.SetMinThreads(4000, 16);
    ThreadPool.SetMaxThreads(4000, 16);

This, however, does not change how AsParallel performs. The first 8 steps are still executed in an instant before it slows down to 1 step per second.

(Extra note, I'm currently ignoring the exceptions that occur when I can't access a Directory by the Try Catch block around the Directory.GetDirectories())

Edit 3:

Also, what I'm mainly interested in is the difference between Parallel.ForEach and AsParallel.ForAll, because to me it's just strange that for some reason the second one creates one Thread for every recursion it does, while the first one handles everything in around 30 threads max. (And also why MSDN suggests using AsParallel even though it creates so many threads with a ~1 second timeout.)

Edit 4:

Another strange thing I found out: When I try to set the MinThreads on the Thread pool above 1023 it seems to ignore the value and scale back to around 8 or 16: ThreadPool.SetMinThreads(1023, 16);

Still when I use 1023 it does the first 1023 elements very fast followed by going back to the slow pace I've been experiencing all the time.

Note: Also, literally more than 1000 threads are now created (compared to 30 for the whole Parallel.ForEach one).

Does this mean Parallel.ForEach is just way smarter in handling tasks?

Some more info: this code prints 8 - 8 twice when you set the value above 1023. (When you set the values to 1023 or lower, it prints the correct value.)

    int threadsMin;
    int completionMin;
    ThreadPool.GetMinThreads(out threadsMin, out completionMin);
    Console.WriteLine("Cur min threads: " + threadsMin + " and the other thing: " + completionMin);

    ThreadPool.SetMinThreads(1023, 16);
    ThreadPool.SetMaxThreads(1023, 16);

    ThreadPool.GetMinThreads(out threadsMin, out completionMin);
    Console.WriteLine("Now min threads: " + threadsMin + " and the other thing: " + completionMin);

Edit 5:

At Dean's request I've created another case to manually create tasks:

    case 4:
        var taskList = new List<Task>();
        foreach (var todo in children)
        {
            var itemTodo = todo;
            taskList.Add(Task.Run(() => ThisIsARecursiveFunctionInMemory(itemTodo)));
        }
        Task.WaitAll(taskList.ToArray());
        break;

This is also as fast as the Parallel.ForEach() loop. So we still don't have the answer to why AsParallel().ForAll() is so much slower.

561

asked Sep 18 '14 08:09

Devedse

1 Answer

This problem is pretty debuggable, an uncommon luxury when you have problems with threads. Your basic tool here is the Debug > Windows > Threads debugger window. It shows you the active threads and gives you a peek at their stack traces. You'll easily see that, once it gets slow, you have dozens of active threads that are all stuck. Their stack traces all look the same:

    mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext) + 0x16 bytes
    mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout) + 0x7 bytes
    mscorlib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) + 0x182 bytes
    mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) + 0x93 bytes
    mscorlib.dll!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler scheduler, bool waitForCompletion) + 0xba bytes
    mscorlib.dll!System.Threading.Tasks.Task.RunSynchronously(System.Threading.Tasks.TaskScheduler scheduler) + 0x13 bytes
    System.Core.dll!System.Linq.Parallel.SpoolingTask.SpoolForAll<ConsoleApplication1.DirWithSubDirs,int>(System.Linq.Parallel.QueryTaskGroupState groupState, System.Linq.Parallel.PartitionedStream<ConsoleApplication1.DirWithSubDirs,int> partitions, System.Threading.Tasks.TaskScheduler taskScheduler) Line 172  C#
    // etc..

Whenever you see something like this, you should immediately think fire-hose problem. Probably the third-most common bug with threads, after races and deadlocks.

Which you can reason out, now that you know the cause: the problem with the code is that every thread that completes adds N more threads, where N is the average number of sub-directories in a directory. In effect, the number of threads grows exponentially, and that's always bad. It will only stay in control if N = 1, which of course never happens on a typical disk.
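To get a feel for how quickly this grows, here is a minimal sketch that counts the recursive calls (and so the nested parallel loops) a walk would start on a synthetic, perfectly balanced tree. The `FanOut` class, the branching factors and the depth are made up for illustration; a real disk is far less regular.

```csharp
using System;

static class FanOut
{
    // Number of directories in a balanced tree where every directory has
    // `branching` sub-directories, down to `depth` levels below the root.
    // Each directory costs one recursive call, so one nested parallel loop.
    public static long CountCalls(int branching, int depth)
    {
        long total = 0;
        long level = 1; // the root itself
        for (int d = 0; d <= depth; d++)
        {
            total += level;      // directories at this level
            level *= branching;  // each one fans out into `branching` more
        }
        return total;
    }

    static void Main()
    {
        Console.WriteLine(CountCalls(1, 10)); // prints 11
        Console.WriteLine(CountCalls(4, 10)); // prints 1398101
    }
}
```

With N = 1 the count grows linearly with depth; with N = 4 ten levels already mean well over a million calls competing for the pool.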

Do beware that, like almost any threading problem, this misbehavior tends to reproduce poorly. The SSD in your machine tends to hide it. So does the RAM in your machine, the program might well complete quickly and trouble-free the second time you run it. Since you'll now read from the file system cache instead of the disk, very fast. Tinkering with ThreadPool.SetMinThreads() hides it as well, but it cannot fix it. It never fixes any problem, it only hides them. Because no matter what happens, the exponential number will always overwhelm the set minimum number of threads. You can only hope that it finishes iterating the drive before that happens. Idle hope for a user with a big drive.

The difference between ParallelEnumerable.ForAll() and Parallel.ForEach() is now perhaps also easily explained. You can tell from the stack trace that ForAll() does something naughty, the RunSynchronously() method blocks until all the threads are completed. Blocking is something threadpool threads should not do, it gums up the thread pool and won't allow it to schedule the processor for another job. And has the effect you observed, the thread pool is quickly overwhelmed with threads that are waiting on the N other threads to complete. Which isn't happening, they are waiting in the pool and are not getting scheduled because there are already so many of them active.

This is a deadlock scenario, a pretty common one, but the threadpool manager has a workaround for it. It watches the active threadpool threads and steps in when they don't complete in a timely manner. It then allows an extra thread to start, one more than the minimum set by SetMinThreads(). But not more than the maximum set by SetMaxThreads(), having too many active tp threads is risky and likely to trigger OOM. This does solve the deadlock, it gets one of the ForAll() calls to complete. But this happens at a very slow rate, the threadpool only does this twice a second. You'll run out of patience before it catches up.

Parallel.ForEach() doesn't have this problem, it doesn't block so doesn't gum up the pool.
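If you want to check the thread count yourself rather than read it off Process Explorer, something like the following sketch may help. It runs the same nested Parallel.ForEach pattern as mode 3 over a synthetic in-memory tree and records which managed threads did any work. `Node`, `WalkStats` and `ThreadProbe` are hypothetical names standing in for the question's DirWithSubDirs code, and the tree shape is an arbitrary assumption; the numbers you see will depend on your machine.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's DirWithSubDirs.
class Node
{
    public List<Node> Children = new List<Node>();
}

// Collects what the walk observed.
class WalkStats
{
    public int Visited;
    public readonly ConcurrentDictionary<int, bool> ThreadIds =
        new ConcurrentDictionary<int, bool>();
}

static class ThreadProbe
{
    // Build a balanced tree: every node has `branching` children, `depth` levels deep.
    public static Node Build(int branching, int depth)
    {
        var node = new Node();
        if (depth > 0)
        {
            for (int i = 0; i < branching; i++)
                node.Children.Add(Build(branching, depth - 1));
        }
        return node;
    }

    // Recursive walk with nested Parallel.ForEach, as in mode 3 of the question.
    public static void Walk(Node node, WalkStats stats)
    {
        Interlocked.Increment(ref stats.Visited);
        stats.ThreadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
        Parallel.ForEach(node.Children, child => Walk(child, stats));
    }

    static void Main()
    {
        var stats = new WalkStats();
        Walk(Build(3, 6), stats);
        Console.WriteLine("Nodes visited: " + stats.Visited);
        Console.WriteLine("Distinct threads used: " + stats.ThreadIds.Count);
    }
}
```

Replacing the loop in `Walk` with `node.Children.AsParallel().ForAll(child => Walk(child, stats))` (plus `using System.Linq;`) gives the mode 2 variant to compare against.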

Seems to be the solution, but do keep in mind that your program is still fire-hosing the memory of your machine, adding ever more waiting tp threads to the pool. This can crash your program as well, it just isn't as likely because you have a lot of memory and the threadpool doesn't use a lot of it to keep track of a request. Some programmers however accomplish that as well.

The solution is a very simple one, just don't use threading. It is harmful, there is no concurrency when you have only one disk. And a disk does not like being commandeered by multiple threads. Especially bad on a spindle drive, head seeks are very, very slow. SSDs do it a lot better, it however still takes an easy 50 microseconds, overhead that you just don't want or need. The ideal number of threads to access a disk that you can't otherwise expect to be cached well is always one.
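A minimal sketch of what that could look like: one thread, an explicit stack instead of recursion, and inaccessible directories skipped. The `SingleThreadWalk` class and `CountDirectories` method are names invented here, and counting directories is just a placeholder for whatever work you do per directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SingleThreadWalk
{
    // Visit every accessible directory below `root` on the calling thread only.
    // Returns the number of directories visited, including the root.
    public static int CountDirectories(string root)
    {
        int count = 0;
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            var current = pending.Pop();
            count++;
            try
            {
                foreach (var child in Directory.GetDirectories(current))
                    pending.Push(child);
            }
            catch (UnauthorizedAccessException)
            {
                // No permission to list this directory: skip its children.
            }
            catch (IOException)
            {
                // Directory vanished, path too long, or similar: skip its children.
            }
        }
        return count;
    }
}
```

Calling `SingleThreadWalk.CountDirectories(@"C:\")` would walk the same tree as the question's mode 1, without the recursion depth depending on how deeply the directories nest.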

118

answered Oct 02 '22 12:10

Hans Passant

Tags:

performance

c#

foreach

multithreading

parallel-processing
