.NET's Multi-threading vs Multi-processing: Awful Parallel.ForEach Performance

Tags:

I have coded a very simple "Word Count" program that reads a file and counts each word's occurrence in the file. Here is a part of the code:

class Alaki {     private static List<string> input = new List<string>();      private static void exec(int threadcount)     {         ParallelOptions options = new ParallelOptions();         options.MaxDegreeOfParallelism = threadcount;         Parallel.ForEach(Partitioner.Create(0, input.Count),options, (range) =>         {             var dic = new Dictionary<string, List<int>>();             for (int i = range.Item1; i < range.Item2; i++)             {                 //make some delay!                 //for (int x = 0; x < 400000; x++) ;                                      var tokens = input[i].Split();                 foreach (var token in tokens)                 {                     if (!dic.ContainsKey(token))                         dic[token] = new List<int>();                     dic[token].Add(1);                 }             }         });      }      public static void Main(String[] args)     {                     StreamReader reader=new StreamReader((@"c:\txt-set\agg.txt"));         while(true)         {             var line=reader.ReadLine();             if(line==null)                 break;             input.Add(line);         }          DateTime t0 = DateTime.Now;         exec(Environment.ProcessorCount);         Console.WriteLine("Parallel:  " + (DateTime.Now - t0));         t0 = DateTime.Now;         exec(1);         Console.WriteLine("Serial:  " + (DateTime.Now - t0));     } }

It is simple and straight forward. I use a dictionary to count each word's occurrence. The style is roughly based on the MapReduce programming model. As you can see, each task is using its own private dictionary. So, there is NO shared variables; just a bunch of tasks that count words by themselves. Here is the output when the code is run on a quad-core i7 CPU:

Parallel: 00:00:01.6220927
Serial: 00:00:02.0471171

The speedup is about 1.25 which means a tragedy! But when I add some delay when processing each line, I can reach speedup values about 4.

In the original parallel execution with no delay, CPU's utilization hardly reaches to 30% and therefore the speedup is not promising. But, when we add some delay, CPU's utilization reaches to 97%.

Firstly, I thought the cause is the IO-bound nature of the program (but I think inserting into a dictionary is to some extent CPU intensive) and it seems logical because all of the threads are reading data from a shared memory bus. However, The surprising point is when I run 4 instances of serial programs (with no delays) simultaneously, CPU's utilization reaches to about raises and all of the four instances finish in about 2.3 seconds!

This means that when the code is being run in a multiprocessing configuration, it reaches to a speedup value about 3.5 but when it is being run in multithreading config, the speedup is about 1.25.

What is your idea? Is there anything wrong about my code? Because I think there is no shared data at all and I think the code shall not experience any contentions. Is there a flaw in .NET's run-time?

Thanks in advance.

837

asked Dec 02 '12 16:12

Saeed Shahrivari

1 Answers

Parallel.For doesn't divide the input into n pieces (where n is the MaxDegreeOfParallelism); instead it creates many small batches and makes sure that at most n are being processed concurrently. (This is so that if one batch takes a very long time to process, Parallel.For can still be running work on other threads. See Parallelism in .NET - Part 5, Partioning of Work for more details.)

Due to this design, your code is creating and throwing away dozens of Dictionary objects, hundreds of List objects, and thousands of String objects. This is putting enormous pressure on the garbage collector.

Running PerfMonitor on my computer reports that 43% of the total run time is spent in GC. If you rewrite your code to use fewer temporary objects, you should see the desired 4x speedup. Some excerpts from the PerfMonitor report follow:

Over 10% of the total CPU time was spent in the garbage collector. Most well tuned applications are in the 0-10% range. This is typically caused by an allocation pattern that allows objects to live just long enough to require an expensive Gen 2 collection.

This program had a peak GC heap allocation rate of over 10 MB/sec. This is quite high. It is not uncommon that this is simply a performance bug.

Edit: As per your comment, I will attempt to explain the timings you reported. On my computer, with PerfMonitor, I measured between 43% and 52% of time spent in GC. For simplicity, let's assume that 50% of the CPU time is work, and 50% is GC. Thus, if we make the work 4× faster (through multi-threading) but keep the amount of GC the same (this will happen because the number of batches being processed happened to be the same in the parallel and serial configurations), the best improvement we could get is 62.5% of the original time, or 1.6×.

However, we only see a 1.25× speedup because GC isn't multithreaded by default (in workstation GC). As per Fundamentals of Garbage Collection, all managed threads are paused during a Gen 0 or Gen 1 collection. (Concurrent and background GC, in .NET 4 and .NET 4.5, can collect Gen 2 on a background thread.) Your program experiences only a 1.25× speedup (and you see 30% CPU usage overall) because the threads spend most of their time being paused for GC (because the memory allocation pattern of this test program is very poor).

If you enable server GC, it will perform garbage collection on multiple threads. If I do this, the program runs 2× faster (with almost 100% CPU usage).

When you run four instances of the program simultaneously, each has its own managed heap, and the garbage collection for the four processes can execute in parallel. This is why you see 100% CPU usage (each process is using 100% of one CPU). The slightly longer overall time (2.3s for all vs 2.05s for one) is possibly due to inaccuracies in measurement, contention for the disk, time taken to load the file, having to initialise the threadpool, overhead of context switching, or some other environment factor.

116

answered Sep 24 '22 11:09

Bradley Grainger

Related questions
                            
                                What is inner Exception
                            
                                How to use OAuth2 in RestSharp
                            
                                Asynchronous locking based on a key
                            
                                OCR with the Tesseract interface
                            
                                Cast sender object in event handler using GetType().Name
                            
                                Is There Any Difference Between SqlConnection.CreateCommand and new SqlCommand?
                            
                                .NET URI: How can I change ONE part of a URI?
                            
                                C# Winforms - change cursor icon of mouse
                            
                                Random assembly references fail ("Are you missing a using directive or an assembly reference?")
                            
                                How to execute a command via command-line and wait for it to be done
                            
                                How to use DATEADD over column in LINQ - DateAdd is not recognized by LINQ
                            
                                How to check internet connectivity type in Universal Windows Platform
                            
                                appsettings with special characters in .NET Core
                            
                                .NET: Get all Outlook calendar items
                            
                                What is the most flexible serialization for .NET objects, yet simple to implement?
                            
                                DataGridViewComboBoxCell Binding - "value is not valid"
                            
                                Where do you put the function documentation so that it shows up on intellisense?
                            
                                Passing parameters to XSLT Stylesheet via .NET
                            
                                how to convert sql union to linq
                            
                                Possible unintended reference comparison

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

.NET's Multi-threading vs Multi-processing: Awful Parallel.ForEach Performance

Tags:

c#

.net

multithreading

multiprocessing

task-parallel-library

Saeed Shahrivari

People also ask

1 Answers

Bradley Grainger

Recent Activity

Donate For Us