Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Garbage Collection and Parallel.ForEach Issue After VS2015 Upgrade

I have some code to process several million data rows in my own R-like C# DataFrame class. There's a number of Parallel.ForEach calls for iterating over the data rows in parallel. This code has been running for over a year using VS2013 and .NET 4.5 without issues.

I have two dev machines (A and B) and recently upgraded machine A to VS2015. I started noticing a strange intermittent freeze in my code about half the time. Letting it run for a long time, it turns out that the code does eventually finish. It just takes 15-120 minutes instead of 1-2 minutes.

Attempts to Break All using the VS2015 debugger keep failing for some reason. So I inserted a bunch of log statements. It turns out that this freeze occurs when there is a Gen2 collection during a Parallel.ForEach loop (comparing the collection count before and after each Parallel.ForEach loop). The entire extra 13-118 minutes is spent inside whichever Parallel.ForEach loop call happens to overlap with a Gen2 collection (if any). If there are no Gen2 collections during any Parallel.ForEach loops (about 50% of the time when I run it), then everything finishes fine in 1-2 minutes.

When I run the same code in VS2013 on Machine A, I get the same freezes. When I run the code in VS2013 on Machine B (which was never upgraded), it works perfectly. It ran dozens of time overnight with no freezing.

Some things I've noticed / tried:

  • The freezes happen with or without the debugger attached on Machine A (I figured it was something with the VS2015 debugger at first)
  • The freezes happen whether I build in Debug or Release mode
  • The freezes happen if I target .NET 4.5 or .NET 4.6
  • I tried disabling RyuJIT but that did not affect the freezes

I'm not changing the default GC settings at all. According to GCSettings, all runs are happening with LatencyMode Interactive and IsServerGC as false.

I could just switch to LowLatency before every call to Parallel.ForEach, but I'd really prefer to understand what's going on.

Has anyone else seen strange freezes in Parallel.ForEach after the VS2015 upgrade? Any ideas on what a good next step would be?

UPDATE 1: Adding some sample code to the nebulous explanation above...

Here is some sample code that I hope will demonstrate this issue. This code runs in 10-12 seconds on B machine, consistently. It encounters a number of Gen2 collections, but they take almost no time at all. If I uncomment the two GC settings lines, I can force it to have no Gen2 collections. It's somewhat slower then at 30-50 seconds.

Now on my A machine, the code takes a random amount of time. Seems to be between 5 and 30 minutes. And it seems to get worse, the more Gen2 collections it encounters. If I uncomment the two GC settings lines, it takes 30-50 seconds on Machine A also (same as Machine B).

It might take some tweaking in terms of the number of rows and array size for this to show up on another machine.

using System; using System.Collections; using System.Collections.Generic; using System.IO; using System.Diagnostics; using System.Threading; using System.Threading.Tasks; using System.Linq; using System.Runtime;      public class MyDataRow {     public int Id { get; set; }     public double Value { get; set; }     public double DerivedValuesSum { get; set; }     public double[] DerivedValues { get; set; } }  class Program {     static void Example()     {         const int numRows = 2000000;         const int tempArraySize = 250;          var r = new Random();         var dataFrame = new List<MyDataRow>(numRows);          for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow { Id = i, Value = r.NextDouble() });          Stopwatch stw = Stopwatch.StartNew();          int gcs0Initial = GC.CollectionCount(0);         int gcs1Initial = GC.CollectionCount(1);         int gcs2Initial = GC.CollectionCount(2);          //GCSettings.LatencyMode = GCLatencyMode.LowLatency;          Parallel.ForEach(dataFrame, dr =>         {             double[] tempArray = new double[tempArraySize];             for (int j = 0; j < tempArraySize; j++) tempArray[j] = Math.Pow(dr.Value, j);             dr.DerivedValuesSum = tempArray.Sum();             dr.DerivedValues = tempArray.ToArray();         });          int gcs0Final = GC.CollectionCount(0);         int gcs1Final = GC.CollectionCount(1);         int gcs2Final = GC.CollectionCount(2);          stw.Stop();          //GCSettings.LatencyMode = GCLatencyMode.Interactive;          Console.Out.WriteLine("ElapsedTime = {0} Seconds ({1} Minutes)", stw.Elapsed.TotalSeconds, stw.Elapsed.TotalMinutes);          Console.Out.WriteLine("Gcs0 = {0} = {1} - {2}", gcs0Final - gcs0Initial, gcs0Final, gcs0Initial);         Console.Out.WriteLine("Gcs1 = {0} = {1} - {2}", gcs1Final - gcs1Initial, gcs1Final, gcs1Initial);         Console.Out.WriteLine("Gcs2 = {0} = {1} - {2}", gcs2Final - gcs2Initial, gcs2Final, gcs2Initial);          Console.Out.WriteLine("Press Any Key To Exit...");         Console.In.ReadLine();     }      static void Main(string[] args)     {         Example();     } } 

UPDATE 2: Just to move things out of the comments for future readers...

This hotfix: https://support.microsoft.com/en-us/kb/3088957 totally fixes the issue. I'm not seeing any slowness issues at all after applying.

It turned out not to have anything to do with Parallel.ForEach I believe based on this: http://blogs.msdn.com/b/maoni/archive/2015/08/12/gen2-free-list-changes-in-clr-4-6-gc.aspx though the hotfix does mention Parallel.ForEach for some reason.

like image 525
Michael Covelli Avatar asked Jul 31 '15 13:07

Michael Covelli


People also ask

What triggers garbage collection in C #?

Garbage Collection occurs if at least one of multiple conditions is satisfied. These conditions are given as follows: If the system has low physical memory, then garbage collection is necessary. If the memory allocated to various objects in the heap memory exceeds a pre-set threshold, then garbage collection occurs.

Does parallel ForEach wait?

You don't have to do anything special, Parallel. Foreach() will wait until all its branched tasks are complete. From the calling thread you can treat it as a single synchronous statement and for instance wrap it inside a try/catch.

Which is faster ForEach or parallel ForEach C#?

The execution of Parallel. Foreach is faster than normal ForEach.

Does C# have automatic garbage collection?

CSharp Online TrainingThe garbage collector serves as an automatic memory manager. You do not need to know how to allocate and release memory or manage the lifetime of the objects that use that memory. An allocation is made any time you declare an object with a “new” keyword or a value type is boxed.

What are the performance problems caused by full garbage collection?

This article provides workarounds for performance problems that are caused by the churn from full garbage collection during deduplication. Full garbage collection jobs reclaim more free space than "regular" garbage collection. However, full garbage collection generates much more churn on the volume.

What happens when the garbage collector fails to allocate memory?

The garbage collector allocates memory from the system in segments of a pre-determined size. If an allocation requires an additional segment, but there is no contiguous free block left in the process's virtual memory space, the allocation for the managed heap will fail.

Why is the garbage collector not adding segments?

If fragmentation of virtual memory is preventing the garbage collector from adding segments, the causes could be one of the following: Frequent loading and unloading of many small assemblies. Holding too many references to COM objects when interoperating with unmanaged code.

Does garbage collection reclaim more than 95 percent of unreferenced data?

Nevertheless, regular-mode garbage collection should still reclaim more than 95 percent of unreferenced data. On a system that's running Windows Server 2012, make sure that hotfix 2897997 is installed. (This is not necessary for Windows Server 2012 R2.) Provides information on solving the problem of live migration in windows server 2016.


1 Answers

This indeed performs excessively poorly, the background GC is not doing you favor here. First thing I noted is that Parallel.ForEach() is using too many tasks. The threadpool manager misinterprets the thread behavior as "bogged down by I/O" and starts extra threads. This makes the problem worse. Workaround for that is:

var options = new ParallelOptions(); options.MaxDegreeOfParallelism = Environment.ProcessorCount;  Parallel.ForEach(dataFrame, options, dr => {     // etc.. } 

This gives better insight in what ails the program from the new diagnostics hub in VS2015. It doesn't take long for only a single core doing any work, easy to tell from the CPU usage. With occasional spikes, they don't last very long, coinciding with an orange GC mark. When you take a closer look at the GC mark you see it is a gen #1 collection. Taking a very long time, about 6 seconds on my machine.

A gen #1 collection of course doesn't take that long, what you see happening here is the gen #1 collection waiting for the background GC to finish its job. In other words, it is actually the background GC that's taking 6 seconds. Background GC can only be effective if the space in the gen #0 and gen #1 segments is large enough to not require a gen #2 collection while the background GC is trundling. Not the way this app works, it eats memory at a very high rate. The little spike you see is multiple tasks getting unblocked, being able to allocate arrays again. Quickly grinding to a halt when a gen #1 collection has to wait for the background GC again.

Notable is that the allocation pattern of this code is very unfriendly to the GC. It interleaves long-lived arrays (dr.DerivedValues) with short-lived arrays (tempArray). Giving the GC lots of work when it compacts the heap, every single allocated array is going to end up getting moved.

The apparent flaw in the .NET 4.6 GC is that the background collection never seems to effectively compact the heap. It looks like it does the job over and over again, as though the previous collection didn't compact at all. Whether this is by design or a bug is hard to tell, I don't have a clean 4.5 machine anymore. I'm certainly leaning towards bug. You should report this problem at connect.microsoft.com to have Microsoft take a look at it.


A workaround is very easy to come by, all you have to do is prevent the awkward inter-leaving of long- and short-lived objects. Which you do by pre-allocating them:

    for (int i = 0; i < numRows; i++) dataFrame.Add(new MyDataRow {          Id = i, Value = r.NextDouble(),          DerivedValues = new double[tempArraySize] });      ...     Parallel.ForEach(dataFrame, options, dr => {         var array = dr.DerivedValues;         for (int j = 0; j < array.Length; j++) array[j] = Math.Pow(dr.Value, j);         dr.DerivedValuesSum = array.Sum();     }); 

And of course by disabling background GC completely.


UPDATE: GC bug confirmed in this blog post. Fix coming soon.


UPDATE: a hotfix was released.


UPDATE: fixed in .NET 4.6.1

like image 128
5 revs Avatar answered Sep 28 '22 18:09

5 revs