Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I "prime" the CLR GC to expect profligate memory use?

We have a server app that does a lot of memory allocations (both short lived and long lived). We are seeing an awful lot of GC2 collections shortly after startup, but these collections calm down after a period of time (even though the memory allocation pattern is constant). These collections are hitting performance early on.

I'm guessing that this could be caused by GC budgets (for Gen2?). Is there some way I can set this budget (directly or indirectly) to make my server perform better at the beginning?

One counter-intuitive set of results I've seen: We made a big reduction to the amount of memory (and Large Object Heap) allocations, which saw performance over the long term improve, but early performance gets worse, and the "settling down" period gets longer.

The GC apparently needs a certain period of time to realise our app is a memory hog and adapt accordingly. I already know this fact, how do I convince the GC?

Edit

  • OS:64 bit Windows Server 2008 R2
  • We're using .Net 4.0 ServerGC Batch Latency. Tried 4.5 and the 3 different latency modes, and while average performance was improved slightly, worst case performance actually deteriorated

Edit2

  • A GC spike can double time taken (we're talking seconds) going from acceptable to unacceptable
  • Almost all spikes correlate with gen 2 collections
  • My test run causes a final 32GB heap size. The initial frothiness lasts for the 1st 1/5th of the run time, and performance after that is actually better (less frequent spikes), even though the heap is growing. The last spike near the end of the test (with largest heap size) is the same height as (i.e. as bad as) 2 of the spikes in the initial "training" period (with much smaller heaps)
like image 958
Rob Avatar asked Nov 26 '13 10:11

Rob


1 Answers

Allocation of extremely large heap in .NET can be insanely fast, and number of blocking collections will not prevent it from being that fast. Problems that you observe are caused by the fact that you don't just allocate, but also have code that causes dependency reorganizations and actual garbage collection, all at the same time when allocation is going on.

There are a few techniques to consider:

  • try using LatencyMode (http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.latencymode(v=vs.110).aspx), set it to LowLatency while you are actively loading the data - see comments to this answer as well

  • use multiple threads

  • do not populate cross-references to newly allocated objects while actively loading; first go through active allocation phase, use only integer indexes to cross-reference items, but not managed references; then force full GC couple times to have everything in Gen2, and only then populate your advanced data structures; you may need to re-think your deserialization logic to make this happen

  • try forcing your biggest root collections (arrays of objects, strings) to second generation as early as possible; do this by preallocating them and forcing full GC two times, before you start populating data (loading millions of small objects); if you are using some flavor of generic Dictionary, make sure to preallocate its capacity early on, to avoid reorganizations

  • any big array of references is a big source of GC overhead - until both array and referenced objects are in Gen2; the bigger the array - the bigger the overhead; prefer arrays of indexes to arrays of references, especially for temporary processing needs

  • avoid having many utility or temporary objects deallocated or promoted while in active loading phase on any thread, carefully look through your code for string concatenation, boxing and 'foreach' iterators that can't be auto-optimized into 'for' loops

  • if you have an array of references and a hierarchy of function calls that have some long-running tight loops, avoid introducing local variables that cache the reference value from some position in the array; instead, cache the offset value and keep using something like "myArrayOfObjects[offset]" construct across all levels of your function calls; it helped me a lot with processing pre-populated, Gen2 large data structures, my personal theory here is that this helps GC manage temporary dependencies on your local thread's data structures, thus improving concurrency

Here are the reasons for this behavior, as far as I learned from populating up to ~100 Gb RAM during app startup, with multiple threads:

  • when GC moves data from one generation to another, it actually copies it and thus modifies all references; therefore, the fewer cross-references you have during active load phase - the better

  • GC maintains a lot of internal data structures that manage references; if you do massive modifications to references themselves - or if you have a lot of references that have to be modified during GC - it causes significant CPU and memory bandwidth overhead during both blocking and concurrent GC; sometimes I observed GC constantly consuming 30-80% of CPU without any collections going on - simply by doing some processing, which looks weird until you realize that any time you put a reference to some array or some temporary variable in a tight loop, GC has to modify and sometimes reorganize dependency tracking data structures

  • server GC uses thread-specific Gen0 segments and is capable of pushing entire segment to next Gen (without actually copying data - not sure about this one though), keep this in mind when designing multi-threaded data load process

  • ConcurrentDictionary, while being a great API, does not scale well in extreme scenarios with multiple cores, when number of objects goes above a few millions (consider using unmanaged hashtable optimized for concurrent insertion, such as one coming with Intel's TBB)

  • if possible or applicable, consider using native pooled allocator (Intel TBB, again)

BTW, latest update to .NET 4.5 has defragmentation support for large object heap. One more great reason to upgrade to it.

.NET 4.6 also has an API to ask for no GC whatsoever (GC.TryStartNoGCRegion), if certain conditions are met: https://msdn.microsoft.com/en-us/library/dn906202(v=vs.110).aspx

Also see a related post by Maoni Stephens: https://blogs.msdn.microsoft.com/maoni/2017/04/02/no-gcs-for-your-allocations/

like image 112
Roman Polunin Avatar answered Oct 18 '22 04:10

Roman Polunin