
GC.AddMemoryPressure() not enough to trigger the Finalizer queue execution on time

We have written a custom indexing engine for a multimedia-matching project written in C#.

The indexing engine is written in unmanaged C++ and can hold a significant amount of unmanaged memory in the form of std:: collections and containers.

Every unmanaged index instance is wrapped by a managed object; the lifetime of the unmanaged index is controlled by the lifetime of the managed wrapper.

We have ensured (via custom, tracking C++ allocators) that every byte that is being consumed internally by the indexes is being accounted for, and we update (10 times per second) the managed garbage collector's memory pressure value with the deltas of this value (Positive deltas call GC.AddMemoryPressure(), negative deltas call GC.RemoveMemoryPressure()).
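As a rough illustration of the delta reporting described above, here is a minimal sketch. The `PressureReporter` class and its `Update` method are hypothetical names, not taken from the original code, and the sketch assumes `Update` is only ever called from a single timer thread.

```csharp
using System;

// Hypothetical helper: mirrors the unmanaged byte count tracked by the
// C++ allocators into the GC's memory-pressure accounting.
public sealed class PressureReporter
{
    private long _reported; // bytes currently reported to the GC

    public long Reported
    {
        get { return _reported; }
    }

    // Call periodically (e.g. 10 times per second) with the allocator's
    // current total. Assumes a single calling thread.
    public void Update(long currentUnmanagedBytes)
    {
        long delta = currentUnmanagedBytes - _reported;
        if (delta > 0)
        {
            GC.AddMemoryPressure(delta);      // growth since the last report
        }
        else if (delta < 0)
        {
            GC.RemoveMemoryPressure(-delta);  // shrinkage since the last report
        }
        _reported = currentUnmanagedBytes;
    }
}
```

Both GC calls require a strictly positive argument, which is why a zero delta is skipped.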

These indexes are thread-safe and can be shared by a number of C# workers, so there may be multiple references in use for the same index. For that reason we cannot call Dispose() freely, and instead rely on the garbage collector to track reference sharing and eventually trigger the finalization of the indexes once they are no longer in use by any worker.

Now, the problem is that we are running out of memory. Full collections are in fact executed relatively often. However, with the help of a memory profiler, we can find a very large number of "dead" index instances being held in the finalization queue at the point where the process runs out of memory after exhausting the page file.

We can actually circumvent the problem if we add a watchdog thread that calls GC::WaitForPendingFinalizers() followed by a GC::Collect() on low memory conditions, however, from what we have read, calling GC::Collect() manually severely disrupts garbage collection efficiency, and we don't want that.
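For reference, the watchdog workaround could look roughly like the sketch below. `FinalizerWatchdog` and the `isMemoryLow` predicate are hypothetical; how a low memory condition is detected is left to the caller and is not shown in the original.

```csharp
using System;
using System.Threading;

// Hypothetical watchdog: polls a caller-supplied low-memory predicate and,
// when it fires, drains the finalizer queue and forces a collection.
public static class FinalizerWatchdog
{
    public static Thread Start(Func<bool> isMemoryLow, TimeSpan pollInterval,
                               CancellationToken token)
    {
        var thread = new Thread(() =>
        {
            while (!token.IsCancellationRequested)
            {
                if (isMemoryLow())
                {
                    // Same order as described above: wait for the queue,
                    // then collect.
                    GC.WaitForPendingFinalizers();
                    GC.Collect();
                }
                // Sleep until the next poll, or wake early on cancellation.
                token.WaitHandle.WaitOne(pollInterval);
            }
        });
        thread.IsBackground = true;
        thread.Start();
        return thread;
    }
}
```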

We have even added, to no avail, a pessimistic pressure factor (tried up to 4x) to exaggerate the amount of unmanaged memory reported to the .net side, to see if we could coax the garbage collector to empty the queue faster. It seems as if the thread that processes the queue is completely unaware of the memory pressure.

At this point we feel we need to implement a manual reference counting to Dispose() as soon as the count reaches zero, but this seems to be an overkill, especially because the whole purpose of the memory pressure API is precisely to account for cases like ours.

Some facts:

  • .Net version is 4.5
  • App is in 64-bit mode
  • Garbage collector is running in concurrent server mode.
  • Size of an index is ~800MB of unmanaged memory
  • There can be up to 12 "alive" indexes at any point in time.
  • Server has 64GB of RAM

Any ideas or suggestions are welcome

BlueStrat asked Nov 03 '15 20:11


1 Answer

Well, there will be no answer but "if you want to dispose of an external resource explicitly, you have to do it yourself".

The AddMemoryPressure() method does not guarantee that a garbage collection is triggered immediately. Instead, the CLR uses the unmanaged memory allocation/deallocation stats to adjust its own GC thresholds, and a GC is triggered only if it is considered appropriate.

Note that RemoveMemoryPressure() does not trigger GC at all (theoretically it can do it due to side effects from actions such as setting GCX_PREEMP but let's skip it for brevity). Instead it decreases the current mempressure value, nothing more (simplifying again).

The actual algorithm is undocumented; however, you may look at the implementation in CoreCLR. In short, your bytesAllocated value has to exceed some dynamically calculated limit, and only then does the CLR trigger the GC.

Now the bad news:

  • In a real app the process is totally unpredictable, as each GC collection and each piece of third-party code influences the GC limits. The GC may be called, may be called later, or may not be called at all.

  • The GC tunes its limits trying to minimize costly GC2 collections (you're interested in these because you're working with long-lived index objects, and they're always promoted to the next generation due to the finalizer). So DDOSing the runtime with huge memory pressure values may backfire: you'll raise the bar high enough that setting the memory pressure has (almost) no chance of triggering the GC at all. (NB: the last issue will be fixed with a new AddMemoryPressure() implementation, but definitely not today.)

UPD: more details.

OK, let's move on : )

Part 2, or "never underestimate what _undocumented_ means"

As I've said above, you are interested in GC 2 collections as you are using long-lived objects.

It's a well-known fact that the finalizer runs almost immediately after the object has been GC-ed (assuming that the finalizer queue is not filled with other objects). As proof: just run this gist.
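The gist itself is not reproduced here. A minimal sketch of the same idea follows, with hypothetical names (`Tracked`, `FinalizerDemo`). It forces a full collection, so it only illustrates that the finalizer runs promptly once the object's generation is actually collected.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical type whose finalizer just counts how often it has run.
public sealed class Tracked
{
    public static int Finalized;

    ~Tracked()
    {
        Interlocked.Increment(ref Finalized);
    }
}

public static class FinalizerDemo
{
    // Kept out of line so no reference to the object survives in the caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void AllocateAndDrop()
    {
        new Tracked(); // unreachable as soon as this method returns
    }

    public static int Run()
    {
        AllocateAndDrop();
        GC.Collect();                  // full, blocking collection
        GC.WaitForPendingFinalizers(); // let the finalizer thread drain its queue
        return Volatile.Read(ref Tracked.Finalized);
    }
}
```

After `Run()` returns, the counter typically shows that the finalizer has already executed; the delay lies in getting the generation collected, not in the finalizer thread.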

The real reason why your indexes are not freed is pretty obvious: the generation the objects belong to is not being GCed. And now we're back to the original question. How much memory do you think you have to allocate to trigger a GC2 collection?

As I've said above, the actual numbers are undocumented. In theory, a GC2 may not happen at all until you consume very large chunks of memory. And now the really bad news: for server GC, "in theory" and "what really happens" are the same.

One more gist. On .NET 4.6 x64 the output will look like this:

GC low latency:
Allocated, MB:   512.19          GC gen 0|1|2, MB:   194.19 |   317.81 |     0.00        GC count 0-1-2: 1-0-0
Allocated, MB: 1,024.38          GC gen 0|1|2, MB:   421.19 |   399.56 |   203.25        GC count 0-1-2: 2-1-0
Allocated, MB: 1,536.56          GC gen 0|1|2, MB:   446.44 |   901.44 |   188.13        GC count 0-1-2: 3-1-0
Allocated, MB: 2,048.75          GC gen 0|1|2, MB:   258.56 | 1,569.75 |   219.69        GC count 0-1-2: 4-1-0
Allocated, MB: 2,560.94          GC gen 0|1|2, MB:   623.00 | 1,657.56 |   279.44        GC count 0-1-2: 4-1-0
Allocated, MB: 3,073.13          GC gen 0|1|2, MB:   563.63 | 2,273.50 |   234.88        GC count 0-1-2: 5-1-0
Allocated, MB: 3,585.31          GC gen 0|1|2, MB:   309.19 |   723.75 | 2,551.06        GC count 0-1-2: 6-2-1
Allocated, MB: 4,097.50          GC gen 0|1|2, MB:   686.69 |   728.00 | 2,681.31        GC count 0-1-2: 6-2-1
Allocated, MB: 4,609.69          GC gen 0|1|2, MB:   593.63 | 1,465.44 | 2,548.94        GC count 0-1-2: 7-2-1
Allocated, MB: 5,121.88          GC gen 0|1|2, MB:   293.19 | 2,229.38 | 2,597.44        GC count 0-1-2: 8-2-1

That's right, in the worst case you have to allocate ~3.5 GB to trigger a GC2 collection. I'm pretty sure that your allocations are much smaller :)

NB: dealing with objects from the GC1 generation does not make it any better. The size of the GC0 segment may exceed 500 MB. You have to try really hard to trigger a garbage collection on server GC :)

Summary: the approach with Add/RemoveMemoryPressure will have (almost) no influence on the garbage collection frequency, at least on server GC.

Now, the last part of the question: what possible solutions do we have? In short, the simplest possible approach is to do ref-counting via disposable wrappers.
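A minimal sketch of what ref-counting via disposable wrappers could look like is shown below. All names (`SharedResource<T>`, `Lease`, `Create`, `Share`) are hypothetical, and the `release` callback stands in for whatever frees the unmanaged index; this is one possible shape under those assumptions, not the answer's actual implementation.

```csharp
using System;
using System.Threading;

// Hypothetical ref-counted holder: the resource is released as soon as the
// last lease is disposed, without waiting for the GC.
public sealed class SharedResource<T> where T : class
{
    private readonly T _resource;
    private readonly Action<T> _release;
    private int _count = 1; // accounts for the first lease returned by Create

    private SharedResource(T resource, Action<T> release)
    {
        _resource = resource;
        _release = release;
    }

    public static Lease Create(T resource, Action<T> release)
    {
        return new Lease(new SharedResource<T>(resource, release));
    }

    private Lease Acquire()
    {
        while (true)
        {
            int current = Volatile.Read(ref _count);
            // Never resurrect a resource that has already been released.
            if (current == 0) throw new ObjectDisposedException("SharedResource");
            if (Interlocked.CompareExchange(ref _count, current + 1, current) == current)
                return new Lease(this);
        }
    }

    private void Release()
    {
        if (Interlocked.Decrement(ref _count) == 0)
            _release(_resource); // last reference gone: free deterministically
    }

    // Each worker holds its own lease and disposes it when done.
    public sealed class Lease : IDisposable
    {
        private SharedResource<T> _owner;

        internal Lease(SharedResource<T> owner) { _owner = owner; }

        public T Resource
        {
            get
            {
                var owner = _owner;
                if (owner == null) throw new ObjectDisposedException("Lease");
                return owner._resource;
            }
        }

        // Hands out an additional reference for another worker.
        public Lease Share()
        {
            var owner = _owner;
            if (owner == null) throw new ObjectDisposedException("Lease");
            return owner.Acquire();
        }

        public void Dispose()
        {
            // Exchange makes a double Dispose on the same lease harmless.
            var owner = Interlocked.Exchange(ref _owner, null);
            if (owner != null) owner.Release();
        }
    }
}
```

A worker would wrap its use of the index in `using (var lease = existing.Share()) { ... }`, so the unmanaged memory is freed when the last worker finishes. A finalizer on the managed wrapper could still be kept as a safety net for leaked leases.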

To be continued

Sinix answered Sep 23 '22 02:09