I'm tasked with improving a piece of code that generates massive reports, in any way I see fit.
There are about 10 identical reports generated (one for each 'section' of the database), and the code for them is similar to this:
GeneratePurchaseReport(Country.France, ProductType.Chair);
GC.Collect();
GeneratePurchaseReport(Country.France, ProductType.Table);
GC.Collect();
GeneratePurchaseReport(Country.Italy, ProductType.Chair);
GC.Collect();
GeneratePurchaseReport(Country.Italy, ProductType.Table);
GC.Collect();
If I remove those GC.Collect() calls, the reporting service crashes with an OutOfMemoryException.

The bulk of the memory is kept in a massive List<T> which is filled inside GeneratePurchaseReport and is no longer of use as soon as it exits - which is why a full GC collection will reclaim the memory.
My question is two-fold: first, why isn't this memory reclaimed automatically? When an allocation fails inside GeneratePurchaseReport, the runtime should do a full collection before crashing and burning, shouldn't it? And second, is there a better fix than these manual GC.Collect() calls?

Read up on the Large Object Heap.
I think what's happening is that the final document for each report is built up by repeated appends, such that each append operation creates a new document and discards the old one (that probably happens behind the scenes). The document eventually grows larger than the 85,000-byte threshold for storage on the Large Object Heap.

In this scenario you're actually not using that much physical memory; it remains available for other processes. What you are using up is the address space available to your program. Every process in Windows gets its own (typically 2 GB) address space. Over time, as you allocate new copies of your growing report document, you leave behind numerous holes in the LOH when each prior copy is collected. The memory freed by those prior objects is no longer in use and is available to other processes, but the address space is still lost: it's fragmented and would need to be compacted. Eventually the address space fills up and you get an OutOfMemoryException.
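A minimal sketch of that suspected append pattern; the sizes, loop count, and names here are illustrative assumptions, not from the original code:

using System;

byte[] chunk = new byte[10_000];     // one appended fragment (stand-in data)
byte[] document = new byte[100_000]; // already past the 85,000-byte LOH threshold

for (int i = 0; i < 1_000; i++)
{
    // Each append allocates a brand-new, even larger LOH array...
    byte[] grown = new byte[document.Length + chunk.Length];
    Buffer.BlockCopy(document, 0, grown, 0, document.Length);
    Buffer.BlockCopy(chunk, 0, grown, document.Length, chunk.Length);
    // ...and the old copy becomes garbage, leaving a hole in the LOH
    // address space even after it is collected.
    document = grown;
}

Because each new copy is larger than every hole left behind, none of the freed space can be reused for it, so the process churns through address space far faster than its live data actually grows.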
The evidence suggests that calls to GC.Collect() allow for some compaction of the LOH, but it's not a perfect solution. Just about everything else I've read on the subject says that GC.Collect() is not supposed to compact the LOH at all, yet I've seen several anecdotal reports (some here on Stack Overflow) where calling GC.Collect() did in fact avert OutOfMemoryExceptions caused by LOH fragmentation.

A "better" solution (in terms of being sure you won't ever run out of memory; using GC.Collect() to compact the LOH just isn't reliable) is to splinter your report into units smaller than 85,000 bytes and write them all into a single buffer at the end, or to use a data structure that doesn't throw away your prior work as it grows. Unfortunately, this is likely to mean a lot more code.

One relatively simple option is to allocate the buffer for a MemoryStream so that it is bigger than your largest report, and then write into that MemoryStream as you build the report. This way you never leave fragments behind. If the report is just written to disk, you could even go straight to a FileStream (perhaps via a TextWriter, to make it easy to change later). If this option solves your problem, I'd like to hear about it in a comment to this answer.
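A minimal sketch of both options; the 16 MB cap and the file name are placeholder assumptions, and the cap should comfortably exceed your largest report:

using System.IO;
using System.Text;

// Option 1: one up-front buffer, so the LOH sees a single allocation
// that is never regrown or copied.
const int MaxReportSize = 16 * 1024 * 1024;
using (var buffer = new MemoryStream(MaxReportSize))
using (var writer = new StreamWriter(buffer, Encoding.UTF8))
{
    writer.Write("...report fragments...");  // append fragments instead of rebuilding the document
    writer.Flush();
    using (var file = File.Create("report.txt"))
        buffer.WriteTo(file);                // one pass out to disk
}

// Option 2: skip the in-memory copy entirely and stream straight to disk.
using (TextWriter writer = new StreamWriter("report.txt"))
{
    writer.Write("...report fragments...");
}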
We would need to see your code to be sure. Failing that:

- Are you pre-sizing the List<T> with an expected number of items? (See the sketch after this list.)
- Can you pre-allocate and use an array instead of a list? (Boxing/unboxing might then be an additional cost.)
- Even on a 64-bit machine, the largest size a single CLR object can be is 2 GB.
- Pre-allocate a MemoryStream to hold the entire report, and write to that.
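A sketch of the first two suggestions; the row type and count are hypothetical stand-ins for whatever GeneratePurchaseReport actually stores:

using System.Collections.Generic;

const int ExpectedRows = 500_000;

// Pre-sizing skips the List's doubling-and-copying growth strategy, which
// otherwise abandons a series of ever-larger internal arrays on the LOH.
var rows = new List<PurchaseRow>(ExpectedRows);

// Or, when the count is known exactly up front, pre-allocate a plain array.
var rowArray = new PurchaseRow[ExpectedRows];

record struct PurchaseRow(int ProductId, decimal Amount);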
Of interest:

- BigArray, getting around the 2GB array size limit
- Large Object Heap Uncovered
I would suggest using a memory profiler such as MemProfiler or Redgate's (both have free trials) to see where the problem actually lies.
The reason is probably the Large Object Heap and any objects which use a native heap internally, e.g. the Bitmap class. The Large Object Heap is also a traditional C-style heap, which fragments. Fragmentation is one aspect of this issue.

But I think it also has something to do with how the GC determines when to collect. It works perfectly for the normal generational heaps, but for memory allocated in other heaps, especially native heaps, it may not have enough information to make a perfect decision. And the LOH is treated as generation 2, which means it has the least chance of being collected.

So in your case, I think manually forcing a collection is a reasonable solution. But yes, it is not perfect.
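For reference, a sketch of what a deliberate full collection looks like. The LOH-compaction setting is an assumption about newer runtimes (.NET 4.5.1 and later) and isn't available on older frameworks:

using System;
using System.Runtime;

// Ask the next blocking collection to also compact the LOH (newer runtimes only).
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;

GC.Collect();                  // full, blocking gen-2 collection (the LOH is collected with gen 2)
GC.WaitForPendingFinalizers(); // let finalizers release native resources (e.g. Bitmap handles)
GC.Collect();                  // reclaim objects that those finalizers freed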
PS: I'd like to add a bit more info to Joel's good explanation. The LOH threshold is 85,000 bytes for normal objects, but for arrays of double it is only 8,000 bytes.