Why is there a Large Object Heap, and why do we care?

I have read about generations and the large object heap, but I still fail to understand the significance (or benefit) of having a large object heap.

What could have gone wrong (in terms of performance or memory) if the CLR had just relied on Generation 2 for storing large objects (considering that the thresholds for Gen0 and Gen1 are too small to handle large objects)?

A garbage collection doesn't just get rid of unreferenced objects, it also compacts the heap. That's a very important optimization. It doesn't just make memory usage more efficient (no unused holes), it also makes the CPU cache much more efficient. The cache is a really big deal on modern processors; it is easily an order of magnitude faster than the memory bus.

Compacting is done simply by copying bytes. That however takes time. The larger the object, the more likely that the cost of copying it outweighs the possible CPU cache usage improvements.

So they ran a bunch of benchmarks to determine the break-even point, and arrived at 85,000 bytes as the cutoff point where copying no longer improves perf. There is a special exception for arrays of double: they are considered 'large' when the array has more than 1000 elements. That's another optimization for 32-bit code. The large object heap allocator has the special property that it allocates memory at addresses that are aligned to 8, unlike the regular generational allocator, which only aligns to 4. That alignment is a big deal for double, since reading or writing a misaligned double is very expensive. Oddly, the sparse Microsoft info never mentions arrays of long; not sure what's up with that.
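A quick way to see the cutoff in action is to ask the GC which generation a freshly allocated array belongs to. This is only a sketch: the array sizes (80,000 and 90,000 bytes) are my own choice to sit clearly on either side of the 85,000-byte threshold, and it relies on the runtime reporting LOH objects as belonging to the highest generation.

```csharp
using System;

// Sizes picked to be clearly below and clearly above the 85,000-byte cutoff.
byte[] small = new byte[80_000];
byte[] large = new byte[90_000];

// A new small object normally starts in generation 0.
// An object on the large object heap is reported as GC.MaxGeneration.
int smallGen = GC.GetGeneration(small);
int largeGen = GC.GetGeneration(large);

Console.WriteLine($"80,000-byte array: generation {smallGen}");
Console.WriteLine($"90,000-byte array: generation {largeGen}");
Console.WriteLine($"Highest generation: {GC.MaxGeneration}");

// Keep both arrays reachable until the generations have been read.
GC.KeepAlive(small);
GC.KeepAlive(large);
```

The large array showing up in the highest generation right after allocation is the visible sign that it skipped Gen0 and Gen1 entirely.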

Fwiw, there's lots of programmer angst about the large object heap not getting compacted. This invariably gets triggered when they write programs that consume more than half of the entire available address space. Followed by using a tool like a memory profiler to find out why the program bombed even though there was still lots of unused virtual memory available. Such a tool shows the holes in the LOH, unused chunks of memory where previously a large object lived but got garbage collected. Such is the inevitable price of the LOH, the hole can only be re-used by an allocation for an object that's equal or smaller in size. The real problem is assuming that a program should be allowed to consume all virtual memory at any time.

It is also a problem that disappears completely when you just run the code on a 64-bit operating system. A 64-bit process has 8 terabytes of virtual memory address space available, 3 orders of magnitude more than a 32-bit process. You just can't run out of holes.
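To make the "holes" argument concrete, here is a toy model of an uncompacted heap. It is not the CLR's allocator: the 100-unit address space, the block layout, and the 40-unit request are all made up for illustration. It only shows why plenty of free space in total can still fail to satisfy one large request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: a 100-unit address space holding five "large objects".
const int SpaceSize = 100;
var live = new List<(int Start, int Length)>
{
    (0, 30), (30, 10), (40, 30), (70, 10), (80, 20)
};

// Pretend the two 30-unit objects were garbage collected.
// Nothing is compacted, so the survivors stay at their old addresses.
live.RemoveAll(dead => dead.Length == 30);

// Walk the address space and record the size of every gap between survivors.
var holes = new List<int>();
int cursor = 0;
foreach (var block in live.OrderBy(b => b.Start))
{
    if (block.Start > cursor) holes.Add(block.Start - cursor);
    cursor = block.Start + block.Length;
}
if (cursor < SpaceSize) holes.Add(SpaceSize - cursor);

int totalFree = holes.Sum();
int largestHole = holes.Max();
bool fits40 = largestHole >= 40; // can a 40-unit object be placed anywhere?

Console.WriteLine($"Total free: {totalFree}, largest hole: {largestHole}, 40-unit request fits: {fits40}");
```

In this model 60 units are free, but the largest hole is only 30 units, so a 40-unit allocation fails. With a much larger address space the same request would simply be placed past the end of the existing blocks.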

Long story short, the LOH makes code run more efficiently, at the cost of using the available virtual memory address space less efficiently.

UPDATE: .NET 4.5.1 now supports compacting the LOH through the GCSettings.LargeObjectHeapCompactionMode property. Beware the consequences, please.
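For reference, a minimal sketch of requesting a one-off LOH compaction looks like this. Forcing the collection immediately with GC.Collect() is my choice for the demo; without it, the compaction happens at whatever point the next blocking full collection occurs.

```csharp
using System;
using System.Runtime;

// Ask for the LOH to be compacted during the next blocking full collection.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;

// Force that collection now instead of waiting for the runtime to do it.
GC.Collect();

// The setting is one-shot: it reverts to Default once the compaction has run.
var modeAfter = GCSettings.LargeObjectHeapCompactionMode;
Console.WriteLine($"Mode after collection: {modeAfter}");
```

Because the setting resets itself, it has to be set again each time a compaction is wanted, which fits the warning above: copying large objects is exactly the cost the LOH was designed to avoid.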

