
How does heap compaction work quickly?

They say that compacting garbage collectors are faster than traditional memory management because they only have to visit live objects, and by rearranging them in memory so everything's in one contiguous block, you end up with no heap fragmentation. But how can that be done quickly? It seems to me that that's equivalent to the bin-packing problem, which is NP-hard and can't be solved in a reasonable amount of time on a large dataset, given what we currently know about computation. What am I missing?

asked Apr 18 '10 by Mason Wheeler

People also ask

What is heap compaction?

To reduce fragmentation, the JRockit JVM compacts a part of the heap at every garbage collection (old collection). Compaction moves objects closer together and further down in the heap, thus creating larger free areas near the top of the heap.

How often the heap memory is garbage collected?

Garbage collection occurs when the generations fill up. You have two primary generations on the heap: young and old. The young generation is also known as the new generation, or eden space.

What is compaction in garbage collection?

When the garbage has been removed from the heap, the Garbage Collector can consider compacting the resulting set of objects to remove the spaces that are between them. The process of compaction is complicated because, if any object is moved, the GC must change all the references that exist to it.

What part of memory stack or heap is cleaned in garbage collection process?

Java Heap space is used by java runtime to allocate memory to Objects and JRE classes. Whenever we create an object, it's always created in the Heap space. Garbage Collection runs on the heap memory to free the memory used by objects that don't have any reference.


1 Answer

Compaction means moving objects in RAM so that the dead objects (the ones the GC is supposed to reclaim) disappear and all remaining live objects become contiguous in RAM.

Most compacting GCs work by allocating objects within a single, contiguous area obtained from the operating system. Compaction then amounts to removing the dead objects and pushing all remaining live objects "to the left", squeezing out the holes. If the GC works by compaction, then allocation is simply a matter of moving up the "end of allocated area" pointer. In short, within the allocation area, there is a pointer such that the free area consists of the bytes after that pointer. To allocate space for an object, the pointer is simply moved up by the new object's size. Occasionally, the GC decides that it is time to run, detects the dead objects, squeezes the holes out, and thus lowers the allocation pointer.
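
Here is a minimal sketch of that bump-pointer allocation, in C; the Heap structure and function names are invented for illustration, not taken from any particular VM:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint8_t *start;   /* beginning of the allocation area           */
        uint8_t *top;     /* the "end of allocated area" pointer        */
        uint8_t *limit;   /* end of the contiguous area given by the OS */
    } Heap;

    static void *heap_alloc(Heap *h, size_t size) {
        size = (size + 7) & ~(size_t)7;     /* keep allocations 8-byte aligned */
        if ((size_t)(h->limit - h->top) < size)
            return NULL;                    /* out of space: time to run the GC */
        void *obj = h->top;
        h->top += size;                     /* allocation is just a pointer bump */
        return obj;
    }

    int main(void) {
        static uint8_t area[1 << 16];
        Heap h = { area, area, area + sizeof area };
        void *a = heap_alloc(&h, 24);
        void *b = heap_alloc(&h, 40);       /* lands right after a: no hole to search for */
        (void)a; (void)b;
        return 0;
    }

When the GC runs and compacts, it simply lowers h->top again; the free area always remains a single block located after that pointer.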

The performance gains from a compacting GC come from several sources:

  • For allocation, there is no need to find a suitable "hole": by construction, free space is, at all times, a single, big area, and it suffices to just move a pointer up.
  • There is no fragmentation: the compaction moves all live objects together, which amounts to moving all the holes together into a single big free area.
  • Locality is improved. By moving live objects into adjacent areas, behaviour with regard to cache memory improves. In particular, compaction algorithms tend to keep objects in their allocation order (objects are slid, not swapped), which has been reported to be heuristically good for most applications; see the sketch after this list.
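
The sliding step itself can be sketched in a few lines of C. The object header and the 'marked' flag below are invented for illustration, and pointers inside objects are ignored here (updating them is discussed further down):

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        uint32_t size;     /* total size of the object, header included */
        uint32_t marked;   /* set by the mark phase for live objects    */
    } ObjHeader;

    /* Slide live objects toward the start of the heap, in their original order,
       and return the new "end of allocated area" pointer. */
    uint8_t *compact_slide(uint8_t *heap_start, uint8_t *heap_end) {
        uint8_t *scan = heap_start;   /* visits every object, live or dead  */
        uint8_t *dest = heap_start;   /* next destination for a live object */
        while (scan < heap_end) {
            uint32_t size = ((ObjHeader *)scan)->size;
            if (((ObjHeader *)scan)->marked) {
                if (dest != scan)
                    memmove(dest, scan, size);   /* slide, preserving order */
                ((ObjHeader *)dest)->marked = 0; /* clear the mark for the next cycle */
                dest += size;
            }
            scan += size;
        }
        return dest;   /* everything past dest is one contiguous free area */
    }

Note the single left-to-right pass: each destination never reaches past the end of the object currently being moved, so objects further up the heap are still intact when the scan gets to them; memmove is used because source and destination may overlap.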

If the operating system refuses to give a single allocation area, instead yielding several separate blocks, then things become a bit more complex and begin to look like the bin-packing problem, because the compacting GC then has to decide which block each live object goes into. However, the hardness of bin-packing is about finding the "perfect" packing in the general case; an approximate solution is already good enough for a memory allocator.
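
For instance, a simple first-fit pass over the blocks is such an approximation; the structures below are made up for the sake of the example and do not come from any real collector:

    #include <stddef.h>

    typedef struct {
        size_t capacity;   /* bytes the OS gave us for this block    */
        size_t used;       /* bytes already assigned to live objects */
    } Block;

    /* Return the index of the first block with enough room, or -1 if none fits. */
    static int place_first_fit(Block *blocks, int nblocks, size_t obj_size) {
        for (int i = 0; i < nblocks; i++) {
            if (blocks[i].capacity - blocks[i].used >= obj_size) {
                blocks[i].used += obj_size;   /* the object will be moved here */
                return i;
            }
        }
        return -1;   /* no room anywhere: ask the OS for another block */
    }

This wastes a bit of space compared with an optimal packing, but each placement costs at most one scan of the block list, which is what matters for a memory allocator.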

The algorithmic difficulty in a compaction algorithm is about updating all pointers, so that they point to the new object location. Through strict typing, the .NET virtual machine can unambiguously decide whether each word in RAM is a pointer or not, but updating all pointers efficiently without using too much extra RAM can be tricky. H.B.M. Jonkers has described a wonderfully smart algorithm for that, in "A Fast Garbage Compaction Algorithm" (Information Processing Letters, Volume 9, Number 1, 1979, pp. 26-30). I could not find a copy of that paper on the Vast Internet, but the algorithm is described in the "Garbage Collection" book by Jones and Lins (section 5.6). I warmly recommend that book to anyone who is interested in understanding garbage collectors. Jonkers' algorithm requires two linear passes on the live objects and turns out to be easy to implement (a few dozen lines of code, no more; the most difficult part is understanding why it works).
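
To make the pointer-update problem concrete, here is a sketch of the simpler, textbook approach that spends an extra "forwarding" word per object (often called the Lisp-2 algorithm); it is emphatically not Jonkers' algorithm, whose whole point is to avoid that extra word, but it shows what "updating all pointers" involves. The object layout and field names are invented:

    #include <stdint.h>
    #include <string.h>

    #define MAX_REFS 4

    typedef struct Obj {
        uint32_t    size;             /* total object size in bytes         */
        uint32_t    marked;           /* set by the mark phase              */
        struct Obj *forward;          /* where this object will be moved to */
        struct Obj *refs[MAX_REFS];   /* outgoing pointers (NULL = unused)  */
    } Obj;

    /* Pass 1: compute each live object's new address. */
    static uint8_t *compute_forwarding(uint8_t *start, uint8_t *end) {
        uint8_t *dest = start;
        for (uint8_t *scan = start; scan < end; scan += ((Obj *)scan)->size) {
            Obj *o = (Obj *)scan;
            if (o->marked) {
                o->forward = (Obj *)dest;
                dest += o->size;
            }
        }
        return dest;
    }

    /* Pass 2: rewrite the roots and every pointer field to the new addresses. */
    static void update_references(uint8_t *start, uint8_t *end, Obj **roots, int nroots) {
        for (int i = 0; i < nroots; i++)
            if (roots[i]) roots[i] = roots[i]->forward;
        for (uint8_t *scan = start; scan < end; scan += ((Obj *)scan)->size) {
            Obj *o = (Obj *)scan;
            if (!o->marked) continue;
            for (int i = 0; i < MAX_REFS; i++)
                if (o->refs[i]) o->refs[i] = o->refs[i]->forward;
        }
    }

    /* Pass 3: slide the objects to their forwarding addresses. */
    static void relocate(uint8_t *start, uint8_t *end) {
        uint8_t *scan = start;
        while (scan < end) {
            Obj *o = (Obj *)scan;
            uint32_t size = o->size;        /* copy before the move can clobber the header */
            if (o->marked) {
                Obj *dest = o->forward;
                if ((uint8_t *)dest != scan)
                    memmove(dest, o, size);
                dest->marked = 0;           /* reset the mark at the new location */
            }
            scan += size;
        }
    }

Every pass is linear in the heap; the trick Jonkers adds is doing the same job without the forward field, by temporarily threading the pointers through the objects themselves.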

Additional complexity comes from generational collectors, which try, most of the time, to leave most objects untouched and work preferentially on young objects only. Here, this means compacting only the end of the heap, with a full compaction applied only rarely. The point is that a full compaction, although linear, could still induce a noticeable pause; generational GC tries to make such pauses rarer. Here again, Jones' and Lins' book is a must-read.
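
A very rough sketch of that split, reusing the compact_slide() function from the earlier sketch (the threshold and the promotion policy are invented):

    #include <stdint.h>

    uint8_t *compact_slide(uint8_t *start, uint8_t *end);   /* defined in the sketch above */

    typedef struct {
        uint8_t *start;         /* start of the whole heap                   */
        uint8_t *old_boundary;  /* objects below this have survived many GCs */
        uint8_t *top;           /* end of allocated area                     */
        int      minor_count;   /* minor collections since the last full one */
    } GenHeap;

    static void collect(GenHeap *h) {
        if (h->minor_count < 8) {
            /* Minor collection: compact only the young end of the heap,
               leaving the (usually much larger) old area untouched. */
            h->top = compact_slide(h->old_boundary, h->top);
            h->minor_count++;
        } else {
            /* Rare full compaction: linear in the whole heap, hence the longer pause. */
            h->top = compact_slide(h->start, h->top);
            h->old_boundary = h->top;   /* naive promotion: all survivors become "old" */
            h->minor_count = 0;
        }
    }

(The mark phase is not shown, and a real generational collector also has to track pointers from old objects into the young area, typically with a write barrier.)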

answered Oct 10 '22 by Thomas Pornin