They say that compacting garbage collectors are faster than traditional memory management because they only have to collect live objects, and by rearranging them in memory so everything's in one contiguous block, you end up with no heap fragmentation. But how can that be done quickly? It seems to me that that's equivalent to the bin-packing problem, which is NP-hard and can't be completed in a reasonable amount of time on a large dataset within the current limits of our knowledge about computation. What am I missing?
To reduce fragmentation, the JRockit JVM compacts a part of the heap at every garbage collection (old collection). Compaction moves objects closer together and further down in the heap, thus creating larger free areas near the top of the heap.
Garbage collection occurs when the generations fill up. You have two primary generations on the heap: young and old. The young generation is also known as the new generation, or eden space.
When the garbage has been removed from the heap, the Garbage Collector can consider compacting the resulting set of objects to remove the spaces that are between them. The process of compaction is complicated because, if any object is moved, the GC must change all the references that exist to it.
Java Heap space is used by java runtime to allocate memory to Objects and JRE classes. Whenever we create an object, it's always created in the Heap space. Garbage Collection runs on the heap memory to free the memory used by objects that don't have any reference.
Compaction means moving objects in RAM so that some objects are removed (the dead objects, that the GC is supposed to reclaim) and all remaining objects become contiguous in RAM.
Most compacting GC work by allocating objects within a single, contiguous area obtained from the operating system. Compaction is then like removing the dead objects, and then pushing all remaining live objects "to the left", squeezing out the holes. If the GC works by compaction, then allocation is simply a matter of moving up the "end of allocated area" pointer. Synthetically, within the allocation area, there is a pointer such that the free area consists of the bytes after that pointer. To allocate space for an object, the pointer is simply moved up by the new object size. Occasionally, the GC decides that it is time to run, detects the dead object, squeezes the holes out, and thus lowers the allocation pointer.
The performance gains from a compacting GC come from several sources:
If the operating system refuses to give a single allocation area, instead yielding several blocks, then things become a bit more complex, and could begin to look like the bin-packing problem, because the compacting GC then has to decide in which block goes each live object. However, the complexity of bin-packing is about finding the "perfect" match in the general case; an approximate solution is already good enough for a memory allocator.
The algorithmic difficulty in a compaction algorithm is about updating all pointers, so that they point to the new object location. Through strict typing, the .NET virtual machine can unambiguously decide whether each word in RAM is a pointer or not, but updating all pointers efficiently without using too much extra RAM can be tricky. H.B.M. Jonkers has described a wonderfully smart algorithm for that, in "A Fast Garbage Compaction Algorithm" (Information Processing Letters, Volume 9, Number 1, 1979, pp. 26-30). I could not find a copy of that paper on the Vast Internet, but the algorithm is described in the "Garbage Collection" book by Jones and Lins (section 5.6). I warmly recommend that book to anyone who is interested in understanding garbage collectors. Jonkers' algorithm requires two linear passes on the live objects and turns out to be easy to implement (a few dozen lines of code, no more; the most difficult part is understanding why it works).
Additional complexity comes from generational collectors which try, most of the time, to leave most of the objects untouched, working preferentially with young objects only. Here, this means compacting only the end of the heap; full compaction being applied only rarely. The point here is that a full compaction, although linear, could still induce a noticeable pause. Generational GC tries to make such pauses rarer. There again, Jones' and Lins' book is a must-read.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With