I have a short attention span so I couldn't make my way through the Wikipedia article.
I understand there are several techniques for garbage collection but a common one is the "reachability" test, where an object's eligibility for collection is based on whether or not it can be "reached" by a rooted object (which, to my understanding is an object that is known to not require collection). When you want to know if a certain object is reachable, how would you go about it? How do you know where to look?
The collector obviously must be aware of all allocated objects and rooted objects. How does it determine the reachability of each of these objects?
By walking the pointers/references, I'd say. In principle you just look whether an object still has references pointing to it (from other objects, local variables of currently executed code, ...). If it hasn't then no reference to this object can be obtained again (in languages like Java, at least where you can't do pointer trickery), so it's usually safe to throw that particular object away.
Other schemes used (or still in use) are for example reference counting where each object has a counter of references to it which must be incremented each time someone gets a reference to that object and decremented each time someone loses a reference to that object. COM in Windows works that way, if I remember correctly.
Java and .NET use (among others) generational garbage collection where each object is initially assumed to die very quickly (generational hypothesis). It then emplos some optimizations to keep garbage collection cycles fast and thus not disrupt the programs running too much. In ye olde times it wasn't uncommon for GC to lock up a program while it ran, sometimes for several seconds.
Aside from that, GC usually only runs when memory is low, i. e. too many dead objects have accumulated and need to be reclaimed. That's why most managed applications seem to waste much more memory compared to unmanaged ones, even though in many cases much of that memory can be reclaimed by running the GC once.
A garbage collector will always know about every allocated object, since otherwise it cannot possibly detect unreachable objects for deletion. This is all taken care of during memory allocation. This makes sense, since if the garbage collector doesn't know about every object, then it would be possible to form orphaned sets of objects which cannot be reached, not even by the garbage collector. Basically, that would be a memory leak. So at the very least the garbage collector must be aware of every allocated object.
This leads to the question of how the garbage collector determines which objects are eligible for deletion. There are various techniques for this, but two that you will hear about a lot are "mark and sweep" and "reference counting" (they often come up in text books) and they are worth knowing about to understand the basic ideas of garbage collection.
Mark and sweep involves the garbage collector walking through the object references (starting at a known set of non-collectible objects) and marking each object that it can reach (setting a flag on the object for example). When it has exhausted all references, then the set of objects that are NOT marked, can be deleted during the sweep phase.
Reference counting involves the garbage collector keeping a count for each object in memory of how many other objects refer to it. This counter will be incremented each time a reference to the object is established from another object, and decremented when a reference is deleted. When the counter hits 0, the object is no longer being referenced by anything, and the garbage collector knows that it can be safely deleted (it has essentially become unreachable at this point - no object "knows about" this object anymore).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With