Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do garbage collectors wait before deallocating?

I have a "why does it work that way?" question about garbage collection (any/all implementations: Java, Python, CLR, etc.). Garbage collectors deallocate an object when it is no longer in any scope; the number of references pointing to it is zero. It seems to me that a framework could deallocate as soon as the number of references reaches zero, but all implementations I've encountered wait a while and then deallocate many objects at a time. My question is, why?

I'm assuming that the framework keeps an integer for each object (which I think Python does, because you have to call PyINCREF and PyDECREF when writing extension modules for it in C; presumably these functions modify a real counter somewhere). If so, then it shouldn't take any more CPU time to eliminate the object the moment it goes out of scope. If it takes x nanoseconds per object now, then it would take x nanoseconds per object later, right?

If my assumption is wrong and there is no integer associated with each object, then I understand why garbage collection waits: it would have to walk the graph of references to determine the status of each object, and that calculation takes time. Such a method would consume less memory than the explicit reference-count method, but I'm astonished that it's quicker or is the preferred method for other reasons. It sounds like a lot of work.

From a programming point of view, it would be nice if objects deallocated immediately after they go out of scope. Not only could we rely on destructors being executed when we want them to be (one of the Python gotchas is that __del__ is not called at a predictable time), but it would become much easier to memory-profile a program. Here's an example of how much confusion this causes. To my mind, the benefits of programming in a deallocate-right-away framework are so great that there must be some good reason why all the implementations I've heard of wait before deallocating. What is that benefit?

Note: if the walk over the graph of references is only needed to identify circular references (a pure reference count can't), then why not a hybrid approach? Deallocate objects as soon as their reference count hits zero and then also do periodic sweeps to look for circular references. Programmers working in such a framework would have a performance/determinism reason to stick to non-circular references as much as is feasible. It's often feasible (e.g. all data are in the form of JSON objects with no pointers to parents). Is this how any popular garbage collectors work?

like image 330
Jim Pivarski Avatar asked Jul 15 '13 03:07

Jim Pivarski


People also ask

Why is garbage collector slow?

One of the most misunderstood parts of garbage collection is that it doesn't actually collect dead or unused objects; it collects used ones, the so-called survivors. Garbage collection is slow if most objects survive the collection process.

How can I make my garbage collection faster?

Short of avoiding garbage collection altogether, there is only one way to make garbage collection faster: ensure that as few objects as possible are reachable during the garbage collection. The fewer objects that are alive, the less there is to be marked. This is the rationale behind the generational heap.

What is garbage collection time?

The more live objects are found, the longer the suspension, which has a direct impact on response time and throughput. This fundamental tenet of garbage collection and the resulting effect on application execution is called the garbage-collection pause or GC pause time.


2 Answers

To start with, a point of terminology: "garbage collection" means different things to different people, and some GC schemes are more sophisticated than others. Some people consider reference counting to be a form of GC, but personally I consider "true GC" to be distinct from reference counting.

With refcounts, there is an integer tracking the number of references, and you can trigger deallocation immediately when the refcount hits zero. This us how the CPython implementation works, and how most varieties of C++ smart pointers work. The CPython implementation adds a mark/sweep GC as a backup, so it's very much like the hybrid design you describe.

But refcounting is actually a pretty terrible solution, since it incurs a (relatively) expensive memory write (plus a memory barrier and/or lock, to ensure thread safety) every time a reference is passed, which happens quite a lot. In imperative languages like C++ it's possible (just difficult) to manage memory ownership through macros and coding conventions, but in functional languages like Lisp it becomes well-nigh impossible, because memory allocation usually happens implicitly due to local variable capture in a closure.

So it should come as no surprise that the first step toward a modern GC was invented for Lisp. It was called the "twospace allocator" or "twospace collector" and it worked exactly like it sounds: it divided allocatable memory (the "heap") into two spaces. Every new object was allocated out of the first space until it got too full, at which point allocation would stop and the runtime would walk the reference graph and copy only the live (still referenced) objects to the second space. After the live objects were copied, the first space would be marked empty, and allocation would resume, allocating new objects from the second space, until it got too full, at which point the live objects would be copied back to the first space and the process would start all over again.

The advantage of the twospace collector is that, instead of doing O(N) work, where N is the total number of garbage objects, it would only do O(M) work, where M is the number of objects that were not garbage. Since in practice, most objects are allocated and then deallocated in a short period of time, this can lead to a substantial performance improvement.

Additionally, the twospace collector made it possible to simplify the allocator side as well. Most malloc() implementations maintain what is called a "free list": a list of which blocks are still available to be allocated. To allocate a new object, malloc() must scan the free list looking for an empty space that's big enough. But the twospace allocator didn't bother with that: it just allocated objects in each space like a stack, by just pushing a pointer up by the desired number of bytes.

So the twospace collector was much faster than malloc(), which was great because Lisp programs would do a lot more allocations than C programs would. Or, to put it another way, Lisp programs needed a way to allocate memory like a stack but with a lifetime that was not limited to the execution stack -- in other words, a stack that could grow infinitely without the program running out of memory. And, in fact, Raymond Chen argues that that's exactly how people should think about GC. I highly recommend his series of blog posts starting with Everybody thinks about garbage collection the wrong way.

But the twospace collector had a major flaw, which is that no program could ever use more than half the available RAM: the other half was always wasted. So the history of GC techniques is the history of attempts to improve on the twospace collector, usually by using heuristics of program behavior. However, GC algorithms inevitably involve tradeoffs, usually preferring to deallocate objects in batches instead of individually, which inevitably leads to delays where objects aren't deallocated immediately.

Edit: To answer your follow-up question, modern GCs generally incorporate the idea of generational garbage collection, where objects are grouped into different "generations" based on lifetime, and an object in one generation gets "promoted" to another generation once it's lived long enough. Sometimes a small difference in object lifetime (e.g. in a request-driven server, storing an object for longer than one request) can lead to a large difference in the amount of time it takes before the object gets deallocated, since it causes it to become more "tenured".

You correctly observe that a true GC has to operate "beneath" the level of malloc() and free(). (As a side note, it's worth learning about how malloc() and free() are implemented -- they aren't magic either!) Additionally, for an effective GC, you either need to be conservative (like the Boehm GC) and never move objects, and check things that might be pointers, or else you need some kind of "opaque pointer" type -- which Java and C# call "references". Opaque pointers are actually great for an allocation system, since it means you can always move objects by updating pointers to them. In a language like C where you interact directly with raw memory addresses, it's never really safe to move objects.

And there are multiple options for GC algorithms. The standard Java runtime contains no less than five collectors (Young, Serial, old CMS, new CMS, and G1, although I think I'm forgetting one) and each has a set of options that are all configurable.

However, GCs aren't magic. Most GCs are just exploiting the time-space tradeoff of batching work, which means that the gains in speed are usually paid for in increased memory usage (compared to manual memory management or refcounting). But the combination of increased program performance and increased programmer performance, versus the low cost of RAM these days, makes the tradeoff usually worth it.

Hopefully that helps make things clearer!

like image 183
Daniel Pryden Avatar answered Oct 07 '22 01:10

Daniel Pryden


To understand garbage-collection, go to a bowling alley and watch how the pinsetter removes fallen pins after the first ball has been rolled. Rather than identifying and removing individual fallen pins, the pinsetter mechanism picks up all the pins that are still standing, lifts them to safety, and then runs a sweeper bar across the lane without regard for how many pins are lying there or where they are located. Once that is done, the pins that were standing are placed back on the lane. Many garbage-collection systems work on much the same principle: they have to do a non-trivial amount of work for each live object to ensure it doesn't get destroyed, but dead objects are destroyed wholesale without even being looked at or noticed.

Addendum

A garbage collector that always has to act on every live item to ensure its preservation is apt to be slow when there are a lot of live items; this is why garbage collectors have, historically, gotten a bad rap. The BASIC interpreter on the Commodore 64 (which was, incidentally, written by Microsoft in the days before MS-DOS) would take many seconds to perform a garbage collection in a program which had an array of a few hundred strings. Performance can be improved enormously if items which survive their first garbage collection can be ignored until many items have survived their first garbage collection, and those which have participated in and survived two garbage collections (note that they won't have to participate in their second collection until many other objects have survived their first) can be ignored until many other objects have also participated and survived in their second. This concept can be partially implemented easily (even on the Commodore 64, one could force all strings that exist at a given moment to be exempt from future garbage collection, which could be useful if on startup a program created large arrays of strings that would never change) but becomes more powerful with a little extra hardware support.

If one figures that a garbage collector will try to pack the objects which are going to be kept as close close to an end of memory as it can, generational support requires doing nothing more than keeping track of what (contiguous) range of memory is used by objects of each generation. All objects of every generation must be scanned to make sure all newer-generation live objects are located and preserved, but older-generation objects don't have to be moved, since the memory they occupy isn't in danger of wholesale elimination. This approach is very simple to implement, and can offer some significant performance improvements versus a non-generational GC, but even the scanning phase of a GC can be expensive if there are many live objects.

They key to speed up a "newer-generation" garbage collections is to observe that if an object "Fred" has not been written since the last garbage-collection in which it participated, it cannot possibly contain any references to any objects which have been created since that time. Consequently, none of the objects to which it holds references would be in any danger of elimination until Fred itself is eligible for elimination. Of course, if references to newer objects have been stored in Fred since the last lower-level GC, those references do need to be scanned. To accomplish this, advanced garbage collectors set up hardware traps which fire when parts of the older generation heap are written. When such a trap fires, it adds the objects in that region to a list of older generation objects which will need to be scanned, and then disables the trap associated with that region. In cases where older-generation objects frequently have references to newer objects stored in them, this extra bookkeeping can hurt performance, but in most cases it ends up being a major performance win.

like image 35
supercat Avatar answered Oct 07 '22 02:10

supercat