I'm reading Steve Yegge's "Dynamic Languages Strike Back" talk, and in it he sort of criticizes mark-and-sweep GCs (about 5-10 percent through that link, the "Pigs attempt's to fly" slide) What's wrong with them?
The mark-and-sweep algorithm is called a tracing garbage collector because it traces out the entire collection of objects that are directly or indirectly accessible by the program. Example: A. All the objects have their marked bits set to false.
Concurrent Mark Sweep Collector Performance and Structure The CMS collector attempts to reduce pause times due to major collections by using separate garbage collector threads to trace the reachable objects concurrently with the execution of the application threads.
It has often been argued that copying collection is superior to mark-sweep for two reasons. First, it compacts memory, and hence avoids any fragmentation. Second, it's running time is proportional to the amount of live memory, not the size of the heap.
Concurrent Mark-Sweep refers to the Garbage Collection alogorithm that is being used, in this case, to collect against the "old" heap. The heap is generally in 3 generations.
(It is worth noting that Steve Yegge's talk was presented a long time ago now, and that some of the generalizations he makes about dynamic languages and their implementations are out of date. And contrary-wise, implying that generational garbage collection is the solution to GC pauses is ... optimistic. Especially when you consider the kind of real-time characteristics demanded by gamers.)
Here's a high-level bullet point comparison of the various techniques mentioned in the referenced quotation (plus "mark-and-compact" ... which is a variation on mark-and-sweep.)
The properties of reference counting collection are:
For classic mark-and-sweep:
Classical mark-and-sweep is sometimes modified so that the sweep phase compacts the free space by "sliding" non-garbage objects. This is called "mark-sweep-compact". This is fairly complicated but:
Modern collectors (including typical generational collectors) are based on mark-and-copy. The idea is that the collector traces objects in a "from space" copying them to a "to space". When it is done, the "to space" has a contiguous chunk of free space at the end which can be used for allocating new objects. The old "from space" is put on one side for the next time the garbage collector runs. The nice thing about copying collection is that the garbage collection cost associated with a garbage object is close to zero.
A generational collector is one where there are multiple spaces (generations), that are collected at different rates. This is based on the "weak generational hypothesis" that posits that most objects become unreachable quickly; i.e. they die young. So by garbage collecting the space containing the young objects, you reclaim a relatively large amount of space at relatively low cost. You still need to collect the older generations, but this can happen less frequently.
(A mark-and-sweep collector could be generational, but the pay-off isn't as great as for a copying collector.)
Here's the context of the quote:
Generational garbage collectors is the best answer I've got for that, because it reduces the pauses, and frankly, the garbage collectors for all the [new] dynamic languages today are crap. They're mark-and-sweep, or they're reference counted.
From the quote, he appears to be talking about fairly primitive GCs which aren't generational. Generational GCs can still be mark and sweep, but they have a lot less to mark most of the time, which makes them a lot faster than "mark and sweep the world every time".
Assuming that's what he meant, I agree - but he could have put it more clearly. Bear in mind that this was a talk rather than a doctoral thesis though - coming up with the clearest possible way of expressing yourself "on the hoof" is kinda tricky :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With