What's wrong with mark-and-sweep GCs?

I'm reading Steve Yegge's "Dynamic Languages Strike Back" talk, and in it he sort of criticizes mark-and-sweep GCs (about 5-10 percent through that link, on the "Pigs attempt to fly" slide). What's wrong with them?

920

asked Oct 23 '09 06:10

RCIX

2 Answers

(It is worth noting that Steve Yegge's talk was given a long time ago, and that some of the generalizations he makes about dynamic languages and their implementations are out of date. Conversely, implying that generational garbage collection is the solution to GC pauses is ... optimistic, especially when you consider the kind of real-time characteristics demanded by gamers.)

Here's a high-level bullet point comparison of the various techniques mentioned in the referenced quotation (plus "mark-and-compact" ... which is a variation on mark-and-sweep.)

The properties of reference counting collection are:

PRO - garbage is reclaimed immediately (apart from cycles)
PRO - garbage collection pauses are smaller, and minimal if you can defer updating the "free space" data structure.
CON - reference counts need to be adjusted on most pointer write operations
CON - free space is never compacted
CON - because free space is not compacted, a "free space" data structure must be maintained which increases allocation and deallocation costs.
CON - cyclic garbage is not collected, unless the application breaks the cycle by hand.
CON - updating reference counts in a multi-threaded app is extra expensive.
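
As a rough illustration of the points above, here is a minimal Python sketch of reference counting. The names (`Obj`, `incref`, `decref`, `write_field`) are made up for this example and do not come from any real runtime. It shows the count adjustments on every pointer write, immediate reclamation, and a cycle that is never reclaimed.

```python
class Obj:
    """A toy heap object with an explicit reference count (hypothetical model)."""
    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.fields = {}

freed = []  # names of reclaimed objects, in the order they were freed

def incref(obj):
    if obj is not None:
        obj.refcount += 1

def decref(obj):
    if obj is None:
        return
    obj.refcount -= 1
    if obj.refcount == 0:
        # Reclaim immediately, then release everything this object pointed to.
        freed.append(obj.name)
        for child in list(obj.fields.values()):
            decref(child)
        obj.fields.clear()

def write_field(obj, field, new_target):
    """Every pointer write adjusts two counts: this is the write overhead."""
    old = obj.fields.get(field)
    incref(new_target)
    obj.fields[field] = new_target
    decref(old)

# A chain: root -> a -> b. Dropping the root frees both at once.
a, b = Obj("a"), Obj("b")
incref(a)                  # the root reference to a
write_field(a, "next", b)
decref(a)                  # drop the root

# A cycle: x <-> y. Dropping the root leaves both counts at 1.
x, y = Obj("x"), Obj("y")
incref(x)                  # the root reference to x
write_field(x, "next", y)
write_field(y, "next", x)
decref(x)                  # drop the root
leaked = [o.name for o in (x, y) if o.refcount > 0]
```

After this runs, `freed` contains only `a` and `b`, while `x` and `y` are unreachable but still have a count of 1 each. In a multi-threaded runtime the `+= 1` and `-= 1` would also have to be atomic, which is where the extra cost comes from.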

For classic mark-and-sweep:

PRO - no pointer write overhead
PRO - cyclic data is collected
PRO - storage management concurrency bottlenecks can be avoided (apart from GC)
CON - stop-the-world garbage collection
CON - free space is never compacted
CON - because free space is not compacted, a "free space" data structure must be maintained which increases allocation and deallocation costs.
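
A minimal sketch of classic mark-and-sweep, assuming a toy heap of fixed-size slots (the `Heap` class and its methods are invented for illustration). Note that pointer writes are plain stores, the sweep has to visit every slot whether it is live or dead, and survivors stay where they are, so a free list is needed.

```python
class Heap:
    """A toy heap; each used slot holds the list of slot indices it points to."""
    def __init__(self, size):
        self.slots = [None] * size          # None means the slot is free
        self.free_list = list(range(size))  # the "free space" data structure

    def alloc(self, refs=()):
        if not self.free_list:
            raise MemoryError("heap full; a real runtime would trigger a GC here")
        i = self.free_list.pop()
        self.slots[i] = list(refs)
        return i

    def collect(self, roots):
        # Mark: trace everything reachable from the roots.
        marked = set()
        stack = list(roots)
        while stack:
            i = stack.pop()
            if i in marked:
                continue
            marked.add(i)
            stack.extend(self.slots[i])
        # Sweep: visit every slot and put the unmarked ones on the free list.
        reclaimed = []
        for i, slot in enumerate(self.slots):
            if slot is not None and i not in marked:
                self.slots[i] = None
                self.free_list.append(i)
                reclaimed.append(i)
        return reclaimed

heap = Heap(6)
a = heap.alloc()
b = heap.alloc([a])        # b -> a
c = heap.alloc()
d = heap.alloc([c])        # d -> c
heap.slots[c].append(d)    # plain store, no write overhead; c <-> d is now a cycle
reclaimed = heap.collect(roots=[b])
```

Only `b` is a root, so the `c`/`d` cycle is reclaimed even though the two objects still point at each other. `a` and `b` keep their original slot numbers, which is why the free space ends up fragmented.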

Classical mark-and-sweep is sometimes modified so that the sweep phase compacts the free space by "sliding" non-garbage objects. This is called "mark-sweep-compact". This is fairly complicated but:

PRO - no pointer write overhead
PRO - cyclic data is collected
PRO - storage management concurrency bottlenecks can be easily avoided (apart from GC)
CON - stop-the-world garbage collection
PRO - free space is compacted, so allocation is cheap
CON - the compact phase is rather expensive
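
To give a feel for why the compact phase is expensive, here is a rough sketch of a sliding compactor. It assumes a heap modelled as a Python list of dicts where the list index plays the role of an address; `mark` and `mark_compact` are hypothetical names. After marking, it needs further passes to compute new addresses, rewrite pointers, and move objects.

```python
def mark(heap, roots):
    """Return the set of addresses reachable from the roots."""
    marked, stack = set(), list(roots)
    while stack:
        addr = stack.pop()
        if addr not in marked:
            marked.add(addr)
            stack.extend(heap[addr]["refs"])
    return marked

def mark_compact(heap, roots):
    """heap: list of {"name", "refs"} dicts; a list index stands in for an address."""
    marked = mark(heap, roots)
    # Pass 1: compute a forwarding address for every live object, in address order.
    forward = {}
    for addr in range(len(heap)):
        if addr in marked:
            forward[addr] = len(forward)
    # Pass 2: rewrite every pointer held by a live object, and the roots.
    for addr in marked:
        heap[addr]["refs"] = [forward[r] for r in heap[addr]["refs"]]
    new_roots = [forward[r] for r in roots]
    # Pass 3: slide live objects down; everything after them is one free block.
    compacted = [heap[addr] for addr in sorted(marked)]
    return compacted, new_roots, len(compacted)

heap = [
    {"name": "dead1", "refs": []},
    {"name": "a", "refs": [3]},      # a -> b
    {"name": "dead2", "refs": [1]},
    {"name": "b", "refs": [1]},      # b -> a
]
compacted, new_roots, free_start = mark_compact(heap, roots=[1])
```

The third return value is the address where free space begins. Once the heap is compacted, allocation can be a simple pointer bump from `free_start` instead of a free-list search.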

Modern collectors (including typical generational collectors) are based on mark-and-copy. The idea is that the collector traces objects in a "from space", copying them to a "to space". When it is done, the "to space" has a contiguous chunk of free space at the end which can be used for allocating new objects. The old "from space" is set aside for the next time the garbage collector runs. The nice thing about copying collection is that the collector only ever touches reachable objects, so the garbage collection cost associated with a garbage object is close to zero.

CON - pointer write overhead (to record when a "new generation" pointer is written into an "old generation" object)
PRO - cyclic data is collected
PRO - storage management concurrency bottlenecks can be easily avoided (apart from GC)
CON - stop-the-world garbage collection, though this can be mitigated at the cost of some runtime overheads
PRO - with generational collectors, you usually GC just part of the heap with lots of garbage, and hence GC overheads are less on average
PRO - smaller GC pauses (most of the time)
PRO - free space is compacted, so allocation is cheap
PRO - compaction comes more cheaply than with a sliding compactor
CON - you need to reserve an extra object space for the collector.
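
The copying idea can be sketched in the style of a Cheney scan. This is a simplified model, not any particular VM's implementation: the heap is a Python list of dicts, a list index stands in for an address, and `copy_collect` is a made-up name.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from the roots into a fresh to-space.

    from_space: list of {"name", "refs"} dicts; a list index stands in for
    an address. Returns (to_space, new_roots, number_of_objects_touched).
    """
    to_space = []
    forward = {}  # from-space address -> to-space address

    def copy(addr):
        if addr not in forward:
            forward[addr] = len(to_space)   # bump allocation in to-space
            obj = from_space[addr]
            to_space.append({"name": obj["name"], "refs": list(obj["refs"])})
        return forward[addr]

    new_roots = [copy(r) for r in roots]
    scan = 0
    # Objects at or after `scan` still hold from-space addresses in their
    # refs; copy their targets and rewrite the refs to to-space addresses.
    while scan < len(to_space):
        obj = to_space[scan]
        obj["refs"] = [copy(r) for r in obj["refs"]]
        scan += 1
    return to_space, new_roots, len(forward)

from_space = [
    {"name": "garbage0", "refs": []},
    {"name": "a", "refs": [3]},         # a -> b
    {"name": "garbage2", "refs": [0]},
    {"name": "b", "refs": [1, 4]},      # b -> a, b -> c
    {"name": "c", "refs": []},
]
to_space, new_roots, touched = copy_collect(from_space, roots=[1])
```

The two garbage objects are never visited: the work done is proportional to the three live objects, not to the size of the heap. The survivors end up contiguous at the start of `to_space`, so new objects can be allocated by bumping a pointer from `len(to_space)`.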

A generational collector is one where there are multiple spaces (generations), that are collected at different rates. This is based on the "weak generational hypothesis" that posits that most objects become unreachable quickly; i.e. they die young. So by garbage collecting the space containing the young objects, you reclaim a relatively large amount of space at relatively low cost. You still need to collect the older generations, but this can happen less frequently.
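
Here is a minimal sketch of a two-generation heap, assuming a toy model in which objects are integer ids and survivors are promoted after a single minor collection. `GenHeap`, `write_ref` and `minor_collect` are hypothetical names chosen for the example. The write barrier records old-to-young pointers in a remembered set, so a minor collection never has to scan the old generation.

```python
class GenHeap:
    def __init__(self):
        self.young = {}          # object id -> list of referenced ids
        self.old = {}
        self.remembered = set()  # old objects that may point into the young generation
        self.next_id = 0

    def alloc(self, refs=()):
        oid = self.next_id
        self.next_id += 1
        self.young[oid] = list(refs)   # new objects always start out young
        return oid

    def write_ref(self, src, target):
        """Pointer write with a write barrier."""
        space = self.old if src in self.old else self.young
        space[src].append(target)
        if src in self.old and target in self.young:
            self.remembered.add(src)   # record the old -> young pointer

    def minor_collect(self, roots):
        """Collect only the young generation; survivors are promoted to old."""
        stack = [r for r in roots if r in self.young]
        for src in self.remembered:
            stack.extend(t for t in self.old[src] if t in self.young)
        live = set()
        while stack:
            oid = stack.pop()
            if oid in live:
                continue
            live.add(oid)
            stack.extend(t for t in self.young[oid] if t in self.young)
        reclaimed = len(self.young) - len(live)
        for oid in live:
            self.old[oid] = self.young[oid]
        self.young.clear()
        self.remembered.clear()
        return reclaimed

heap = GenHeap()
cache = heap.alloc()
heap.minor_collect(roots=[cache])            # cache survives and is promoted
temps = [heap.alloc() for _ in range(1000)]  # short-lived objects, never referenced
kept = heap.alloc()
heap.write_ref(cache, kept)                  # old -> young pointer, caught by the barrier
reclaimed = heap.minor_collect(roots=[cache])
```

The second minor collection reclaims the 1000 temporaries while tracing only `kept`. Without the barrier, `kept` would look unreachable from the young generation's point of view and would be freed by mistake, which is why the pointer write overhead is accepted.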

(A mark-and-sweep collector could be generational, but the pay-off isn't as great as for a copying collector.)

165

answered Nov 20 '22 21:11

Stephen C

Here's the context of the quote:

Generational garbage collectors is the best answer I've got for that, because it reduces the pauses, and frankly, the garbage collectors for all the [new] dynamic languages today are crap. They're mark-and-sweep, or they're reference counted.

From the quote, he appears to be talking about fairly primitive GCs which aren't generational. Generational GCs can still be mark and sweep, but they have a lot less to mark most of the time, which makes them a lot faster than "mark and sweep the world every time".

Assuming that's what he meant, I agree - but he could have put it more clearly. Bear in mind that this was a talk rather than a doctoral thesis though - coming up with the clearest possible way of expressing yourself "on the hoof" is kinda tricky :)

answered Nov 20 '22 23:11

Jon Skeet


Tags: performance, garbage-collection
