
Any hard data on GC vs explicit memory management performance?

I recently read the excellent article "The Transactional Memory / Garbage Collection Analogy" by Dan Grossman. One sentence really caught my attention:

In theory, garbage collection can improve performance by increasing spatial locality (due to object-relocation), but in practice we pay a moderate performance cost for software engineering benefits.

Until then, my feelings about it had always been very vague. Over and over, you see claims that GC can be more efficient, so I always kept that notion in the back of my head. After reading this, however, I started having serious doubts.

As an experiment to measure the impact on GC languages, some people took some Java programs, traced their execution, and then replaced garbage collection with explicit memory management. According to this review of the article on Lambda the Ultimate, they found that GC was always slower. Virtual memory issues made GC look even worse, since the collector regularly touches far more memory pages than the program itself at that point, and therefore causes a lot of swapping.

This is all experimental to me. Has anybody, in particular in the context of C++, performed a comprehensive benchmark of GC performance compared to explicit memory management?

Particularly interesting would be to compare how various big open-source projects, for example, perform with or without GC. Has anybody heard of such results before?

EDIT: And please focus on the performance problem, not on why GC exists or why it is beneficial.

Cheers,

Carl

PS. In case you're already pulling out the flame-thrower: I am not trying to disqualify GC, I'm just trying to get a definitive answer to the performance question.

Carl Seleborg asked Apr 16 '09 12:04




2 Answers

This turns into another flamewar with a lot of "my gut feeling". Some hard data for a change (papers contain details, benchmarks, graphs, etc.):

http://www.cs.umass.edu/~emery/pubs/04-17.pdf says:

"Conclusion. The controversy over garbage collection's performance impact has long overshadowed the software engineering benefits it provides. This paper introduces a tracing and simulation-based oracular memory manager. Using this framework, we execute a range of unaltered Java benchmarks using both garbage collection and explicit memory management. Comparing runtime, space consumption, and virtual memory footprints, we find that when space is plentiful, the runtime performance of garbage collection can be competitive with explicit memory management, and can even outperform it by up to 4%. We find that copying garbage collection can require six times the physical memory as the Lea or Kingsley allocators to provide comparable performance."

When you have enough memory, copying GC becomes faster than explicit free() - http://www.cs.ucsb.edu/~grze/papers/gc/appel87garbage.pdf
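The intuition behind that claim can be put into a back-of-the-envelope model. The sketch below assumes a simple two-space copying collector whose work per collection is proportional to the live data only. The function name, the `copy_cost` constant, and the sample sizes are illustrative assumptions, not figures taken from the paper.

```python
def gc_cost_per_allocated_word(heap_words, live_words, copy_cost=1.0):
    """Amortized collector work per allocated word for a two-space copying GC.

    Simplified model (an assumption, not a measurement): each collection
    copies `live_words` words at `copy_cost` units each, and frees the rest
    of one semispace (heap_words / 2 - live_words) for new allocations.
    """
    semispace = heap_words / 2
    reclaimed = semispace - live_words
    if reclaimed <= 0:
        raise ValueError("heap too small: live data does not fit in one semispace")
    return copy_cost * live_words / reclaimed


if __name__ == "__main__":
    live = 1_000_000
    for factor in (3, 4, 8, 16, 64):
        heap = factor * live
        cost = gc_cost_per_allocated_word(heap, live)
        print(f"heap = {factor:2d}x live data -> {cost:.3f} units per allocated word")
```

In this model the per-word collection cost shrinks toward zero as the heap grows relative to the live data, while an explicit free() pays a roughly fixed cost per object regardless of heap size. That is the sense in which "enough memory" tips the balance; real collectors and allocators have many costs this model ignores.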

It also depends on what language you use. Java has to do a lot of rewriting (stack, objects, generations) on each collection, and writing a multithreaded GC that doesn't have to stop the world in the JVM would be a great achievement. On the other hand, you get that almost for free in Haskell, where GC time will rarely be more than 5% while allocation time is almost 0. It really depends on what you're doing and in what environment.

viraptor answered Nov 11 '22 05:11



The cost of memory allocation is generally much lower in a garbage-collected memory model than when using new or malloc explicitly, because garbage collectors generally pre-allocate this memory. However, explicit memory models may also do this (using memory pools or memory areas), making the cost of memory allocation equivalent to a pointer addition.
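To make "equivalent to a pointer addition" concrete, here is a toy bump allocator over one pre-allocated buffer. It is only a sketch: the `Arena` class and its methods are hypothetical names, and it hands out offsets into a bytearray rather than real pointers.

```python
class Arena:
    """Toy bump allocator over one pre-allocated buffer (hypothetical helper)."""

    def __init__(self, size):
        self.buffer = bytearray(size)  # memory reserved up front
        self.top = 0                   # offset of the next free byte

    def alloc(self, nbytes, align=8):
        """Return the offset of a fresh block.

        The work is one rounding step, one addition, and one bounds check.
        `align` must be a power of two.
        """
        start = (self.top + align - 1) & ~(align - 1)  # round up to alignment
        end = start + nbytes
        if end > len(self.buffer):
            raise MemoryError("arena exhausted")
        self.top = end
        return start

    def reset(self):
        """Release every block at once, as a pool or region would."""
        self.top = 0


if __name__ == "__main__":
    arena = Arena(1024)
    offsets = [arena.alloc(16), arena.alloc(5), arena.alloc(8)]
    print(offsets)
    arena.reset()
```

A copying or compacting collector can allocate the same way because it always has a contiguous free region to bump into. The explicit version gets the same fast path but gives up freeing individual blocks: everything is released together in `reset`.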

As Raymond Chen and Rico Mariani pointed out, managed languages tend to outperform unmanaged languages in the general case. However, once you push the optimization far enough, the unmanaged language can and will eventually beat the GC/JIT-compiled language.

The same thing is also evident in the Computer Language Shootout: even though C++ tends to rank higher than Java most of the time, you'll often see C++ implementations jumping through various hoops (such as object pools) to achieve optimal performance. Garbage-collected languages, however, tend to have easier-to-follow and more straightforward implementations because the GC is better at allocating (small chunks of) memory.

However, performance isn't the biggest difference when it comes to GC vs. non-GC. Arguably, deterministic finalization (or RAII) in non-GC (and reference-counted) languages is the biggest argument for explicit memory management, because it is generally used for purposes other than memory management (such as releasing locks or closing file or window handles). 'Recently', however, C# introduced the using / IDisposable construct to do exactly this.
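As a rough illustration of scope-bound cleanup in a garbage-collected language, Python's `with` statement plays a role similar to the using / IDisposable construct. The sketch below uses a pretend resource, and the `held` helper and `events` list are made up for the example.

```python
from contextlib import contextmanager

events = []  # records the order in which things happen


@contextmanager
def held(resource_name):
    """Acquire a (pretend) resource and release it when the block exits."""
    events.append(f"acquire {resource_name}")
    try:
        yield resource_name
    finally:
        # Runs at block exit, even on exceptions; the collector is not involved.
        events.append(f"release {resource_name}")


def critical_section():
    with held("lock"):
        events.append("work")
    events.append("after block")


if __name__ == "__main__":
    critical_section()
    print(events)
```

The release happens at a known point in the program, before "after block" is recorded, rather than whenever a collector gets around to finalizing the object.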

Another problem with garbage collection is that the systems involved tend to be rather complex in order to prevent memory leaks. This complexity also makes a leak much harder to debug and track down once you do have one (yes, even garbage-collected languages can have memory leaks).

On the flip side, a garbage-collected language can do the right thing at (approximately) the right time without burdening the developer with that task. This means that developing in a GC language might feel more natural, since you can focus more on the real problem.

Jasper Bekkers answered Nov 11 '22 04:11
