Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

Tags:

I found this article here:

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

http://www.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

In the conclusion section, it reads:

Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management.

So, if my understanding is correct: if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)

Do you know if current (i.e. latest Java VM's and .NET 4.0's) garbage collectors suffer the same problems described in the aforementioned article? Has the performance of modern garbage collectors improved?

Thanks.

like image 256
EmbeddedProg Avatar asked Jun 05 '10 22:06

EmbeddedProg


3 Answers

if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)

Only if the app is bottlenecked on allocating and deallocating memory. Note that the paper talks exclusively about the performance of the garbage collector itself.

like image 122
Michael Borgwardt Avatar answered Sep 30 '22 06:09

Michael Borgwardt


You seem to be asking two things:

  • have GC's improved since that research was performed, and
  • can I use the conclusions of the paper as a formula to predict required memory.

The answer to the first is that there have been no major breakthroughs in GC algorithms that would invalidate the general conclusions:

  • GC'ed memory management still requires significantly more virtual memory.
  • If you try to constrain the heap size the GC performance drops significantly.
  • If real memory is restricted, the GC'ed memory management approach results in substantially worse performance due to paging overheads.

However, the conclusions cannot really be used as a formula:

  • The original study was done with JikesRVM rather than a Sun JVM.
  • The Sun JVM's garbage collectors have improved in the ~5 years since the study.
  • The study does not seem to take into account that Java data structures take more space than equivalent C++ data structures for reasons that are not GC related.

On the last point, I have seen a presentation by someone that talks about Java memory overheads. For instance, it found that the minimum representation size of a Java String is something like 48 bytes. (A String consists of two primitive objects; one an Object with 4 word-sized fields and the other an array with a minimum of 1 word of content. Each primitive object also has 3 or 4 words of overhead.) Java collection data structures similarly use far more memory than people realize.

These overheads are not GC-related per se. Rather they are direct and indirect consequences of design decisions in the Java language, JVM and class libraries. For example:

  • Each Java primitive object header1 reserves one word for the object's "identity hashcode" value, and one or more words for representing the object lock.
  • The representation of a String has to use a separate "array of characters" because of JVM limitations. Two of the three other fields are an attempt to make the substring operation less memory intensive.
  • The Java collection types use a lot of memory because collection elements cannot be directly chained. So for example, the overheads of a (hypothetical) singly linked list collection class in Java would be 6 words per list element. By contrast an optimal C/C++ linked list (i.e. with each element having a "next" pointer) has an overhead of one word per list element.

1 - In fact, the overheads are less than this on average. The JVM only "inflates" a lock following use & contention, and similar tricks are used for the identity hashcode. The fixed overhead is only a few bits. However, these bits add up to a measurably larger object header ... which is the real point here.

like image 26
Stephen C Avatar answered Sep 30 '22 04:09

Stephen C


Michael Borgwardt is kind of right about if the application is bottlenecked on allocating memory. This is according to Amdahl's law.

However, I have used C++, Java, and VB .NET. In C++ there are powerful techniques available that allocate memory on the stack instead of the heap. Stack allocation is easily a hundreds of times faster than heap allocation. I would say that use of these techniques could remove maybe one allocation in eight, and use of writable strings one allocation in four.

It's no joke when people claim highly optimized C++ code can trounce the best possible Java code. It's the flat out truth.

Microsoft claims the overhead in using any of the .NET family of languages over C++ is about two to one. I believe that number is just about right for most things.

HOWEVER, managed environments carry a particular benefit in that when dealing with inferior programmers you don't have to worry about one module trashing another module's memory and the resulting crash being blamed on the wrong developer and the bug difficult to find.

like image 20
Joshua Avatar answered Sep 30 '22 04:09

Joshua