Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java ConcurrentMarkSweep garbage collector not removing all garbage

Short form: The CMS garbage collector appears to be failing to collect an ever-increasing amount of garbage; eventually, our JVM fills up, and the application becomes unresponsive. Forcing a GC via an external tool (JConsole or jmap -histo:live) cleans it up once.

UPDATE: The problem appears to be related to the JTop plugin for JConsole; if we don't run JConsole, or run it without the JTop plugin, the behavior goes away.

(Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. Also, our system is not hooked up to an Internet-accessible machine, so screenshots of JConsole, etc aren't an option.)

We're currently running our JVM with the following flags:

-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+DisableExplicitGC

Observing the memory graph in JConsole, there's a full GC that runs every ~15 minutes or so during the first several hours of our application's lifespan; after each full GC, there's more and more memory still in use. After a few hours, the system hits a steady state where there's approximately 2GB of used memory in the CMS old gen.

Which sounds like a classic memory leak, except that if we use any tool that forces a full GC (hitting the "collect garbage" button in JConsole, or running jmap -histo:live, etc), the old gen suddenly drops to ~500MB used, and our application becomes responsive again for the next several hours (during which time the same pattern continues - after each full GC, more and more of the old gen is full.)

One thing of note: in JConsole, the reported ConcurrentMarkSweep GC count will stay at 0 until we force a GC with jconsole/jmap/etc.

Using jmap -histo and jmap -histo:live in sequence, I am able to determine that the apparently uncollected objects consist of:

  • several million HashMaps and arrays of HashMap$Entry (in a 1:1 ratio)
  • several million Vectors and Object arrays (1:1 ratio, and about the same as the number of HashMaps)
  • several million HashSet, Hashtable, and com.sun.jmx.remote.util.OrderClassLoaders, as well as arrays of Hashtable$Entry (about the same number of each; about half as many as the HashMaps and Vectors)

There are some excerpts from the GC output below; my interpretation of them appears to be that the CMS GC is getting aborted without failing over to the stop-the-world GC. Am I misinterpreting this output somehow? Is there something that would cause that?

During the normal runtime, the CMS GC output blocks look about like this:

36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: uesr=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

and that's it; the next line will be the next ParNew GC.

When we force a GC using jmap -histo:live, we instead get:

48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)

followed by ~125 lines of the form below: (some GeneratedMethodAccessor, some GeneratedSerializationConstructorAccessor, some GeneratedConstructorAccessor, etc)

[Unloading class sun.reflect.GeneratedMethodAccessor3]

followed by:

: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]

Thanks in advance!

like image 918
Sbodd Avatar asked Oct 06 '10 14:10

Sbodd


People also ask

Does Java garbage collector clean both heap and stack memory?

Java Heap space is used by java runtime to allocate memory to Objects and JRE classes. Whenever we create an object, it's always created in the Heap space. Garbage Collection runs on the heap memory to free the memory used by objects that don't have any reference.


2 Answers

com.sun.jmx.remote.util.OrderClassLoader is used in the remote'ing layer for JMX and a quick review of the code suggests they're created as part of the unmarshalling process for remote requests inside of the JVM. The lifetime of those classloaders will be directly related to the lifetime of the thing that was unmarshalled such that once there are no longer any references to that thing the classloader could be released.

I wouldn't be surprised if in this case the presence of these instances was a direct result of you using JConsole to examine the goings on in the JVM. And it looks like they'd just be cleaned up by GC as part of normal operation.

I guess it's possible there's a bug in the JMX implementation (seems unlikely in a relatively up-to-date JVM) or perhaps you have some custom MBeans or are using some custom JMX tools that are causing the problem. But ultimately, I'm suspecting the OrderClassLoader is probably a red-herring and the issue lies elsewhere (broken GC or some other leak).

like image 181
Dan Creswell Avatar answered Sep 22 '22 03:09

Dan Creswell


Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason.

Several newer Java versions have had updates to the CMS garbage collector. Notably 6u12, 6u14, and 6u18.

I'm not an expert with GC stuff, but I'm guessing the preclean fixes in 6u14 may fix the issue you're seeing. Of course, I could say the same thing about 6u18's class unloading bugs. Like I said, I'm not an expert at GC stuff.

There are fixes for:

  • 6u10: (affects 6u4+) CMS never clears referents when -XX:+ParallelRefProcEnabled
  • 6u12: CMS: Incorrect encoding of overflown object arrays during concurrent precleaning
  • 6u12: CMS: Incorrect overflow handling when using parallel concurrent marking
  • 6u14: CMS: assertion failure "is_cms_thread == Thread::current()->is_ConcurrentGC_thread()"
  • 6u14: CMS: Need CMSInitiatingPermOccupancyFraction for perm, divorcing from CMSInitiatingOccupancyFraction
  • 6u14: CMS assert: _concurrent_iteration_safe_limit update missed
  • 6u14: CMS: Incorrect overflow handling during precleaning of Reference lists
  • 6u14: SIGSEGV or (!is_null(v),"oop value can never be zero") assertion when running with CMS and COOPs
  • 6u14: CMS: Livelock in CompactibleFreeListSpace::block_size().
  • 6u14: Make CMS work with compressed oops
  • 6u18: CMS: core dump with -XX:+UseCompressedOops
  • 6u18: CMS: bugs related to class unloading
  • 6u18: CMS: ReduceInitialCardMarks unsafe in the presence of cms precleaning
  • 6u18: [Regression] -XX:NewRatio with -XX:+UseConcMarkSweepGC causes fatal error
  • 6u20: card marks can be deferred too long

In addition to all of the above, 6u14 also introduced the G1 garbage collector, although it is still in testing. G1 is intended to replace CMS in Java 7.

G1 can be used in Java 6u14 and newer with the following command-line switches:

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

like image 38
Powerlord Avatar answered Sep 26 '22 03:09

Powerlord