Short form: The CMS garbage collector appears to be failing to collect an ever-increasing amount of garbage; eventually, our JVM fills up, and the application becomes unresponsive. Forcing a GC via an external tool (JConsole or jmap -histo:live
) cleans it up once.
UPDATE: The problem appears to be related to the JTop plugin for JConsole; if we don't run JConsole, or run it without the JTop plugin, the behavior goes away.
(Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. Also, our system is not hooked up to an Internet-accessible machine, so screenshots of JConsole, etc aren't an option.)
We're currently running our JVM with the following flags:
-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+DisableExplicitGC
Observing the memory graph in JConsole, there's a full GC that runs every ~15 minutes or so during the first several hours of our application's lifespan; after each full GC, there's more and more memory still in use. After a few hours, the system hits a steady state where there's approximately 2GB of used memory in the CMS old gen.
Which sounds like a classic memory leak, except that if we use any tool that forces a full GC (hitting the "collect garbage" button in JConsole, or running jmap -histo:live
, etc), the old gen suddenly drops to ~500MB used, and our application becomes responsive again for the next several hours (during which time the same pattern continues - after each full GC, more and more of the old gen is full.)
One thing of note: in JConsole, the reported ConcurrentMarkSweep GC count will stay at 0 until we force a GC with jconsole/jmap/etc.
Using jmap -histo
and jmap -histo:live
in sequence, I am able to determine that the apparently uncollected objects consist of:
HashMap
s and arrays of HashMap$Entry
(in a 1:1 ratio)Vector
s and Object arrays (1:1 ratio, and about the same as the number of HashMaps)HashSet
, Hashtable
, and com.sun.jmx.remote.util.OrderClassLoader
s, as well as arrays of Hashtable$Entry
(about the same number of each; about half as many as the HashMaps and Vectors)There are some excerpts from the GC output below; my interpretation of them appears to be that the CMS GC is getting aborted without failing over to the stop-the-world GC. Am I misinterpreting this output somehow? Is there something that would cause that?
During the normal runtime, the CMS GC output blocks look about like this:
36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: uesr=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
and that's it; the next line will be the next ParNew GC.
When we force a GC using jmap -histo:live, we instead get:
48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)
followed by ~125 lines of the form below: (some GeneratedMethodAccessor, some GeneratedSerializationConstructorAccessor, some GeneratedConstructorAccessor, etc)
[Unloading class sun.reflect.GeneratedMethodAccessor3]
followed by:
: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]
Thanks in advance!
Java Heap space is used by java runtime to allocate memory to Objects and JRE classes. Whenever we create an object, it's always created in the Heap space. Garbage Collection runs on the heap memory to free the memory used by objects that don't have any reference.
com.sun.jmx.remote.util.OrderClassLoader is used in the remote'ing layer for JMX and a quick review of the code suggests they're created as part of the unmarshalling process for remote requests inside of the JVM. The lifetime of those classloaders will be directly related to the lifetime of the thing that was unmarshalled such that once there are no longer any references to that thing the classloader could be released.
I wouldn't be surprised if in this case the presence of these instances was a direct result of you using JConsole to examine the goings on in the JVM. And it looks like they'd just be cleaned up by GC as part of normal operation.
I guess it's possible there's a bug in the JMX implementation (seems unlikely in a relatively up-to-date JVM) or perhaps you have some custom MBeans or are using some custom JMX tools that are causing the problem. But ultimately, I'm suspecting the OrderClassLoader is probably a red-herring and the issue lies elsewhere (broken GC or some other leak).
Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason.
Several newer Java versions have had updates to the CMS garbage collector. Notably 6u12, 6u14, and 6u18.
I'm not an expert with GC stuff, but I'm guessing the preclean fixes in 6u14 may fix the issue you're seeing. Of course, I could say the same thing about 6u18's class unloading bugs. Like I said, I'm not an expert at GC stuff.
There are fixes for:
In addition to all of the above, 6u14 also introduced the G1 garbage collector, although it is still in testing. G1 is intended to replace CMS in Java 7.
G1 can be used in Java 6u14 and newer with the following command-line switches:
-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With