Short form: The CMS garbage collector appears to be failing to collect an ever-increasing amount of garbage; eventually, our JVM fills up, and the application becomes unresponsive. Forcing a GC via an external tool (JConsole or <code>jmap -histo:live</code>) cleans it up once. UPDATE: The problem appears to be related to the JTop plugin for JConsole; if we don't run JConsole, or run it without the JTop plugin, the behavior goes away. (Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. Also, our system is not hooked up to an Internet-accessible machine, so screenshots of JConsole, etc aren't an option.) We're currently running our JVM with the following flags: <pre class="prettyprint"><code>-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+DisableExplicitGC </code></pre> Observing the memory graph in JConsole, there's a full GC that runs every ~15 minutes or so during the first several hours of our application's lifespan; after each full GC, there's more and more memory still in use. After a few hours, the system hits a steady state where there's approximately 2GB of used memory in the CMS old gen. Which sounds like a classic memory leak, except that if we use any tool that forces a full GC (hitting the "collect garbage" button in JConsole, or running <code>jmap -histo:live</code>, etc), the old gen suddenly drops to ~500MB used, and our application becomes responsive again for the next several hours (during which time the same pattern continues - after each full GC, more and more of the old gen is full.) One thing of note: in JConsole, the reported ConcurrentMarkSweep GC count will stay at 0 until we force a GC with jconsole/jmap/etc. Using <code>jmap -histo</code> and <code>jmap -histo:live</code> in sequence, I am able to determine that the apparently uncollected objects consist of: <ul> <li>several million <code>HashMap</code>s and arrays of <code>HashMap$Entry</code> (in a 1:1 ratio)</li> <li>several million <code>Vector</code>s and Object arrays (1:1 ratio, and about the same as the number of HashMaps)</li> <li>several million <code>HashSet</code>, <code>Hashtable</code>, and <code>com.sun.jmx.remote.util.OrderClassLoader</code>s, as well as arrays of <code>Hashtable$Entry</code> (about the same number of each; about half as many as the HashMaps and Vectors)</li> </ul> There are some excerpts from the GC output below; my interpretation of them appears to be that the CMS GC is getting aborted without failing over to the stop-the-world GC. Am I misinterpreting this output somehow? Is there something that would cause that? During the normal runtime, the CMS GC output blocks look about like this: <pre class="prettyprint"><code>36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs] 36303.638: [CMS-concurrent-mark-start] 36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs] 36314.903: [CMS-concurrent-preclean-start] 36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 36314.963: [CMS-concurrent-abortable-preclean-start] 36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs] CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs] 36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs] 36321.608: [CMS-concurrent-sweep-start] 36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: uesr=0.66 sys=0.02, real=0.30 secs] 36324.651: [CMS-concurrent-reset-start] 36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] </code></pre> and that's it; the next line will be the next ParNew GC. When we force a GC using jmap -histo:live, we instead get: <pre class="prettyprint"><code>48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs] (concurrent mode interrupted) </code></pre> followed by ~125 lines of the form below: (some GeneratedMethodAccessor, some GeneratedSerializationConstructorAccessor, some GeneratedConstructorAccessor, etc) <pre class="prettyprint"><code>[Unloading class sun.reflect.GeneratedMethodAccessor3] </code></pre> followed by: <pre class="prettyprint"><code>: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs] </code></pre> Thanks in advance!

<blockquote> Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. </blockquote> Several newer Java versions have had updates to the CMS garbage collector. Notably 6u12, 6u14, and 6u18. I'm not an expert with GC stuff, but I'm guessing the preclean fixes in 6u14 may fix the issue you're seeing. Of course, I could say the same thing about 6u18's class unloading bugs. Like I said, I'm not an expert at GC stuff. There are fixes for: <ul> <li>6u10: (affects 6u4+) CMS never clears referents when -XX:+ParallelRefProcEnabled</li> <li>6u12: CMS: Incorrect encoding of overflown object arrays during concurrent precleaning </li> <li>6u12: CMS: Incorrect overflow handling when using parallel concurrent marking </li> <li>6u14: CMS: assertion failure "is_cms_thread == Thread::current()->is_ConcurrentGC_thread()"</li> <li>6u14: CMS: Need CMSInitiatingPermOccupancyFraction for perm, divorcing from CMSInitiatingOccupancyFraction</li> <li>6u14: CMS assert: _concurrent_iteration_safe_limit update missed</li> <li>6u14: CMS: Incorrect overflow handling during precleaning of Reference lists</li> <li>6u14: SIGSEGV or (!is_null(v),"oop value can never be zero") assertion when running with CMS and COOPs</li> <li>6u14: CMS: Livelock in CompactibleFreeListSpace::block_size().</li> <li>6u14: Make CMS work with compressed oops</li> <li>6u18: CMS: core dump with -XX:+UseCompressedOops</li> <li>6u18: CMS: bugs related to class unloading</li> <li>6u18: CMS: ReduceInitialCardMarks unsafe in the presence of cms precleaning</li> <li>6u18: [Regression] -XX:NewRatio with -XX:+UseConcMarkSweepGC causes fatal error</li> <li>6u20: card marks can be deferred too long</li> </ul> In addition to all of the above, 6u14 also introduced the G1 garbage collector, although it is still in testing. G1 is intended to replace CMS in Java 7. G1 can be used in Java 6u14 and newer with the following command-line switches: <code>-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC</code>

Java ConcurrentMarkSweep garbage collector not removing all garbage

Tags:

java

garbage-collection

concurrent-mark-sweep

Short form: The CMS garbage collector appears to be failing to collect an ever-increasing amount of garbage; eventually, our JVM fills up, and the application becomes unresponsive. Forcing a GC via an external tool (JConsole or jmap -histo:live) cleans it up once.

UPDATE: The problem appears to be related to the JTop plugin for JConsole; if we don't run JConsole, or run it without the JTop plugin, the behavior goes away.

(Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason. Also, our system is not hooked up to an Internet-accessible machine, so screenshots of JConsole, etc aren't an option.)

We're currently running our JVM with the following flags:

-server -Xms3072m -Xmx3072m -XX:NewSize=512m -XX:MaxNewSize=512m 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+DisableExplicitGC

Observing the memory graph in JConsole, there's a full GC that runs every ~15 minutes or so during the first several hours of our application's lifespan; after each full GC, there's more and more memory still in use. After a few hours, the system hits a steady state where there's approximately 2GB of used memory in the CMS old gen.

Which sounds like a classic memory leak, except that if we use any tool that forces a full GC (hitting the "collect garbage" button in JConsole, or running jmap -histo:live, etc), the old gen suddenly drops to ~500MB used, and our application becomes responsive again for the next several hours (during which time the same pattern continues - after each full GC, more and more of the old gen is full.)

One thing of note: in JConsole, the reported ConcurrentMarkSweep GC count will stay at 0 until we force a GC with jconsole/jmap/etc.

Using jmap -histo and jmap -histo:live in sequence, I am able to determine that the apparently uncollected objects consist of:

several million HashMaps and arrays of HashMap$Entry (in a 1:1 ratio)
several million Vectors and Object arrays (1:1 ratio, and about the same as the number of HashMaps)
several million HashSet, Hashtable, and com.sun.jmx.remote.util.OrderClassLoaders, as well as arrays of Hashtable$Entry (about the same number of each; about half as many as the HashMaps and Vectors)

There are some excerpts from the GC output below; my interpretation of them appears to be that the CMS GC is getting aborted without failing over to the stop-the-world GC. Am I misinterpreting this output somehow? Is there something that would cause that?

During the normal runtime, the CMS GC output blocks look about like this:

36301.827: [GC [1 CMS-initial-mark: 1856321K(2621330K)] 1879456K(3093312K), 1.7634200 secs] [Times: user=0.17 sys=0.00, real=0.18 secs]
36303.638: [CMS-concurrent-mark-start]
36314.903: [CMS-concurrent-mark: 7.804/11.264 secs] [Times: user=2.13 sys=0.06, real=1.13 secs]
36314.903: [CMS-concurrent-preclean-start]
36314.963: [CMS-concurrent-preclean: 0.037/0.060 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
36314.963: [CMS-concurrent-abortable-preclean-start]
36315.195: [GC 36315.195: [ParNew: 428092K->40832K(471872K), 1.1705760 secs] 2284414K->1897153K(3093312K), 1.1710560 secs] [Times: user=0.13 sys=0.02, real=0.12 secs]
CMS: abort preclean due to time 36320.059: [CMS-concurrent-abortable-preclean: 0.844/5.095 secs] [Times: user=0.74 sys=0.05, real=0.51 secs]
36320.062: [GC[YG occupancy: 146166 K (471872 K)]36320.062: [Rescan (parallel), 1.54078550 secs]36321.603: [weak refs processing, 0.0042640 secs] [1 CMS-remark: 1856321K(2621440K)] 2002488K(3093312K), 1.5456150 secs] [Times: user=0.18 sys=0.03, real=0.15 secs]
36321.608: [CMS-concurrent-sweep-start]
36324.650: [CMS-concurrent-sweep: 2.686/3.042 secs] [Times: uesr=0.66 sys=0.02, real=0.30 secs]
36324.651: [CMS-concurrent-reset-start]
36324.700: [CMS-concurrent-reset: 0.050/0.050 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

and that's it; the next line will be the next ParNew GC.

When we force a GC using jmap -histo:live, we instead get:

48004.088: [CMS-concurrent-mark: 8.012/8.647 secs] [Times: user=1.15 sys=0.02, real=0.87 secs]
(concurrent mode interrupted)

followed by ~125 lines of the form below: (some GeneratedMethodAccessor, some GeneratedSerializationConstructorAccessor, some GeneratedConstructorAccessor, etc)

[Unloading class sun.reflect.GeneratedMethodAccessor3]

followed by:

: 1911295K->562232K(2621440K), 15.6886180 secs] 2366440K->562232K(3093312K), [CMS Perm: 52729K->51864K(65536K)], 15.6892270 secs] [Times: user=1.55 sys=0.01, real=1.57 secs]

Thanks in advance!

918

asked Oct 06 '10 14:10

Sbodd

2 Answers

com.sun.jmx.remote.util.OrderClassLoader is used in the remote'ing layer for JMX and a quick review of the code suggests they're created as part of the unmarshalling process for remote requests inside of the JVM. The lifetime of those classloaders will be directly related to the lifetime of the thing that was unmarshalled such that once there are no longer any references to that thing the classloader could be released.

I wouldn't be surprised if in this case the presence of these instances was a direct result of you using JConsole to examine the goings on in the JVM. And it looks like they'd just be cleaned up by GC as part of normal operation.

I guess it's possible there's a bug in the JMX implementation (seems unlikely in a relatively up-to-date JVM) or perhaps you have some custom MBeans or are using some custom JMX tools that are causing the problem. But ultimately, I'm suspecting the OrderClassLoader is probably a red-herring and the issue lies elsewhere (broken GC or some other leak).

181

answered Sep 22 '22 03:09

Dan Creswell

Technical notes: we're running Sun JDK 1.6.0_07, 32-bit, on a Linux 2.6.9 box. Upgrading the JDK version is not really an option, unless there's an unavoidable, major reason.

Several newer Java versions have had updates to the CMS garbage collector. Notably 6u12, 6u14, and 6u18.

I'm not an expert with GC stuff, but I'm guessing the preclean fixes in 6u14 may fix the issue you're seeing. Of course, I could say the same thing about 6u18's class unloading bugs. Like I said, I'm not an expert at GC stuff.

There are fixes for:

6u10: (affects 6u4+) CMS never clears referents when -XX:+ParallelRefProcEnabled
6u12: CMS: Incorrect encoding of overflown object arrays during concurrent precleaning
6u12: CMS: Incorrect overflow handling when using parallel concurrent marking
6u14: CMS: assertion failure "is_cms_thread == Thread::current()->is_ConcurrentGC_thread()"
6u14: CMS: Need CMSInitiatingPermOccupancyFraction for perm, divorcing from CMSInitiatingOccupancyFraction
6u14: CMS assert: _concurrent_iteration_safe_limit update missed
6u14: CMS: Incorrect overflow handling during precleaning of Reference lists
6u14: SIGSEGV or (!is_null(v),"oop value can never be zero") assertion when running with CMS and COOPs
6u14: CMS: Livelock in CompactibleFreeListSpace::block_size().
6u14: Make CMS work with compressed oops
6u18: CMS: core dump with -XX:+UseCompressedOops
6u18: CMS: bugs related to class unloading
6u18: CMS: ReduceInitialCardMarks unsafe in the presence of cms precleaning
6u18: [Regression] -XX:NewRatio with -XX:+UseConcMarkSweepGC causes fatal error
6u20: card marks can be deferred too long

In addition to all of the above, 6u14 also introduced the G1 garbage collector, although it is still in testing. G1 is intended to replace CMS in Java 7.

G1 can be used in Java 6u14 and newer with the following command-line switches:

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

answered Sep 26 '22 03:09

Powerlord

Related questions
                            
                                Dynamically binding lists with Spring's form tag
                            
                                How many CPUs are needed before Erlang is faster than single-threaded Java [closed]
                            
                                Java automatic unboxing - is there a compiler warning?
                            
                                Comparing strings with JSTL
                            
                                How to find, with Java, if a certain font is installed correctly on a machine
                            
                                Stopping a parallel java task with ant
                            
                                Using GWT History to pass parameters?
                            
                                Core dump equivalent for Java
                            
                                What does flushing thread local memory to global memory mean?
                            
                                Good programming style when handling multiple objects
                            
                                Removing quotation marks in JSONObject
                            
                                ArrayList indexOf() returns wrong index?
                            
                                Hibernate Mapped Superclass relationships and overriding
                            
                                Mock File class and NullPointerException
                            
                                Eclipse Java: Template for the 'create field' quick fix suggestion?
                            
                                Maven and GWT (and Eclipse) - Does it really work?
                            
                                Java libraries to manage CSS explosion and or reuse?
                            
                                Where can I download the javadoc for swt(Only)
                            
                                How to reference a system property within a user-defined .properties file?
                            
                                CRC-16 with 0xA001 polynomial

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With