Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java CMS Being Ignored And Getting Full GC Instead

I'm running a Java server that uses CMS for the tenured collector. Running under a load-test, I see young collections about every 1s and tenured (concurrent) about every 5m. This is good.

When I run with real traffic of about 1/2 capacity, I get young collections about every 4s and tenured (!parallel, stop the world!) about every 7m. Why is the JVM deciding to do full stop-the-world collections instead of using the CMS collector?

From the gc.log you can see the "Full GC" being run and taking over 3s to complete. There's no concurrent mode failure here. Nothing is explicitly requesting a collection.

1350.596: [GC 1350.596: [ParNew
Desired survivor size 119275520 bytes, new threshold 3 (max 3)
- age   1:   34779376 bytes,   34779376 total
- age   2:   17072392 bytes,   51851768 total
- age   3:   24120992 bytes,   75972760 total
: 1765625K->116452K(1864192K), 0.1560370 secs] 3887120K->2277489K(5009920K), 0.1561920 secs] [Times: user=0.40 sys=0.04, real=0.16 secs] 
1355.106: [GC 1355.107: [ParNew
Desired survivor size 119275520 bytes, new threshold 3 (max 3)
- age   1:   44862680 bytes,   44862680 total
- age   2:   20363280 bytes,   65225960 total
- age   3:   16908840 bytes,   82134800 total
: 1747684K->123571K(1864192K), 0.1068880 secs] 3908721K->2307790K(5009920K), 0.1070130 secs] [Times: user=0.29 sys=0.04, real=0.11 secs] 
1356.106: [Full GC 1356.106: [CMS: 2184218K->1268401K(3145728K), 3.0678070 secs] 2682861K->1268401K(5009920K), [CMS Perm : 145090K->145060K(262144K)], 3.0679600 secs] [Times: user=3.05 sys=0.02, real=3.07 secs] 
1361.375: [GC 1361.375: [ParNew
Desired survivor size 119275520 bytes, new threshold 3 (max 3)
- age   1:   33708472 bytes,   33708472 total
: 1631232K->84465K(1864192K), 0.0189890 secs] 2899633K->1352866K(5009920K), 0.0191530 secs] [Times: user=0.19 sys=0.00, real=0.02 secs] 
1365.587: [GC 1365.587: [ParNew
Desired survivor size 119275520 bytes, new threshold 3 (max 3)
- age   1:   33475320 bytes,   33475320 total
- age   2:   22698536 bytes,   56173856 total
: 1715697K->67421K(1864192K), 0.0229540 secs] 2984098K->1335822K(5009920K), 0.0231240 secs] [Times: user=0.25 sys=0.00, real=0.03 secs] 

Here are the JVM flags:

-server -Xss256K -Xms5120M -Xmx5120M -XX:NewSize=2048M -XX:MaxNewSize=2048M
-XX:SurvivorRatio=7 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSFullGCsBeforeCompaction=1
-XX:SoftRefLRUPolicyMSPerMB=73 -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log
-XX:MaxPermSize=256m -XX:PermSize=256m -XX:MaxTenuringThreshold=3
like image 806
Brian White Avatar asked Apr 19 '11 16:04

Brian White


People also ask

What triggers full GC Java?

A Full GC will be triggered whenever the heap fills up. In such a case the young generation is collected first followed by the old generation.

What causes full GC?

A full heap garbage collection (Full GC) is often very time consuming. Full GCs caused by too high heap occupancy in the old generation can be detected by finding the words Pause Full (Allocation Failure) in the log.

Is full GC a problem?

In healthy applications, most of the time young GC runs and, very few times, a full GC runs. If a full GC runs consecutively, it's indicative of a problem. Even though full GCs were consecutively running, the system wasn't able to reclaim enough memory to continue.

What triggers major garbage collection?

The Old Generation is used to store long surviving objects. Typically, a threshold is set for young generation object and when that age is met, the object gets moved to the old generation. Eventually the old generation needs to be collected. This event is called a major garbage collection.


2 Answers

If your survivor space is not large enough, it can trigger a Full GC. (It appears to be complaining about he survivor ratio)

Either you need to decrease your survivor ratio or a better solution is likely to be to increase your NewSize so that less objects survive from the eden space. I have an eden space of 6 GB ;)

like image 127
Peter Lawrey Avatar answered Oct 07 '22 19:10

Peter Lawrey


I seem to recall seeing a similar phenomenon last year when tuning a large heap to avoid full GC. I think you might want to decrease the size of eden. That's fairly large compared to the tenured generation.

What I believe might be happening is that more of your eden gets "old" at once with your 1/2 speed traffic than it does at full speed (where they're not surviving). Which means more of it needs to move to tenured at once. And if it doesn't fit at that time it might trigger a full GC to make room.

For reference here's what we use now for 6GB to 24GB heaps:

-XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC  
-XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled  
-XX:+CMSScavengeBeforeRemark -XX:CMSInitiatingOccupancyFraction=68
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:logs/gc.log

It's fairly similar to yours already. The nice thing about using all ratios is that you can easily change the heap size and it (generally) scales appropriately. One other note is that -XX:+UseCompressedOops can typically use 40% less memory by reducing 64-bit addressing to 32-bit (only works up to 32GB).

like image 45
WhiteFang34 Avatar answered Oct 07 '22 20:10

WhiteFang34