I have a Java application using CMS garbage collection that suffers from a "ParNew (promotion failed)" full GC a few times every day (see below for an example). I understand that a promotion failure occurs when garbage collection cannot find enough (contiguous) space in the old generation into which to promote an object from the new generation. At this point it is forced to do an expensive stop-the-world full GC. I want to avoid such events.
I have read several articles that suggest possible solutions but I wanted to clarify/consolidate them here:
In case it is relevant, here are my current GC options and a sample of logs preceding a promotion failed event.
-Xmx4g -XX:+UseConcMarkSweepGC -XX:NewRatio=1
2014-12-19T09:38:34.304+0100: [GC (Allocation Failure) [ParNew: 1887488K->209664K(1887488K), 0.0685828 secs] 3115998K->1551788K(3984640K), 0.0690028 secs] [Times: user=0.50 sys=0.02, real=0.07 secs]
2014-12-19T09:38:35.962+0100: [GC (Allocation Failure) [ParNew: 1887488K->208840K(1887488K), 0.0827565 secs] 3229612K->1687030K(3984640K), 0.0831611 secs] [Times: user=0.39 sys=0.03, real=0.08 secs]
2014-12-19T09:38:39.975+0100: [GC (Allocation Failure) [ParNew: 1886664K->114108K(1887488K), 0.0442130 secs] 3364854K->1592298K(3984640K), 0.0446680 secs] [Times: user=0.31 sys=0.00, real=0.05 secs]
2014-12-19T09:38:44.818+0100: [GC (Allocation Failure) [ParNew: 1791932K->167245K(1887488K), 0.0588917 secs] 3270122K->1645435K(3984640K), 0.0593308 secs] [Times: user=0.57 sys=0.00, real=0.06 secs]
2014-12-19T09:38:49.239+0100: [GC (Allocation Failure) [ParNew (promotion failed): 1845069K->1819715K(1887488K), 0.4417916 secs][CMS: 1499941K->647982K(2097152K), 2.4203021 secs] 3323259K->647982K(3984640K), [Metaspace: 137778K->137778K(1177600K)], 2.8626552 secs] [Times: user=3.46 sys=0.01, real=2.86 secs]
Although increasing the memory is indeed the simplest and most general solution, in this case it seems we had a particular issue that required a particular solution. Looking at the GC logs in my case I would see logs like this:
GC (CMS Initial Mark) [1 CMS-initial-mark: 2905552K(3145728K)]
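which shows that the old gen was ~92% full at the start of the CMS cycle (2905552K used out of 3145728K, i.e. roughly 2.8 GB out of 3.0 GB). So the JVM had decided that the "occupancy fraction" should be around 90%. I took this to be a change from the default it starts with, which I think is around 68% (although the Java 8 GC tuning guide quotes a default of approximately 92%, so this may simply be the default on recent JVMs).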
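The ~92% figure is just used divided by capacity from the initial-mark line. A minimal sketch of that arithmetic, assuming the log format shown above; the class and method names (`CmsOccupancy`, `occupancyPercent`) are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CmsOccupancy {
    // Matches the "used(capacity)" pair in a CMS initial-mark line,
    // e.g. "CMS-initial-mark: 2905552K(3145728K)".
    private static final Pattern INITIAL_MARK =
            Pattern.compile("CMS-initial-mark: (\\d+)K\\((\\d+)K\\)");

    /** Returns old-gen occupancy in percent, or -1 if the line is not an initial-mark line. */
    static double occupancyPercent(String logLine) {
        Matcher m = INITIAL_MARK.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long usedKb = Long.parseLong(m.group(1));
        long capacityKb = Long.parseLong(m.group(2));
        return 100.0 * usedKb / capacityKb;
    }

    public static void main(String[] args) {
        String line = "GC (CMS Initial Mark) [1 CMS-initial-mark: 2905552K(3145728K)]";
        System.out.printf("old gen occupancy at CMS start: %.1f%%%n", occupancyPercent(line));
    }
}
```

For the line above this comes out at roughly 92%, which is how late the concurrent cycle was starting.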
Apparently my application behaves in a way that makes the JVM think this is a good thing. But then the application seems to surprise the JVM by suddenly needing more space in old gen to promote objects from new gen.
On adding the GC flags
-XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly
we no longer saw any "promotion failed" events. These flags, respectively, set the initial occupancy fraction to 50% and tell the JVM not to change this fraction. Therefore, as soon as old gen gets above 50%, it will start a CMS. This avoids it waiting till occupancy gets up to 90% or so, where the chance of a "promotion failed" is much higher.
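If you want to confirm from inside the application that old gen now hovers around the new threshold, something like the following may help. It is only a sketch using the standard `java.lang.management` API. Matching pools by name ("CMS Old Gen", "Tenured Gen" and so on) and using the committed size as the denominator are my assumptions about how to approximate what CMS measures, and `OldGenWatch` is a made-up name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class OldGenWatch {
    /** Heuristic (assumption): old-gen pools are named like "CMS Old Gen" or "Tenured Gen". */
    static boolean isOldGenPool(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    /** Occupancy in percent of the given capacity, or -1 if the capacity is not positive. */
    static double percentUsed(long usedBytes, long capacityBytes) {
        return capacityBytes > 0 ? 100.0 * usedBytes / capacityBytes : -1;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isOldGenPool(pool.getName())) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // Committed size is used as the capacity here (assumption, see above).
            double pct = percentUsed(usage.getUsed(), usage.getCommitted());
            System.out.printf("%s: %.1f%% of committed size in use%n", pool.getName(), pct);
        }
    }
}
```

With the two flags in place you would expect the reported value to climb to about 50% and then drop after each CMS cycle, rather than creeping up towards 90%.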
Increasing the memory is the simplest approach. There is still a risk that the memory will eventually become fragmented (in extreme cases). I suggest you make the heap at least 2.5x the size of the memory used after a full GC.
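As a rough worked example of the 2.5x rule, here is the arithmetic applied to the figures in the question's promotion-failed log line (647982K in use after the full GC, 3984640K total heap). This is only a sketch of the calculation; `HeapSizing` and `suggestedMinHeapKb` are hypothetical names:

```java
final class HeapSizing {
    /** Rule of thumb from above: heap should be at least `factor` times the live set. */
    static long suggestedMinHeapKb(long liveAfterFullGcKb, double factor) {
        return (long) Math.ceil(liveAfterFullGcKb * factor);
    }

    public static void main(String[] args) {
        long liveAfterFullGcKb = 647_982L;   // "3323259K->647982K(3984640K)" after the full GC
        long configuredHeapKb = 3_984_640L;  // total heap capacity reported in the same line
        long suggestedKb = suggestedMinHeapKb(liveAfterFullGcKb, 2.5);

        System.out.println("suggested minimum heap (K): " + suggestedKb);
        System.out.println("configured heap (K):        " + configuredHeapKb);
        System.out.println("already above the rule of thumb: " + (configuredHeapKb >= suggestedKb));
    }
}
```

On these numbers the suggested minimum is about 1.6 million K, so the configured heap is already well above it. That points at fragmentation or the timing of the CMS cycle rather than raw heap size.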
A full GC under CMS is so expensive because it is a serial (single-threaded) collection instead of a parallel one.
An alternative is to use the parallel collector, which compacts (defragments) the old generation and doesn't fall back to a serial collection.
Network buffers and long Strings are larger objects. If they were really large, they would go straight into the tenured space; these appear to be larger objects in the new space that are failing to be copied to the tenured space.