
Avoiding promotion failed in Java CMS GC

I have a Java application using CMS garbage collection that suffers from a "ParNew (promotion failed)" full GC a few times every day (see below for an example). I understand that a promotion failure occurs when garbage collection cannot find enough (contiguous) space in the old generation into which to promote an object from the new generation. At this point it is forced to do an expensive stop-the-world full GC. I want to avoid such events.

I have read several articles that suggest possible solutions but I wanted to clarify/consolidate them here:

  1. -Xmx: Increase heap size, e.g. from 2G to 4G -- simple solution to give more headroom in old generation -- seems to work reasonably well in my experience
  2. -XX:NewRatio: Increase NewRatio, e.g. from 2 to 4, in order to increase old generation/decrease new generation -- give the old generation more space -- does not seem to have much, if any, effect from my experiments so far
  3. -XX:PromotedPadding: Increase the amount of padding provided for avoiding promotion failures -- however I cannot find any suggestions on what values to give for this parameter -- does anyone know what the value means, what the default is, or what values to try?
  4. -XX:CMSInitiatingOccupancyFraction -XX:+UseCMSInitiatingOccupancyOnly: Make the CMS cycle start sooner to avoid a lack of space in old generation -- I have not tried this solution yet -- what values would be reasonable to try? What is the default?
  5. Do not allocate very large objects on the heap: A very large object can be difficult to promote since it will require a large contiguous amount of free space in old generation -- this does not apply to my application, as far as I am aware

In case it is relevant, here are my current GC options and a sample of logs preceding a promotion failed event.

-Xmx4g -XX:+UseConcMarkSweepGC -XX:NewRatio=1

2014-12-19T09:38:34.304+0100: [GC (Allocation Failure) [ParNew: 1887488K->209664K(1887488K), 0.0685828 secs] 3115998K->1551788K(3984640K), 0.0690028 secs] [Times: user=0.50 sys=0.02, real=0.07 secs] 
2014-12-19T09:38:35.962+0100: [GC (Allocation Failure) [ParNew: 1887488K->208840K(1887488K), 0.0827565 secs] 3229612K->1687030K(3984640K), 0.0831611 secs] [Times: user=0.39 sys=0.03, real=0.08 secs] 
2014-12-19T09:38:39.975+0100: [GC (Allocation Failure) [ParNew: 1886664K->114108K(1887488K), 0.0442130 secs] 3364854K->1592298K(3984640K), 0.0446680 secs] [Times: user=0.31 sys=0.00, real=0.05 secs] 
2014-12-19T09:38:44.818+0100: [GC (Allocation Failure) [ParNew: 1791932K->167245K(1887488K), 0.0588917 secs] 3270122K->1645435K(3984640K), 0.0593308 secs] [Times: user=0.57 sys=0.00, real=0.06 secs] 
2014-12-19T09:38:49.239+0100: [GC (Allocation Failure) [ParNew (promotion failed): 1845069K->1819715K(1887488K), 0.4417916 secs][CMS: 1499941K->647982K(2097152K), 2.4203021 secs] 3323259K->647982K(3984640K), [Metaspace: 137778K->137778K(1177600K)], 2.8626552 secs] [Times: user=3.46 sys=0.01, real=2.86 secs] 
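
To see how much each young collection actually promotes, the ParNew lines above can be decoded with a little arithmetic: old gen usage is total heap usage minus young gen usage, and the growth in old gen across one pause approximates the promoted volume. The following is only a rough sketch; the class and method names are made up for illustration, and it assumes the exact log layout shown above (it deliberately does not match the "promotion failed" line, which has a different shape).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any JDK or GC tooling.
final class PromotionEstimate {
    // Groups: young before, young after, young capacity, heap before, heap after, heap capacity (all in K).
    private static final Pattern PAR_NEW = Pattern.compile(
        "\\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), [\\d.]+ secs\\] (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Estimated K promoted to old gen during one ParNew pause, or -1 if the line does not match. */
    static long promotedKb(String logLine) {
        Matcher m = PAR_NEW.matcher(logLine);
        if (!m.find()) {
            return -1;
        }
        long youngBefore = Long.parseLong(m.group(1));
        long youngAfter = Long.parseLong(m.group(2));
        long heapBefore = Long.parseLong(m.group(4));
        long heapAfter = Long.parseLong(m.group(5));
        long oldBefore = heapBefore - youngBefore; // old gen usage entering the pause
        long oldAfter = heapAfter - youngAfter;    // old gen usage leaving the pause
        return oldAfter - oldBefore;
    }

    public static void main(String[] args) {
        String first = "[GC (Allocation Failure) [ParNew: 1887488K->209664K(1887488K), 0.0685828 secs] "
            + "3115998K->1551788K(3984640K), 0.0690028 secs]";
        System.out.println(promotedKb(first) + "K promoted"); // 113614K for the first sample line
    }
}
```

Applied to the four successful collections above, this gives roughly 113614K, 136066K, 0K and 0K, so promotion here looks bursty rather than steady.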
Graeme Moss asked Dec 22 '14 09:12



2 Answers

Although increasing the memory is indeed the simplest and most general solution, in this case it seems we had a particular issue that required a particular solution. Looking at the GC logs in my case I would see logs like this:

GC (CMS Initial Mark) [1 CMS-initial-mark: 2905552K(3145728K)]

which shows that the old gen was ~92% full at the start of the CMS cycle (roughly 2.8 GiB used out of 3 GiB). So the JVM had decided that the "occupancy fraction" should be around 90%. I think this is a change from the default it starts with, which I believe is around 68%.
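
The ~92% figure is simply used/capacity taken from the initial-mark line. As a small sketch (the class name is invented for illustration, and the pattern assumes the log layout shown above), it could be computed like this:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading old gen occupancy out of a CMS initial-mark log line.
final class InitialMarkOccupancy {
    // Groups: old gen used (K), old gen capacity (K).
    private static final Pattern INITIAL_MARK =
        Pattern.compile("CMS-initial-mark: (\\d+)K\\((\\d+)K\\)");

    /** Old gen occupancy in percent at the moment the CMS cycle started. */
    static double occupancyPercent(String logLine) {
        Matcher m = INITIAL_MARK.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("not a CMS initial-mark line: " + logLine);
        }
        long usedKb = Long.parseLong(m.group(1));
        long capacityKb = Long.parseLong(m.group(2));
        return 100.0 * usedKb / capacityKb;
    }

    public static void main(String[] args) {
        String line = "GC (CMS Initial Mark) [1 CMS-initial-mark: 2905552K(3145728K)]";
        System.out.println(String.format(Locale.ROOT, "%.1f%%", occupancyPercent(line)));
    }
}
```

For the line above this comes out at about 92.4%. Running it over every initial-mark line in a log shows where the JVM is actually starting its CMS cycles.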

Apparently my application behaves in a way that makes the JVM think this is a good thing. But then the application seems to surprise the JVM by suddenly needing more space in old gen to promote objects from new gen.

On adding the GC flags

-XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly

we no longer saw any "promotion failed" events. These flags, respectively, set the initial occupancy fraction to 50% and tell the JVM not to change this fraction. Therefore, as soon as old gen gets above 50%, it will start a CMS. This avoids it waiting till occupancy gets up to 90% or so, where the chance of a "promotion failed" is much higher.
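
To check from inside the application how full old gen is relative to a threshold like the 50% above, the standard java.lang.management API can be used. This is only a sketch: the class name and the name-matching heuristic are my own assumptions (pool names such as "CMS Old Gen" or "Tenured Gen" vary by collector and JVM), and it reports occupancy against the pool's maximum, which is not necessarily the same base the JVM uses for its own trigger.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical monitoring helper; not something the JVM provides under this name.
final class OldGenOccupancy {
    /** Heuristic (assumption): old gen pools are usually named "... Old Gen" or "Tenured Gen". */
    static boolean looksLikeOldGen(String poolName) {
        return poolName.contains("Old Gen") || poolName.contains("Tenured");
    }

    static double percentUsed(long usedBytes, long limitBytes) {
        return 100.0 * usedBytes / limitBytes;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !looksLikeOldGen(pool.getName())) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() is -1 when undefined, so fall back to the committed size.
            long limit = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
            if (limit <= 0) {
                continue;
            }
            System.out.printf("%s: %.1f%% used%n", pool.getName(), percentUsed(usage.getUsed(), limit));
        }
    }
}
```

Logging this periodically alongside the GC log makes it easier to confirm that CMS cycles now start near the configured fraction rather than near 90%.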

Graeme Moss answered Nov 02 '22 08:11



Increasing the memory is the simplest approach. There is still a risk that the memory will eventually become fragmented (in extreme cases). I suggest you make the heap at least 2.5x the size of the memory used after a full GC.
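
As a quick worked example of that 2.5x rule of thumb, using the heap usage reported after the full GC in the question's log (647982K); the class and method names are just for illustration:

```java
// Hypothetical helper applying the "heap >= factor x live set after a full GC" rule of thumb.
final class HeapSizing {
    /** Suggested minimum heap size in K, given heap usage in K after a full GC. */
    static long minHeapKb(long usedAfterFullGcKb, double factor) {
        return (long) Math.ceil(usedAfterFullGcKb * factor);
    }

    public static void main(String[] args) {
        long usedAfterFullGcKb = 647982L; // from "3323259K->647982K(3984640K)" in the question
        long suggestedKb = minHeapKb(usedAfterFullGcKb, 2.5);
        System.out.println(suggestedKb + "K, about " + (suggestedKb / 1024) + "M");
    }
}
```

That works out to 1619955K, or a little under 1.6G, so by this measure alone the 4g heap in the question is already comfortably above the suggested minimum.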

The full GC in CMS is so expensive because it is a serial collection instead of a parallel collection.

An alternative is to use the parallel collector, which defragments and doesn't fall back to a serial collection.

Network buffers and long Strings are larger objects. If they were really large, they would go straight into the tenured space; these appear to be larger objects in the new space that are failing to be copied to the tenured space.

Peter Lawrey answered Nov 02 '22 09:11
