Large NewSize for the Java heap makes the process unresponsive for a long time

I have a Java application that does some work with a particular memory-usage pattern. I noticed that when I start the application with almost 80% of the heap assigned to the young generation, it runs much faster than with the default 1:2 ratio. Specifically, I am launching the JVM with:

java -XX:NewSize=10G -XX:+UseParallelOldGC -server -Xmx12G -Xms12G

The server has at least 14 GB of free physical memory, so I think that should be enough for the Java heap plus the 'other' space.
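For reference, the logs below come from runs with GC logging switched on; a launch line along these lines produces timestamped entries like the ones shown (the jar name is just a placeholder):

java -XX:NewSize=10G -XX:+UseParallelOldGC -server -Xmx12G -Xms12G \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -jar myapp.jar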

Now things happen like this:

25.289: [GC [PSYoungGen: 7872317K->1058813K(9175040K)] 7872533K->1059029K(11272192K), 0.1876420 secs] [Times: user=1.92 sys=1.01, real=0.18 secs]
28.918: [GC [PSYoungGen: 8923133K->1091124K(9175040K)] 8923349K->1091340K(11272192K), 0.2206940 secs] [Times: user=1.92 sys=1.70, real=0.22 secs]
32.946: [GC [PSYoungGen: 8955444K->1060567K(9175040K)] 8955660K->1060783K(11272192K), 0.1804050 secs] [Times: user=2.86 sys=0.01, real=0.18 secs]
37.166: [GC [PSYoungGen: 8924887K->1080085K(8329344K)] 8925103K->1080301K(10426496K), 0.1891370 secs] [Times: user=3.08 sys=0.01, real=0.19 secs]
41.326: [GC [PSYoungGen: 8098709K->1088209K(8106880K)] 8098925K->1088425K(10204032K), 0.2284920 secs] [Times: user=3.49 sys=0.04, real=0.23 secs]
45.779: [GC [PSYoungGen: 8106833K->59784K(8672768K)] 8107049K->1039790K(10769920K), 0.2195770 secs] [Times: user=2.02 sys=1.91, real=0.22 secs]
49.963: [GC [PSYoungGen: 6953352K->75043K(8689664K)] 7933358K->1062837K(10786816K), 0.0384440 secs] [Times: user=0.63 sys=0.01, real=0.04 secs]
54.171: [GC [PSYoungGen: 6968611K->140387K(8737984K)] 7956405K->1129497K(10835136K), 0.0715690 secs] [Times: user=1.12 sys=0.00, real=0.07 secs]
58.455: [GC [PSYoungGen: 7093923K->194024K(8701312K)] 8083033K->1205300K(10798464K), 0.0952730 secs] [Times: user=1.66 sys=0.02, real=0.10 secs]
62.825: [GC [PSYoungGen: 7147560K->122912K(8840256K)] 8158836K->1298466K(10937408K), 0.1671770 secs] [Times: user=2.89 sys=0.10, real=0.16 secs]
67.302: [GC [PSYoungGen: 7270304K->117888K(8792896K)] 8445858K->1377169K(10890048K), 0.1156200 secs] [Times: user=1.98 sys=0.05, real=0.12 secs]
71.785: [GC [PSYoungGen: 7265280K->119002K(8950720K)] 8524561K->1464556K(11047872K), 0.1152940 secs] [Times: user=1.97 sys=0.09, real=0.11 secs]
76.448: [GC [PSYoungGen: 7477018K->206455K(8893056K)] 8822572K->1642652K(10990208K), 0.1607870 secs] [Times: user=2.63 sys=0.06, real=0.16 secs]
81.051: [GC [PSYoungGen: 7564471K->114350K(9084608K)] 9000668K->1649307K(11181760K), 0.1145730 secs] [Times: user=1.89 sys=0.16, real=0.12 secs]
86.020: [GC [PSYoungGen: 7739630K->125895K(9026432K)] 9274587K->1743248K(11123584K), 0.1125030 secs] [Times: user=1.95 sys=0.06, real=0.11 secs]
91.007: [GC [PSYoungGen: 7751175K->202320K(9221952K)] 9368528K->1905769K(11319104K), 0.1523180 secs] [Times: user=2.58 sys=0.06, real=0.15 secs]
95.817: [GC [PSYoungGen: 8085136K->327488K(9146624K)] 9788585K->2203753K(11243776K), 0.2542190 secs] [Times: user=4.44 sys=0.10, real=0.25 secs]
96.071: [Full GC [PSYoungGen: 327488K->0K(9146624K)] [ParOldGen: 1876265K->1032314K(2097152K)] 2203753K->1032314K(11243776K) [PSPermGen: 27528K->21277K(48128K)], 1.4351920 secs] [Times: user=5.12 sys=0.36, real=1.44 secs]

As you can see, everything is fine and the Full GC completes OK. But the next GC (not a Full one) increases the memory usage of the process dramatically, and the server starts using swap.

102.741: [GC-- [PSYoungGen: 7882816K->7882816K(9146624K)] 8915130K->9979962K(11243776K), 133.4433280 secs] [Times: user=69.73 sys=602.83, real=133.46 secs]
236.191: [Full GC [PSYoungGen: 7882816K->0K(9146624K)] [ParOldGen: 2097146K->1069237K(2097152K)] 9979962K->1069237K(11243776K) [PSPermGen: 21277K->21251K(48192K)], 6.9285350 secs] [Times: user=12.75 sys=0.23, real=6.93 secs]

The question is: why? As far as I understand, a full GC is the most painful point of GC processing. So why does the application stall AFTER the full GC has completed successfully?

asked Jan 15 '13 by andll


2 Answers

The very high system time (much higher than the user time) suggests something is going on in the OS. You say you have plenty of memory, but if even a small portion of the JVM is swapped out to disk, it can kill GC times.
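One way to confirm this (a hedged suggestion, not part of the original answer) is to watch swap activity while a long pause happens; nonzero si/so values during the pause mean parts of the heap were paged out:

# si/so columns show swap-in/swap-out in KB/s, sampled every 5 seconds
vmstat 5

# how much of the JVM process currently sits in swap (kernels 2.6.34+);
# replace <pid> with the JVM's process id
grep VmSwap /proc/<pid>/status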

I suggest reducing the total size of the heap to ensure you have more free memory for the OS/disk cache/other programs.
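As a sketch of what that could look like (the sizes here are illustrative assumptions, not tuned values), shrinking the heap leaves roughly 6 GB of the 14 GB for the OS and page cache instead of about 2 GB:

java -server -Xmx8G -Xms8G -XX:NewSize=6G -XX:+UseParallelOldGC -jar myapp.jar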

To improve your application further, I would use a memory profiler (most likely you'll need a commercial one; an eval license will do). You appear to be generating 1.5 GB of garbage per second, which is an incredible amount. If you can cut that to a few hundred MB per second, it should improve your performance significantly (as well as reduce delays).
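That figure can be read straight off the logs above: between the collections at 49.963 s and 54.171 s, young-gen usage grows from 75043K to 6968611K, so roughly:

(6968611K - 75043K) / (54.171 s - 49.963 s) ≈ 6.6 GB / 4.2 s ≈ 1.6 GB/s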

BTW: -server should be the default on any 64-bit machine.

answered Oct 26 '22 by Peter Lawrey


I've worked for several years with an application that requires a heap size and memory churn similar to yours (20 GB heap, 1+ GB/second churn). As @Peter Lawrey said, if you can reduce either the total memory consumption or the churn, you'll come out ahead. (FYI, I've had good luck with the YourKit profiler. No association with the company, but my experience has been good. YMMV.)

But, if you actually can't reduce the heap usage or churn, I'd recommend you consider some more GC tuning. You've obviously done some; here are a few things that have worked for us:

  • Reduce the new-gen size. You're currently allocating 10 GB and collecting nearly all of it every few seconds; you might do better to allocate 1-2 GB and collect more often. This would let you reduce the total heap size as well, and probably avoid swapping.

  • -XX:+PrintGCDetails -XX:+PrintGCDateStamps - sometimes it's nice to have real dates in the GC logs.

  • -XX:+UseConcMarkSweepGC - concurrent old-gen GC consumes more CPU, but provides shorter pauses. For our application that's our preference, and it sounds like it's probably yours too.

You might also set -XX:ParallelGCThreads=<threads> to something reasonable for your hardware (we're using 6 on 12-core machines, but I don't know that we've tuned that optimally). A combined launch line with these settings is sketched below.
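Putting those suggestions together (the heap sizes, thread count, and jar name are placeholders to illustrate the shape of the command, not tuned values):

java -server -Xmx4G -Xms4G -Xmn1G \
     -XX:+UseConcMarkSweepGC \
     -XX:ParallelGCThreads=6 \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -jar myapp.jar

Here -Xmn is shorthand for setting both NewSize and MaxNewSize, and with CMS the ParallelGCThreads flag controls the stop-the-world parallel phases.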

answered Oct 26 '22 by AaronD