Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java HotSpot extremely long duration young collections

This is a long running server app which creates a lot of short lived garbage and almost nothing tenured after startup. Most of the time the young gen collection goes quickly, even with 10 gb it's fast since it's almost all garbage, but we are seeing occasional brutal outliers. As a potential hint, we are running a similarly configured server at lower load on a machine with more CPU resources but 12 gb less memory. We are not seeing this pattern there.

java -Xms20g -Xmx20g -Xloggc:"./logs/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:NewRatio=1 -XX:+DisableExplicitGC -XX:+UseSpinning -XX:PreBlockSpin=1000 -XX:-UseCounterDecay -XX:+UseTLAB -XX:+TieredCompilation -XX:ReservedCodeCacheSize=512m

Here is a snippet from the GC logs. Is it safe to say that it's spending the full time actually doing GC work (instead of waiting for threads to reach safepoints, for example) since sys time is high as well?

2013-12-13T14:04:50.388+0000: 56748.603: [GC [PSYoungGen: 10363808K->1248K(10421824K)] 10569323K->207003K(20907584K), 0.0035110 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]

2013-12-13T14:17:32.245+0000: 57510.460: [GC [PSYoungGen: 10363808K->1600K(10435072K)] 10569563K->207563K(20920832K), 0.0038460 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2013-12-13T14:29:08.572+0000: 58206.787: [GC [PSYoungGen: 10381824K->1408K(10430912K)] 10587787K->207875K(20916672K), 0.0045710 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2013-12-13T14:36:25.176+0000: 58643.391: [GC [PSYoungGen: 10381632K->1600K(10442432K)] 10588099K->208371K(20928192K), 0.0040800 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]

2013-12-13T14:44:04.409+0000: 59102.624: [GC [PSYoungGen: 10397120K->1600K(10438848K)] 10603891K->208715K(20924608K), 14.7387710 secs] [Times: user=0.00 sys=143.05, real=14.74 secs]

2013-12-13T14:52:50.287+0000: 59628.502: [GC [PSYoungGen: 10397120K->11488K(10446720K)] 10604235K->218819K(20932480K), 0.0127280 secs] [Times: user=0.11 sys=0.00, real=0.01 secs]

2013-12-13T14:59:34.934+0000: 60033.149: [GC [PSYoungGen: 10418208K->1536K(10445760K)] 10625539K->218931K(20931520K), 211.1968100 secs] [Times: user=0.00 sys=2067.08, real=211.16 secs]

2013-12-13T15:12:04.478+0000: 60782.693: [GC [PSYoungGen: 10408256K->15136K(10448064K)] 10625651K->232723K(20933824K), 0.0147670 secs] [Times: user=0.00 sys=0.12, real=0.01 secs]

2013-12-13T15:23:02.123+0000: 61440.338: [GC [PSYoungGen: 10426912K->12672K(10448768K)] 10644499K->232635K(20934528K), 0.0141040 secs] [Times: user=0.00 sys=0.12, real=0.02 secs]

2013-12-13T15:31:09.119+0000: 61927.334: [GC [PSYoungGen: 10424448K->5280K(10452160K)] 10644411K->232907K(20937920K), 0.8134440 secs] [Times: user=1.22 sys=6.30, real=0.82 secs]

2013-12-13T15:39:25.083+0000: 62423.298: [GC [PSYoungGen: 10421024K->6208K(10449344K)] 10648651K->235547K(20935104K), 0.0087890 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]

2013-12-13T15:47:52.781+0000: 62930.996: [GC [PSYoungGen: 10421952K->7200K(10456960K)] 10651291K->239123K(20942720K), 0.0094460 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

2013-12-13T15:58:18.594+0000: 63556.809: [GC [PSYoungGen: 10433248K->6752K(10454848K)] 10665171K->242331K(20940608K), 0.0091570 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]

Specs on the server above are dual 4-core X5570 with 48 gb of ram. HTOP is showing 15 gb used, the rest cached with the app using a little over 10 gb at any time (tenured is small). This makes me think it's not a paging issue

The other server I refer to is a dual 8-core E5-2690 with 32 gb of ram. Both are running jre 1.7.0_25 on fedora, although the server above is on a very old version

like image 554
gbasin Avatar asked Dec 13 '13 18:12

gbasin


1 Answers

Such high sys time is unusual. Could it be an interference with some other apps/activites on your system?

I think you should collect also other system stats - vmstat, netstat, iostat, mpstat, to rule out any interference or your GC time with activities on your system. Correlate those with the YounGen phases to see if there are any spikes during those long YounGen pauses.

like image 111
Aleš Avatar answered Oct 26 '22 03:10

Aleš