Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Garbage collector not able to clear garbage, causes full frequent gc

We use parallel gc with 1.7.0_71.

java -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -version
-XX:InitialHeapSize=258222272 -XX:MaxHeapSize=4131556352 -XX:+PrintCommandLineFlags -XX:+PrintGCDetails -XX:+UseCompressedOops -XX:+UseParallelGC 
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

We are seeing huge garbage created in our application every few minutes under decent load . Minor gc triggers every few seconds and major gc triggers every 2 minutes. On analysis we found that 1.8 million char array objects each of size 173K is left in heap. Minor gc is not able to recover them . On taking heapdump ,we found a lot of objects in remainder section in Eclipe MAT. MAT histogram shows lots of char arrays (they are rendered html's),however all incoming references and merge paths to gc root for the char array is unreachable, but only full GC is only able to recover them not minor GC. Why minor GC is not able to recover them when there are no referenced objects ? Eclipse MAT pictures attached. Please see all references are unreachable.

enter image description here There were some reference to request objects, cleared them by setting nulls.

Before the fix there were hanging references to HttpServletRequest objects

enter image description here All images related to the issue: https://www.dropbox.com/sh/qgsitzb7x27j8kc/AABoQwR1qPwTPiDtO6B0_Pm7a?dl=0

Overall heap enter image description here

Heap Histogram view enter image description here Now the question is

  1. Do you think this is issue with garbage collector? We upgraded JBoss 7.1.1 to wildfly 8.2.0.Final. No changes in gc strategy and JDK version. Why gc is not able to reclaim memory where it is pointing to unreachable references.

  2. We can decrease XX:NewRatio to 1. However we are no sure this will work as full gc happens frequently

  3. DO you think moving to G1 GC will help ? Will there be decrease in throughput ? What are best options available for G1 GC. Our heap size -Xms : 1024m , -Xmx 2048m. Perm gen 512m We are not seeing memory leaks, no out of memory error.Attaching full gc log outputs

    2015-05-10 19:32:41 IST| 459.939: [Full GC [PSYoungGen: 8123K->0K(680960K)] [ParOldGen: 782136K->359065K(766464K)] 790260K->359065K(1447424K) [PSPermGen: 202932K->202930K(441344K)], 1.0738240 secs] [Times: user=3.37 sys=0.01, real=1.07 secs]
    2015-05-10 19:32:42 IST| 462.306: [GC [PSYoungGen: 672768K->10534K(685056K)] 1031833K->369600K(1451520K), 0.0450800 secs] [Times: user=0.15 sys=0.00, real=0.04 secs]
    2015-05-10 19:32:44 IST| 463.641: [GC [PSYoungGen: 682790K->9093K(685568K)] 1041856K->373085K(1452032K), 0.0570820 secs] [Times: user=0.16 sys=0.00, real=0.06 secs]
    2015-05-10 19:32:45 IST| 464.936: [GC [PSYoungGen: 681349K->9812K(686080K)] 1045341K->377511K(1452544K), 0.0439060 secs] [Times: user=0.12 sys=0.00, real=0.04 secs]
    2015-05-10 19:32:46 IST| 466.283: [GC [PSYoungGen: 683092K->10733K(686080K)] 1050791K->383554K(1452544K), 0.0464700 secs] [Times: user=0.14 sys=0.00, real=0.05 secs]
    2015-05-10 19:32:48 IST| 467.659: [GC [PSYoungGen: 684013K->11283K(685568K)] 1056834K->388651K(1452032K), 0.1381130 secs] [Times: user=0.30 sys=0.00, real=0.14 secs]
    2015-05-10 19:32:50 IST| 469.734: [GC [PSYoungGen: 684051K->9652K(686080K)] 1061419K->393759K(1452544K), 0.0466800 secs] [Times: user=0.13 sys=0.00, real=0.05 secs]
    2015-05-10 19:32:51 IST| 471.087: [GC [PSYoungGen: 682420K->11253K(685568K)] 1066527K->400087K(1452032K), 0.0589180 secs] [Times: user=0.11 sys=0.00, real=0.06 secs]
    2015-05-10 19:32:52 IST| 472.325: [GC [PSYoungGen: 684021K->7957K(686080K)] 1072855K->403018K(1452544K), 0.0436140 secs] [Times: user=0.13 sys=0.00, real=0.04 secs]
    2015-05-10 19:32:54 IST| 473.606: [GC [PSYoungGen: 680725K->9177K(685056K)] 1075786K->406493K(1451520K), 0.0524990 secs] [Times: user=0.13 sys=0.00, real=0.05 secs]
    
    2015-05-10 19:34:34 IST| 573.526: [GC [PSYoungGen: 684217K->10956K(686080K)] 1440629K->771626K(1452544K), 0.0416620 secs] [Times:  user=0.14 sys=0.00, real=0.04 secs]
     2015-05-10 19:34:34 IST| 573.568: [Full GC [PSYoungGen: 10956K->0K(686080K)] [ParOldGen: 760670K->364958K(818688K)] 771626K->364958K(1504768K) [PSPermGen: 203069K->203069K(420864K)], 0.8001740 secs] >[Times: user=2.46 sys=0.01, real=0.80 secs]
    2015-05-10 19:34:36 IST| 575.600: [GC [PSYoungGen: 674304K->10465K(686592K)] 1039262K->375423K(1505280K), 0.0410330 secs] [Times: user=0.13 sys=0.00, real=0.04 secs]
    2015-05-10 19:36:35 IST| 694.277: [GC [PSYoungGen: 684413K->9342K(687104K)] 1490469K->820033K(1505792K), 0.2160320 secs] [Times: user=0.55 sys=0.08, real=0.21 secs]
    2015-05-10 19:36:36 IST| 695.664: [GC [PSYoungGen: 684670K->8323K(687104K)] 1495361K->823380K(1505792K), 0.0454050 secs] [Times: user=0.11 sys=0.00, real=0.05 secs]
    2015-05-10 19:36:37 IST| 695.710: [Full GC [PSYoungGen: 8323K->0K(687104K)] [ParOldGen: 815056K->363295K(838144K)] 823380K->363295K(1525248K) [PSPermGen: 203095K->203095K(401920K)], 0.8133080 secs] [Times: user=2.43 sys=0.01, real=0.81 secs]
    2015-05-10 19:36:38 IST| 697.669: [GC [PSYoungGen: 675328K->10586K(686592K)] 1038623K->373882K(1524736K), 0.0436000 secs] [Times: user=0.13 sys=0.00, real=0.04 secs]
    ...
    ....
    
like image 396
Srivathsan Avatar asked Sep 27 '22 23:09

Srivathsan


2 Answers

Why minor GC is not able to recover them when there are no referenced objects ?

Most likely because they are in the old generation. Minor GCs only dispose of unreachable objects in the young generation.

This usually happens when the object lives longer than it can remain in the young generation due to tenuring. E.g. if caches are involved that hold onto results for a while or when the request lifetime exceeds that of the GC interval * tenuring threshold.

-XX:+PrintTenuringDistribution may prove informative.

First you could try simply providing a pause time goal via -XX:MaxGCPauseMillis=.... ParallelGC might be able to meet it.

If that doesn't help you could refactor your code to reduce object lifetimes or reduce allocation rates to make minor GCs less frequent.

Note that foremost ParallelGC is a throughput collector, in terms of CPU cycles it's more efficient than concurrent collectors but it usually won't be able to meet as low pause time goals as those.

DO you think moving to G1 GC will help ?

If you are concerned about pause times, most likely. You might also want to try CMS.

If you want to try out G1 you should probably switch to java 8, its heuristics have improved a lot over time and it is still maturing.

Will there be decrease in throughput ?

possibly. it depends on whether there's spare CPU capacity and how you define/measure throughput. And even under less favorable circumstances the decrease might not be significant.

What are best options available for G1 GC.

G1 is supposed to self-tuning (beyond the user-provided pause and throughput goals that also apply to ParallelGC). So just enable it and see if it provides acceptable performance.

like image 127
the8472 Avatar answered Oct 19 '22 18:10

the8472


Similar issue happened with us in >NET. A reference leaked into Gen1 - and that aged into Gen 2, finally causing the accumulation of RAM into 30Gb+ size.

Although not related to JAVA directly, we solved the similar problem by building a custom memory manager which sub-allots byte[]. This allowed us to store hundreds of millions of objects in-process for days without loading GC at all. The solution is great for caching and far faster than out-of-process Redis/memcache. It serializes objects with super-fast serializer, that way "references" are turned into "Pointer" (our struct of two ints), this way the GC is not overloaded with scanning 1000000000 objects

See this: https://www.youtube.com/watch?v=Dz_7hukyejQ

like image 39
itadapter DKh Avatar answered Oct 19 '22 19:10

itadapter DKh