JVM Tenured/Old gen reached limit & server hanging

Our application needs a lot of memory because it deals with very large data sets, so we increased our max heap size to 12GB (-Xmx).

The environment details are as follows:

OS - Linux 2.6.18-164.11.1.el5    
JBoss - 5.0.0.GA
VM Version - 16.0-b13 Sun JVM
JDK - 1.6.0_18

We have the above environment and configuration in both QA and prod. In QA the max PS Old Gen (heap memory) is 8.67GB, whereas in prod it is only 8GB.

In prod, during one particular job, the Old Gen reaches 8GB and stays there; the web URL becomes inaccessible and the server goes down. In QA the Old Gen also fills up (to 8.67GB), but a full GC runs and usage drops back to around 6.5GB, so the server does not hang.

We couldn't figure out why, because the environment and configuration are the same on both boxes.

I have three questions:

  1. As I understand it, two-thirds of the max heap is allocated to the old/tenured generation. If so, why is it 8GB on one box and 8.67GB on the other?
  2. How do I set a sensible ratio between the new and tenured generations for a heap of this size (12GB)?
  3. Why does a full GC run on one box and not on the other?

Any help would be much appreciated. Thanks.

Please let me know if you need further details on the environment or configuration.

raksja asked May 09 '11 15:05



2 Answers

For your specific questions:

  1. The default ratio between the new and old generations can depend on the system and on what the JVM determines will work best.
  2. You can set the ratio between the old and new generations explicitly with -XX:NewRatio; for example, -XX:NewRatio=3 makes the old generation three times the size of the new one.
  3. If your JVM is hanging and the heap is full, it's probably stuck doing back-to-back full GCs.
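
To make the NewRatio arithmetic from point 2 concrete, here is a minimal sketch that estimates how a 12GB heap would be split for a few ratio values. The class and method names are made up for illustration, and the figures are only estimates: the JVM may round the sizes, and the collector may adjust the generations at run time, so the numbers a monitoring tool reports can differ.

```java
final class GenerationSizes {

    // -XX:NewRatio=N means old:young = N:1, so old gen gets N/(N+1) of the heap.
    static double oldGenGb(double heapGb, int newRatio) {
        return heapGb * newRatio / (newRatio + 1);
    }

    // The young generation (eden plus both survivor spaces) gets the remaining 1/(N+1).
    static double youngGenGb(double heapGb, int newRatio) {
        return heapGb / (newRatio + 1);
    }

    public static void main(String[] args) {
        double heapGb = 12.0; // matches -Xmx12g from the question
        for (int ratio = 2; ratio <= 4; ratio++) {
            System.out.printf("NewRatio=%d: young=%.2f GB, old=%.2f GB%n",
                    ratio, youngGenGb(heapGb, ratio), oldGenGb(heapGb, ratio));
        }
    }
}
```

With NewRatio=2 the old generation comes out at 8GB, which is the two-thirds figure mentioned in the question; NewRatio=3 gives 9GB and NewRatio=4 gives 9.6GB.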

It sounds like you need more memory for prod. If the request finishes on QA, then perhaps that extra 0.67GB is all it needs. That doesn't leave you much headroom, though. Are you running the same workload on QA as on prod?

Since you're using 12GB you must be on a 64-bit JVM. You can recover much of the memory overhead of 64-bit references with the -XX:+UseCompressedOops option. It typically saves around 40% of memory, though the exact figure depends on how reference-heavy your data is, so your 12GB should go a lot further.
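
If you want to confirm whether compressed oops are actually in effect on a given box, a rough sketch like the following can ask the running JVM. It assumes a HotSpot JVM, since it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; the class name CompressedOopsCheck is just a placeholder, and on a JVM that doesn't know the option the method falls back to "unknown".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class CompressedOopsCheck {

    // Returns "true" or "false" as reported by the JVM, or "unknown" if the
    // option cannot be read (for example on a 32-bit or non-HotSpot JVM).
    static String compressedOopsSetting() {
        try {
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (Exception e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + compressedOopsSetting());
    }
}
```

Running this once on QA and once on prod is also a cheap way to check that the two JVMs really are configured the same.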

Depending on what you're doing, the concurrent collector might be better as well, particularly for reducing long GC pause times. I'd recommend trying these options, as I've found them to work well:

-Xmx12g -XX:NewRatio=4 -XX:SurvivorRatio=8 -XX:+UseCompressedOops
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSClassUnloadingEnabled
-XX:+CMSScavengeBeforeRemark -XX:CMSInitiatingOccupancyFraction=68
WhiteFang34 answered Oct 21 '22 18:10


You need to gather more data to find out what is going on; only then will you know what needs to be fixed. To my mind that means:

  1. Get detailed information about what the garbage collector is doing. These parameters are a good start (substitute your preferred path and file name for gc.log):

    -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log -verbose:gc

  2. Repeat the run, scan through the GC log for the period when the server hangs, and post back with that output.

  3. Consider watching the heap with visualgc, which is part of jvmstat (it requires jstatd running on the server). This is a very easy way to see how the various generations in the heap are sized (though perhaps not for 6 hours!).
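
As a lightweight complement to the tools above, the JVM can also report its own generation sizes through the standard java.lang.management API. The following is only a sketch (the class name HeapPoolReport is made up for illustration): it lists each heap memory pool with its current usage and configured maximum, which would let you compare the old-gen maximum on QA and prod directly.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class HeapPoolReport {

    // One line per heap pool (eden, survivor, old gen), with sizes in MB.
    static List<String> heapPoolLines() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            long max = usage.getMax(); // -1 means the maximum is undefined
            String maxText = max < 0 ? "undefined" : (max >> 20) + " MB";
            lines.add(String.format("%-25s used=%d MB max=%s",
                    pool.getName(), usage.getUsed() >> 20, maxText));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : heapPoolLines()) {
            System.out.println(line);
        }
    }
}
```

The pool names depend on the collector in use; with the parallel collector from the question, the old generation shows up as "PS Old Gen".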

I also strongly recommend doing some reading so you know what all these switches refer to; otherwise you'll be blindly trying things with no real understanding of why one thing helps and another doesn't. I'd start with the Oracle Java 6 GC tuning page.

I'd only suggest changing options once you have baselined performance. Having said that, CompressedOops is very likely to be an easy win; note that it has been on by default since 6u23.

Finally, you should consider upgrading the JVM; 6u18 is getting on a bit and performance keeps improving.

"each job will take 3 hours to complete and almost 6 jobs running one after another. Last job when running reaches 8GB max and getting hang in prod"

Are these jobs related at all? If they're not working on the same data set, this really sounds like a gradual memory leak: if heap usage keeps climbing and eventually blows up, you have a leak. Consider using -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/some/dir to capture a heap dump if/when it blows (note that with a heap this large the dump will be a big file, so make sure you have the disk space). You can then use jhat to look at what was on the heap at the time.
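
One rough way to test the leak theory before it blows is to record old-gen usage right after a full GC at the end of each job (from the GC log, or from MemoryPoolMXBean.getCollectionUsage()) and see whether that floor keeps rising. The sketch below shows only the comparison step; the class name, the method name and the sample figures in main are hypothetical, not measurements from the system in the question.

```java
final class LeakTrend {

    // True if post-GC old-gen usage rose between every pair of consecutive
    // samples, which is the pattern a gradual leak tends to produce.
    static boolean growsAfterEveryJob(long[] usedAfterGcMb) {
        if (usedAfterGcMb.length < 2) {
            return false; // not enough samples to call it a trend
        }
        for (int i = 1; i < usedAfterGcMb.length; i++) {
            if (usedAfterGcMb[i] <= usedAfterGcMb[i - 1]) {
                return false; // usage fell back or stayed flat at least once
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical post-GC old-gen usage (MB) after each of six jobs.
        long[] samples = {1200, 2500, 3900, 5100, 6600, 7900};
        System.out.println("Steady growth: " + growsAfterEveryJob(samples));
    }
}
```

A floor that climbs after every job suggests something is holding on to data from earlier jobs; a floor that stays roughly flat points more towards the last job simply needing more heap.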

Matt answered Oct 21 '22 18:10