I've got a Java webapp running on one tomcat instance. During peak times the webapp serves around 30 pages per second and normally around 15.
My environment is:
O/S: SUSE Linux Enterprise Server 10 (x86_64)
RAM: 16GB
server: Tomcat 6.0.20
JVM: Java HotSpot(TM) 64-Bit Server VM 1.6.0_14
JVM options:
CATALINA_OPTS="-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m
-XX:+UseParallelGC
-Djava.awt.headless=true
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
JAVA_OPTS="-server"
After a couple of days of uptime the Full GC starts occurring more frequently and it becomes a serious problem to the application's availability. After a tomcat restart the problem goes away but, of course, returns after 5 to 10 or 30 days (not consistent).
The Full GC log before and after a restart is at http://pastebin.com/raw.php?i=4NtkNXmi
It shows a log before the restart at 6.6 days uptime where the app was suffering because Full GC needed 2.5 seconds and was happening every ~6 secs.
Then it shows a log just after the restart where Full GC only happened every 5-10 minutes.
I've got two dumps using jmap -dump:format=b,file=dump.hprof PID
when the Full GCs where occurring (I'm not sure whether I got them exactly right when a Full GC was occurring or between 2 Full GCs) and opened them in http://www.eclipse.org/mat/ but didn't get anything useful in Leak Suspects:
Note that I never get an OutOfMemoryError.
Any ideas on where should I look next?
The reason that a Full GC occurs is because the application allocates too many objects that can't be reclaimed quickly enough. Often concurrent marking has not been able to complete in time to start a space-reclamation phase.
If your application's object creation rate is very high, then to keep up with it, the garbage collection rate will also be very high. A high garbage collection rate will increase the GC pause time as well. Thus, optimizing the application to create fewer objects is THE EFFECTIVE strategy to reduce long GC pauses.
The root cause of high-frequency garbage collection is object churn—many objects being created and disposed of in short order. Nearly all garbage collection strategies are well suited for such a scenario; they do their job well and are fast. However, this still consumes resources.
That is, every time you allocate a small block of memory (big blocks are typically placed directly into "older" generations), the system checks whether there's enough free space in the gen-0 heap, and if there isn't, it runs the GC to free up space for the allocation to succeed.
When we had this issue we eventually tracked it down to the young generation being too small. Although we had given plenty of ram the young generation wasn't given it's fair share.
This meant that small garbage collections would happen more frequently and caused some young objects to be moved into the tenured generation meaning more large garbage collections also.
Try using the -XX:NewRatio
with a fairly low value (say 2 or 3) and see if this helps.
More info can be found here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With