
JVM crashes under stress on RHEL 5.2

I've got JDK 1.6.0_18 (currently the latest) crashing unexpectedly while running a web application on Tomcat 6.0.24 (also currently the latest) after 4 hours to 8 days of stress testing (30 threads hitting the app at 6 million pageviews/day). This is on RHEL 5.2 (Tikanga).

The crash report is at http://pastebin.com/f639a6cf1 and the consistent parts of the crash are:

  • a SIGSEGV is being thrown
  • on libjvm.so
  • eden space is always full (100%)

JVM runs with the following options:

CATALINA_OPTS="-server -Xms512m -Xmx1024m -Djava.awt.headless=true"

I've also tested the memory for hardware problems using http://memtest.org/ for 48 hours (14 passes of the whole memory) without any error.

I've enabled -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to look for GC trends or space exhaustion, but there is nothing suspicious there. Minor and full GCs happen at predictable intervals and almost always free the same amount of memory.

My application does not, directly, use any native code.

Any ideas of where I should look next?

Edit - more info:

1) There is no client vm in this JDK:

[foo@localhost ~]$ java -version -server
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

[foo@localhost ~]$ java -version -client
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

2) Changing the O/S is not possible.

3) I don't want to change the JMeter stress test variables since this could hide the problem. Since I've got a use case (the current stress test scenario) which crashes the JVM I'd like to fix the crash and not change the test.

4) I've done static analysis on my application but nothing serious came up.

5) The memory does not grow over time. The memory usage equilibrates very quickly (after startup) at a very steady trend which does not seem suspicious.

6) /var/log/messages does not contain any useful information before or during the time of the crash

More info: Forgot to mention that there was an apache (2.2.14) fronting tomcat using mod_jk 1.2.28. Right now I'm running the test without apache just in case the JVM crash relates to the mod_jk native code which connects to JVM (tomcat connector).

After that (if the JVM crashes again) I'll try removing some components from my application (caching, lucene, quartz) and later on will try using jetty. Since the crash currently happens anywhere between 4 hours and 8 days into the test, it may take a lot of time to find out what's going on.

asked Feb 11 '10 20:02 by cherouvim



2 Answers

Do you have compiler output? i.e. PrintCompilation (and if you're feeling particularly brave, LogCompilation).

I have debugged a case like this in the past by watching what the compiler was doing and eventually realising (it took a long time to reach the light bulb moment) that my crash was caused by compilation of a particular method in the oracle jdbc driver.

Basically what I'd do is:

  • switch on PrintCompilation
  • since that doesn't give timestamps, write a script that watches that logfile (like a sleep every second and print new rows) and reports when methods were compiled (or not)
  • repeat the test
  • check the compiler output to see if the crash corresponds with compilation of some method
  • repeat a few more times to see if there is a pattern
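
The watching step above could be sketched roughly as follows. This is not the script from the pastebin link, just a minimal Python take on the same idea: poll a file once per second and prefix each newly appended line with the wall-clock time. The function name `stamp_new_lines` and its parameters are made up for illustration, and it assumes the PrintCompilation output ends up in a plain text file (under Tomcat that is usually wherever the JVM's stdout is redirected, e.g. catalina.out).

```python
import time
from datetime import datetime


def stamp_new_lines(log_path, out, poll_seconds=1.0, max_polls=None):
    """Copy lines appended to log_path into out, prefixed with wall-clock time.

    The timestamp is the time the line was *noticed*, so it is only accurate
    to within poll_seconds. max_polls limits the number of polling rounds
    (None means keep going until the process is killed).
    Returns the number of lines written.
    """
    written = 0
    polls = 0
    pending = ""  # holds a trailing partial line until its newline arrives
    with open(log_path, "r", errors="replace") as log:
        while max_polls is None or polls < max_polls:
            chunk = log.read()  # returns "" when nothing new was appended
            if chunk:
                pending += chunk
                # Everything before the last "\n" is complete; keep the rest.
                *complete, pending = pending.split("\n")
                now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                for line in complete:
                    out.write(f"{now} {line}\n")
                    written += 1
                out.flush()
            else:
                time.sleep(poll_seconds)
            polls += 1
    return written
```

Something like `stamp_new_lines("catalina.out", sys.stdout)` (after `import sys`) would then run until interrupted; the last stamped lines before the crash are the ones worth comparing between runs.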

If there is a discernible pattern then use .hotspot_compiler (or .hotspotrc) to stop the JVM compiling the offending method(s), repeat the test and see whether it still blows up. Obviously in your case this process could theoretically take months I'm afraid.
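
To save some typing once a suspect shows up, a small helper can turn a PrintCompilation line into an exclude entry. Treat this as a sketch under two assumptions that should be checked against the references below for your JVM version: that compiled methods appear in the log as `package.Class::method`, and that a .hotspot_compiler entry has the form `exclude package/Class method`. The function name `exclude_directive` is hypothetical.

```python
import re

# Assumed PrintCompilation shape: "<id> <flags> pkg.Class::method (N bytes)".
_METHOD = re.compile(r"([\w.$]+)::([\w$<>]+)")


def exclude_directive(compilation_line):
    """Build a .hotspot_compiler 'exclude' entry from one PrintCompilation line.

    Returns None when the line does not contain a Class::method token.
    The output format ("exclude pkg/Class method") is an assumption; verify
    it against the .hotspot_compiler documentation for your JVM.
    """
    match = _METHOD.search(compilation_line)
    if match is None:
        return None
    class_name, method = match.groups()
    return f"exclude {class_name.replace('.', '/')} {method}"
```

For example, the line `  3       java.lang.String::indexOf (151 bytes)` would yield `exclude java/lang/String indexOf`, which can be appended to the .hotspot_compiler file before the next test run.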

some references

  • for dealing with logcompilation output --> http://wikis.sun.com/display/HotSpotInternals/LogCompilation+tool
  • for info on .hotspot_compiler --> http://futuretask.blogspot.com/2005/01/java-tip-7-use-hotspotcompiler-file-to.html or http://blogs.oracle.com/javawithjiva/entry/hotspotrc_and_hotspot_compiler
  • a really simple, quick & dirty script for watching the compiler output --> http://pastebin.com/Haqjdue9
  • note that this was written for solaris which always has bizarre options to utils compared to the gnu equivalents so no doubt easier ways to do this on other platforms or using different languages

The other thing I'd do is systematically change the gc algorithm you're using and check the crash times against gc activity (e.g. does it correlate with a young or old gc, what about TLABs?). Your dump indicates you're using parallel scavenge so try

  • the serial (young) collector (IIRC it can be combined with a parallel old)
  • ParNew + CMS
  • G1

If it doesn't recur with the different GC algos then you know it's down to that (and you have no fix but to change GC algo and/or walk back through older JVMs until you find a version of that algo that doesn't blow).
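
Checking crash times against GC activity can be partly automated. The sketch below assumes the GC log was written with -XX:+PrintGCTimeStamps, so each event line starts with the JVM uptime in seconds followed by `[GC` or `[Full GC`, and that you read the crash uptime yourself from the crash report (the hs_err file normally has an "elapsed time" line near the end). The function name `gc_events_near` and the default 5 second window are arbitrary choices for illustration.

```python
import re

# Matches lines such as "98.000: [Full GC [PSYoungGen: ..." (uptime in seconds).
_GC_EVENT = re.compile(r"^(\d+\.\d+): \[(Full GC|GC)")


def gc_events_near(gc_log_lines, crash_uptime_seconds, window_seconds=5.0):
    """Return (uptime, kind) for GC events in the window just before the crash.

    gc_log_lines: iterable of lines from a -verbose:gc log with timestamps.
    crash_uptime_seconds: JVM uptime at the crash, taken from the crash report.
    Lines that do not look like GC events are ignored.
    """
    hits = []
    for line in gc_log_lines:
        match = _GC_EVENT.match(line)
        if match is None:
            continue
        uptime = float(match.group(1))
        # Keep only events that happened at or shortly before the crash.
        if 0 <= crash_uptime_seconds - uptime <= window_seconds:
            hits.append((uptime, match.group(2)))
    return hits
```

If every crash has a young (or full) collection in the last second or so, that points at the collector; if the window is usually empty, the compiler theory above looks more likely.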

answered Sep 22 '22 14:09 by Matt


A few ideas:

  • Use a different JDK, Tomcat and/or OS version
  • Slightly modify test parameters, e.g. 25 threads at 7.2 M pageviews/day
  • Monitor or profile memory usage
  • Debug or tune the Garbage Collector
  • Run static and dynamic analysis

answered Sep 23 '22 14:09 by kiwicptn