JVM crashes under stress on RHEL 5.2

Tags: java, jvm, crash, segmentation-fault, rhel

I've got JDK 1.6.0_18 (currently the latest) crashing unexpectedly while running a web application on Tomcat 6.0.24 (also currently the latest) after 4 hours to 8 days of stress testing (30 threads hitting the app at 6 million pageviews/day). This is on RHEL 5.2 (Tikanga).

The crash report is at http://pastebin.com/f639a6cf1 and the consistent parts of the crash are:

a SIGSEGV is being thrown
on libjvm.so
eden space is always full (100%)

JVM runs with the following options:

CATALINA_OPTS="-server -Xms512m -Xmx1024m -Djava.awt.headless=true"

I've also tested the memory for hardware problems using http://memtest.org/ for 48 hours (14 passes of the whole memory) without any error.

I've enabled -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to look for GC trends or space exhaustion, but there is nothing suspicious there. GC and full GC happen at predictable intervals, almost always freeing the same amount of memory.
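For reference, a minimal sketch of how these flags could be combined with the existing options and sent to a dedicated file, so the last GC entries can be lined up against the time of the crash. The log path is a hypothetical placeholder, and -XX:+PrintGCDateStamps and -Xloggc are additions that were not part of the original setup.

```shell
# Base options as used in the question.
CATALINA_OPTS="-server -Xms512m -Xmx1024m -Djava.awt.headless=true"

# Hypothetical location for the GC log; pick any directory Tomcat can write to.
GC_LOG="/var/log/tomcat/gc.log"

# GC tracing flags from the question, plus wall-clock date stamps and a
# dedicated log file (both are assumptions, not part of the original setup).
CATALINA_OPTS="$CATALINA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
CATALINA_OPTS="$CATALINA_OPTS -XX:+PrintGCDateStamps -Xloggc:$GC_LOG"
export CATALINA_OPTS
```

With a separate file, the GC records do not get interleaved with application output in catalina.out.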

My application does not, directly, use any native code.

Any ideas of where I should look next?

Edit - more info:

1) There is no client vm in this JDK:

[foo@localhost ~]$ java -version -server
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

[foo@localhost ~]$ java -version -client
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

2) Changing the O/S is not possible.

3) I don't want to change the JMeter stress test variables since this could hide the problem. Since I've got a use case (the current stress test scenario) which crashes the JVM I'd like to fix the crash and not change the test.

4) I've done static analysis on my application but nothing serious came up.

5) The memory does not grow over time. The memory usage equilibrates very quickly (after startup) at a very steady trend which does not seem suspicious.

6) /var/log/messages does not contain any useful information before or during the time of the crash

More info: I forgot to mention that Apache (2.2.14) was fronting Tomcat via mod_jk 1.2.28. Right now I'm running the test without Apache, in case the JVM crash is related to the mod_jk native code that connects to the JVM (Tomcat connector).

After that (if the JVM crashes again) I'll try removing some components from my application (caching, Lucene, Quartz) and later on I'll try Jetty. Since the crash currently happens anywhere between 4 hours and 8 days into the test, it may take a long time to find out what's going on.

asked Feb 11 '10 20:02

cherouvim

2 Answers

Do you have compiler output? i.e. PrintCompilation (and if you're feeling particularly brave, LogCompilation).

I have debugged a case like this in the past by watching what the compiler was doing and eventually (it took a long time to reach the light bulb moment) realising that my crash was caused by compilation of a particular method in the Oracle JDBC driver.

Basically what I'd do is:

switch on PrintCompilation
since that doesn't give timestamps, write a script that watches that logfile (like a sleep every second and print new rows) and reports when methods were compiled (or not)
repeat the test
check the compiler output to see if the crash corresponds with compilation of some method
repeat a few more times to see if there is a pattern
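The log-watching step could be sketched roughly as below. This is not the script linked in the references, just a minimal illustration that assumes GNU tail and assumes the PrintCompilation lines end up in Tomcat's catalina.out (the path shown is hypothetical). The timestamp is the moment the line was read, which only approximates the compilation time.

```shell
#!/bin/sh
# Prefix every line read from stdin with the current wall-clock time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Usage (hypothetical paths; start the JVM with -XX:+PrintCompilation first):
#   tail -F "$CATALINA_BASE/logs/catalina.out" | stamp_lines > compile-times.log
```

After a crash, the last entries in compile-times.log can be compared with the time recorded in the crash report.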

If there is a discernible pattern then use .hotspot_compiler (or .hotspotrc) to make it stop compiling the offending method(s), repeat the test and see whether it still blows up. Obviously, in your case this process could theoretically take months, I'm afraid.
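As a rough illustration of the exclusion step, the sketch below builds both an entry for a .hotspot_compiler file and the equivalent -XX:CompileCommand option. The class and method names are hypothetical placeholders, and the exact file handling may differ between JVM versions, so treat this as a starting point rather than a verified recipe.

```shell
# Hypothetical class and method; substitute whatever the compilation log implicates.
SUSPECT_CLASS="com/example/SuspectClass"
SUSPECT_METHOD="suspectMethod"

# Print one exclusion entry in the "exclude <class> <method>" form
# used by the .hotspot_compiler file (class name with slashes).
exclude_line() {
    printf 'exclude %s %s\n' "$1" "$2"
}

# Equivalent command-line option; append it to CATALINA_OPTS to try it out.
EXCLUDE_FLAG="-XX:CompileCommand=exclude,${SUSPECT_CLASS},${SUSPECT_METHOD}"

# Usage: write the entry to .hotspot_compiler in the JVM's working directory:
#   exclude_line "$SUSPECT_CLASS" "$SUSPECT_METHOD" >> .hotspot_compiler
```

An excluded method keeps running in the interpreter, so expect it to be slower; the point is only to see whether the crash goes away.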

some references

for dealing with logcompilation output --> http://wikis.sun.com/display/HotSpotInternals/LogCompilation+tool
for info on .hotspot_compiler --> http://futuretask.blogspot.com/2005/01/java-tip-7-use-hotspotcompiler-file-to.html or http://blogs.oracle.com/javawithjiva/entry/hotspotrc_and_hotspot_compiler
a really simple, quick & dirty script for watching the compiler output --> http://pastebin.com/Haqjdue9
note that this was written for Solaris, whose utilities often take different options from their GNU equivalents, so there are no doubt easier ways to do this on other platforms or in other languages

The other thing I'd do is systematically change the gc algorithm you're using and check the crash times against gc activity (e.g. does it correlate with a young or old gc, what about TLABs?). Your dump indicates you're using parallel scavenge so try

the serial (young) collector (IIRC it can be combined with a parallel old)
ParNew + CMS
G1

If it doesn't recur with a different GC algo then you know it's down to that (and you have no fix but to change the GC algo and/or walk back through older JVMs until you find a version of that algo that doesn't blow up).
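Switching collectors between test runs could look something like the sketch below. The helper name and the GC_CHOICE variable are hypothetical. The flags are standard HotSpot options, but note that G1 was still experimental in JDK 6 and, as far as I know, needs the unlock flag shown.

```shell
# Map a short collector name to HotSpot flags (helper name is hypothetical).
gc_flags() {
    case "$1" in
        serial) echo "-XX:+UseSerialGC" ;;
        cms)    echo "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC" ;;
        g1)     echo "-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC" ;;
        *)      echo "unknown collector: $1" >&2; return 1 ;;
    esac
}

# GC_CHOICE is a hypothetical variable; set it per test run, e.g. GC_CHOICE=cms.
GC_CHOICE="${GC_CHOICE:-serial}"
CATALINA_OPTS="-server -Xms512m -Xmx1024m -Djava.awt.headless=true $(gc_flags "$GC_CHOICE")"
export CATALINA_OPTS
```

Changing only the collector between runs keeps the comparison clean: if the crash disappears under one of these, the collector in use is the likely suspect.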

answered Sep 22 '22 14:09

Matt

A few ideas:

Use a different JDK, Tomcat and/or OS version
Slightly modify test parameters, e.g. 25 threads at 7.2 M pageviews/day
Monitor or profile memory usage
Debug or tune the Garbage Collector
Run static and dynamic analysis

answered Sep 23 '22 14:09

kiwicptn
