JVM periodically hangs

Tags:

Trying to debug a misbehaving Java VM. The process in question is a large VM (100GB heap) running Sun VM 1.6u24 on Centos 5 that is doing routine back-end work - i.e. database access, file I/O and so forth.

After the process was restarted for a software version upgrade, we noticed that its throughput has dropped significantly. Most of the time, top reports the Java process is fully utilizing 2 cores. During that time, the VM is totally non-responsible: no logs are written and it doesn't respond to outside tools such as jstack or kill -3. Once the VM recovers, the process continues as per normal, until the next hang.

strace shows that during these hangs, only 2 threads make system calls. These were the VM threads "VM Thread" (21776) and "VM Periodic Task Thread" (21786). Presumably, these 2 threads are using up the CPU time. The application threads occasionally wake up and do their work. The rest of the time they seem to be waiting on various futexes. Incidentally, the first line of the normal phase is always a SIGSEGV.

Click to copy

[pid 21776] sched_yield()               = 0
[pid 21776] sched_yield()               = 0
[pid 21776] sched_yield( <unfinished ...>
[pid 21786] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid 21776] <... sched_yield resumed> ) = 0
[pid 21786] futex(0x2aabac71ef28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 21776] sched_yield( <unfinished ...>
[pid 21786] <... futex resumed> )       = 0
[pid 21786] clock_gettime(CLOCK_MONOTONIC, {517080, 280918033}) = 0
[pid 21786] clock_gettime(CLOCK_REALTIME, {1369750039, 794028000}) = 0
[pid 21786] futex(0x2aabb81b94c4, FUTEX_WAIT_PRIVATE, 1, {0, 49923000} <unfinished ...>
[pid 21776] <... sched_yield resumed> ) = 0
[pid 21776] sched_yield()               = 0
[pid 21776] sched_yield()               = 0
[pid 21955] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[pid 21955] rt_sigreturn(0x2b1cde2f54ad <unfinished ...>

The problem manifests itself in 2 different servers. Rolling back our code version only worked for one of the 2 servers. No error messages were reported in system logs, and another Java process on the affected machine is behaving correctly.

This following output was obtained with gstack and shows 2 typical waiting application threads:

Click to copy

Thread 552 (Thread 0x4935f940 (LWP 21906)):
#0  0x00000030b040ae00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b1cdd8548d6 in os::PlatformEvent::park(long) () from /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
#2  0x00002b1cdd92b230 in ObjectMonitor::wait(long, bool, Thread*) () from /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
#3  0x00002b1cdd928853 in ObjectSynchronizer::wait(Handle, long, Thread*) () from /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
#4  0x00002b1cdd69b716 in JVM_MonitorWait () from /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
#5  0x00002b1cde193cc8 in ?? ()
#6  0x00002b1ce2552d90 in ?? ()
#7  0x00002b1cdd84fc23 in os::javaTimeMillis() () from /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
#8  0x00002b1cde188a82 in ?? ()
#9  0x0000000000000000 in ?? ()
Thread 551 (Thread 0x49460940 (LWP 21907)):
#0  0x00000030b040ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002b1cdd854d6f in Parker::park(bool, long) () from /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
#2  0x00002b1cdd98a1c8 in Unsafe_Park () from /usr/lib/jvm/java/jre/lib/amd64/server/libjvm.so
#3  0x00002b1cde193cc8 in ?? ()
#4  0x000000004945f798 in ?? ()
#5  0x00002b1cde188a82 in ?? ()
#6  0x0000000000000000 in ?? ()

We looked at issues with NTPD, including leap second bugs, but the suggested workarounds didn't help, neither did using external NTPD servers. Restarting the machine itself didn't help as well. We have GC logging enabled, and it doesn't look it a GC issue, as there are no messages indicating it. Looking for any suggestions that can help with this issue, any help is much appreciated.

638

asked May 28 '13 15:05

Alon Pilberg

1 Answers

Here are a couple of things I'd look at:

When the JVM is unresponsive, use iostat and vmstat to see if the system is thrashing. This can happen when you over-allocate memory; i.e. your overall system is using significantly more virtual memory than physical memory.
Turn on the JVM's GC logging, and see if there is a correlation between the JVM going unresponsive and GC runs.

answered Sep 29 '22 13:09

Stephen C

Related questions
                            
                                How can I reuse an inner class in Java / Android?
                            
                                Does Java 1.7 use a different character encoding?
                            
                                Why doesn't Class have a nice generic type in this case?
                            
                                Programmatic builds using Maven 3 API
                            
                                How do I set priority on Spring MVC mapping?
                            
                                Java transparent PNG to clipboard
                            
                                Using the Eclipse code formatter from own program
                            
                                Hibernate not following JPA specifications when combined with Bean Validation API?
                            
                                Verify that Java classes implement toString()
                            
                                How to stop HttpURLConnection.getInputStream()?
                            
                                How can i get a printer's make and model in Java?
                            
                                Java Tetris - Thinking about piece rotation [closed]
                            
                                NoClassDefFoundError in web application running on Tomcat 7
                            
                                NoClassDefFoundError while starting up jetty Server
                            
                                File processing on two different machine using spring batch
                            
                                Compiling java into native code? [duplicate]
                            
                                Can I instantiate an anonymous class in the constructor of outer class?
                            
                                "No information about coverage per test." from Sonar with Jacoco Ant build
                            
                                java.sql.SQLException: Io exception: Socket read timed out vs Closed Connection
                            
                                How does this library work? JsonEngine

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

JVM periodically hangs

Tags:

java

jvm

strace

freeze

centos5

Alon Pilberg

People also ask

1 Answers

Stephen C

Recent Activity

Donate For Us