
JVM Freeze under high load in longevity tests

Running with JVM:

java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

OS:

CentOS release 6.4 (Final)

JVM options:

-Xmx4g -Xms4g -XX:MaxPermSize=4g -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintClassHistogram -XX:+CMSClassUnloadingEnabled -verbose:gc -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+DisableExplicitGC

Running in an OSGi environment, with Aerospike DB and Netty (NIO) for networking.

I ran a longevity test over a weekend. This was the last line printed before the freeze:

[2015-12-11 09:54:51,185] INFO  : [GC pause (young)

After 2 days I ran strace on the pid, which released the freeze. These were the next lines printed:

[2015-12-11 09:54:51,185] INFO  : [GC pause (young) 3598M->1458M(4096M), 0.0280020 secs]
[2015-12-13 11:54:54,353] INFO  : [GC pause (young) 3598M->1464M(4096M), 180001.5628870 secs]

The first line was completed, and the next line reported a GC pause of about 2 days (180001 seconds).
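To check that the reported pause matches the wall-clock gap between the two log lines, rather than being a logging glitch, the lines can be parsed and compared. This is only a minimal sketch: the class name `GcPauseCheck` and its methods are hypothetical, the regular expression assumes exactly the log layout shown above, and the timestamps are parsed as UTC purely so the difference is not affected by the local time zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class GcPauseCheck {
    // Matches a completed line such as:
    // [2015-12-13 11:54:54,353] INFO  : [GC pause (young) 3598M->1464M(4096M), 180001.5628870 secs]
    private static final Pattern LINE = Pattern.compile(
        "^\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\].*?, (\\d+\\.\\d+) secs\\]$");

    /** Returns the pause reported in the line, in seconds, or -1 if the line is incomplete. */
    static double pauseSeconds(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? Double.parseDouble(m.group(2)) : -1;
    }

    /** Returns the log timestamp of a completed line in epoch milliseconds. */
    static long timestampMillis(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized line: " + line);
        }
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");
        // Assumption: the real zone is unknown; UTC is used only to get a stable difference.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.parse(m.group(1)).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad timestamp in: " + line, e);
        }
    }

    /** Wall-clock seconds between the timestamps of two completed lines. */
    static double gapSeconds(String earlier, String later) {
        return (timestampMillis(later) - timestampMillis(earlier)) / 1000.0;
    }

    public static void main(String[] args) {
        String first = "[2015-12-11 09:54:51,185] INFO  : [GC pause (young) 3598M->1458M(4096M), 0.0280020 secs]";
        String second = "[2015-12-13 11:54:54,353] INFO  : [GC pause (young) 3598M->1464M(4096M), 180001.5628870 secs]";
        System.out.println("reported pause (s): " + pauseSeconds(second));
        System.out.println("timestamp gap (s):  " + gapSeconds(first, second));
    }
}
```

For the two lines above, the timestamp gap is 2 days, 2 hours and about 3 seconds, which is within a couple of seconds of the reported pause. That suggests the collector was genuinely stalled for the whole period.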

The JVM did not respond to thread dump signals during the freeze (pkill -QUIT pid). This freeze happens every few days, and not only with the G1 collector but also with the CMS collector. How can I start debugging this, and what could cause it?

Thank you.

EDIT: Had another freeze; this time strace did not release it. The second freeze was released by running jstack.

UPDATE: Found the problem! Look at the answer below.

Guy Sela asked Dec 13 '15 13:12


1 Answer

I found the problem!
It is a kernel bug in futex_wait() that was backported to our kernel version.
You can read about it here:
https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64
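
To compare a machine against the kernel versions discussed in that thread, it can help to log the running kernel release from inside the JVM at startup. A minimal sketch, assuming a Linux host where the standard `os.version` system property reports the kernel release; the class name `KernelInfo` is hypothetical, and no affected version numbers are encoded here:

```java
// Hypothetical helper, not part of any library.
class KernelInfo {
    /** Builds a one-line description of the OS and JVM from standard system properties. */
    static String describe() {
        return System.getProperty("os.name") + " "
             + System.getProperty("os.version") + " ("
             + System.getProperty("os.arch") + "), JVM "
             + System.getProperty("java.vm.version");
    }

    public static void main(String[] args) {
        // Log this once at startup so every longevity-test log records the kernel in use.
        System.out.println(describe());
    }
}
```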

Guy Sela answered Nov 19 '22 21:11