Java periodically hangs in futex, with very low I/O output

My application periodically blocks on I/O, and its output rate drops very low. I used a few commands to trace the process.

Using jstack, I found that the app is hanging in FileOutputStream.writeBytes.

Using strace -f -c -p pid to collect syscall statistics, I found that in the normal state the process makes both futex and write syscalls. When it goes abnormal, there are only futex syscalls: the app keeps calling futex, and every FUTEX_WAIT_BITSET call returns -1 with ETIMEDOUT, like this:
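Because strace -c only prints totals when it detaches, comparing a normal window with a stalled one can be easier from a raw trace, for example one recorded with strace -f -o trace.log -p PID and stopped with Ctrl-C after a few seconds. The sketch below tallies syscalls by name in such a log. It assumes strace's usual line prefixes ("PID  name(" when writing to a file with -f, "[pid N] name(" on the terminal); count_syscalls is a hypothetical helper name.

```shell
# count_syscalls LOGFILE
# Hypothetical helper: count syscalls by name in an strace log.
# Strips the "1234  " or "[pid 1234] " prefix, keeps only lines that start
# with "name(", so "<... futex resumed>" continuation lines are not
# double-counted, then prints "count name" sorted by count.
count_syscalls() {
  local log="$1"
  sed -E 's/^\[pid +[0-9]+\] +//; s/^[0-9]+ +//' "$log" |
    grep -oE '^[a-z_0-9]+\(' |
    tr -d '(' |
    sort | uniq -c | sort -rn
}
```

Running it once on a trace taken during normal operation and once during a stall should show whether write disappears entirely while futex keeps climbing.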

<futex resumed> = -1 ETIMEDOUT (Connection timed out)
futex(0x7f823, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f824, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME) = -1 <unfinished>
<futex resumed> = -1 ETIMEDOUT (Connection timed out)
futex(0x7f823, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f824, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME) = -1 <unfinished>

The issue recurs periodically, lasts for minutes or hours, and then goes back to normal on its own.

Notably, while the app is blocked on I/O, running echo 3 > /proc/sys/vm/drop_caches always brings it back to normal temporarily. I searched and found some similar problems, listed below.
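Since drop_caches frees only clean page cache, dentries and inodes and does not discard dirty pages, it may help to record the relevant /proc/meminfo counters just before and just after running it. Here is a minimal sketch. Which counters are worth watching is an assumption, not a diagnosis, and meminfo_snapshot is a hypothetical helper name.

```shell
# meminfo_snapshot [FILE]
# Print selected page-cache counters (values in kB) from /proc/meminfo,
# or from FILE if one is given. The counter selection is an assumption.
meminfo_snapshot() {
  local src="${1:-/proc/meminfo}"
  awk '/^(MemFree|Cached|Dirty|Writeback):/ {print $1, $2}' "$src"
}

# Example (as root, while the app is stalled):
#   meminfo_snapshot > before.txt
#   echo 3 > /proc/sys/vm/drop_caches
#   meminfo_snapshot > after.txt
#   diff before.txt after.txt
```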

  1. Leap second. Doesn't apply: ntpd is stopped on our system.
  2. Transparent hugepage bug: https://bugzilla.redhat.com/show_bug.cgi?id=879801
     This is very similar to my problem: drop_caches helps in that report as well, and my system is also multi-core with a lot of memory. However, my khugepaged process looks normal and the load is always close to zero, so this doesn't seem to explain it either.

So, has anyone run into the same problem, or is anyone familiar with this issue?
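To test the THP hypothesis beyond khugepaged's CPU usage, you can read which THP mode is actually active. The following is a sketch that assumes either the upstream sysfs path or the RHEL 6 specific redhat_transparent_hugepage path. thp_mode is a hypothetical helper name.

```shell
# thp_mode [FILE]
# Print the active transparent hugepage mode, i.e. the bracketed word in a
# file such as "always madvise [never]". With no argument, try the upstream
# sysfs path first and then the RHEL 6 specific one.
thp_mode() {
  local f
  if [ $# -gt 0 ]; then
    f="$1"
  else
    for f in /sys/kernel/mm/transparent_hugepage/enabled \
             /sys/kernel/mm/redhat_transparent_hugepage/enabled; do
      [ -r "$f" ] && break
    done
  fi
  [ -r "$f" ] || { echo "THP sysfs file not found" >&2; return 1; }
  sed -nE 's/.*\[([a-z]+)\].*/\1/p' "$f"
}
```

The neighbouring defrag file in the same directory uses the same bracketed format, so thp_mode can read it too by passing its path explicitly.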

Some information about my system:

OS: Red Hat 6.1, kernel version 2.6.31
JDK: 1.7.0_05
CPU: X5650, 24 cores
Memory: 24GB and 48GB

asked Aug 28 '15 at 03:08 by bforevdr



2 Answers

Could it be the kernel bug in futex_wait()?

You can read about it here: https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64

answered Nov 15 '22 at 07:11 by Guy Sela



In addition to clock jumps and the aforementioned (rather old) THP kernel bug, another common reason for Java to block unexpectedly on I/O is reading from /dev/random. It can be very slow and it blocks, yet some libraries prefer it over the more commonly used and much better performing /dev/urandom.
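On older kernels such as the 2.6.x series, reads from /dev/random block while the kernel's entropy estimate is too low to satisfy them. A non-invasive first check, sketched here under that assumption, is to sample the estimate from procfs while the app is stalled. A value that stays persistently low is consistent with, though not proof of, /dev/random readers blocking. sample_entropy is a hypothetical helper name.

```shell
# sample_entropy [COUNT] [FILE]
# Print a timestamp and the kernel's entropy estimate (in bits) COUNT times,
# one second apart. FILE defaults to the standard procfs location.
sample_entropy() {
  local count="${1:-10}"
  local src="${2:-/proc/sys/kernel/random/entropy_avail}"
  local i=0
  while [ "$i" -lt "$count" ]; do
    echo "$(date +%H:%M:%S) $(cat "$src")"
    i=$((i + 1))
    # Sleep between samples, but not after the last one.
    if [ "$i" -lt "$count" ]; then
      sleep 1
    fi
  done
}
```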

An easy way to tell whether that is the culprit:

sudo mv /dev/random /dev/random.real
sudo ln -s /dev/urandom /dev/random

...then restart the app and see whether the I/O blocking stops. Once you are done testing, you will probably want to restore /dev/random:

sudo mv /dev/random.real /dev/random

...and file a bug with the application vendor asking them to use /dev/urandom where appropriate.
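For a Java application, one possible source of /dev/random reads is SecureRandom seeding. This is configured by the securerandom.source property in the JRE's java.security file and can be overridden with -Djava.security.egd=... on the command line. Some JDK 7 builds are reported to special-case the value file:/dev/urandom and still seed from /dev/random, which is why file:/dev/./urandom is often suggested as a workaround. Treat that as an assumption to verify against your JDK's documentation. The sketch below only prints the configured value. Its default path assumes a JDK 7 layout under $JAVA_HOME/jre/lib/security, and securerandom_source is a hypothetical helper name.

```shell
# securerandom_source [JAVA_SECURITY_FILE]
# Print the securerandom.source entry from a java.security file, ignoring
# commented-out lines. The default path assumes a JDK 7 layout with
# JAVA_HOME set; a standalone JRE keeps the file under $JAVA_HOME/lib/security.
securerandom_source() {
  local f="${1:-${JAVA_HOME:-}/jre/lib/security/java.security}"
  if [ ! -r "$f" ]; then
    echo "cannot read $f" >&2
    return 1
  fi
  sed -nE 's/^[[:space:]]*securerandom\.source[[:space:]]*=[[:space:]]*//p' "$f"
}
```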

answered Nov 15 '22 at 08:11 by Aex Aey
