
What is the proper action plan for debugging a deadlock in a PRODUCTION environment?

Note that I am not asking about the concept of deadlock. I am interested in what you would do, and which debugging skills you would use, if you met this issue in a Java application on a production cluster server.

Question

  • What are the best-practice steps for analyzing the problem?

Assumption

  • You already know that one server is hit by this problem.
  • The OS is Linux.

Goal

  • You want to know the root cause and fix it.
Clark Bao asked Aug 21 '11 00:08


2 Answers

  1. Send the server a SIGQUIT signal (on Linux, kill -QUIT <pid> or kill -3 <pid>) to force a stack dump. If you're on Windows, you may be able to get a comparable dump using jconsole. Maybe. But life is a lot easier if you run servers on Linux.
  2. Inspect the stack dump to find the deadlock.
  3. Knowing what it is, try to reproduce it on a test server.
  4. When you can reproduce it, fix it, then test the fix on the test server.
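
If signalling the process is awkward, the JVM can also be asked about deadlocks from inside, through the standard java.lang.management API. The following is only a minimal sketch of that idea, not something taken from the answer above: the class name DeadlockProbe and the method describeDeadlocks are made up for illustration, and how you would expose the result in a real server (admin endpoint, scheduled log line, and so on) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (name invented for this sketch): reports threads that
// the JVM itself considers deadlocked.
class DeadlockProbe {

    // Returns one text block per deadlocked thread, or an empty list if none.
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads also covers java.util.concurrent locks, but it
        // is only available when synchronizer monitoring is supported.
        long[] ids = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report; // null means "no deadlock found"
        }
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName())
              .append("\" is waiting for ").append(info.getLockName())
              .append(" held by \"").append(info.getLockOwnerName()).append('"');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append(System.lineSeparator()).append("        at ").append(frame);
            }
            report.add(sb.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = describeDeadlocks();
        if (report.isEmpty()) {
            System.out.println("No deadlock detected.");
        }
        for (String entry : report) {
            System.out.println(entry);
        }
    }
}
```

This gives roughly the same facts as the "Found one Java-level deadlock" section of a thread dump (who waits for which lock, and who holds it), so it is a complement to step 1 and not a replacement for reading the full dump.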
Ernest Friedman-Hill answered Sep 22 '22 16:09


I just found another useful link, prompted by Ernest's tips: http://java.sun.com/developer/technicalArticles/Programming/Stacktrace/

Here is some advice from the article.

On Windows, you can press <ctrl><break>. A sample result is shown here.

2011-08-27 19:48:38
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.2-b03 mixed mode):

"DestroyJavaVM" prio=6 tid=0x00000000003db000 nid=0x414 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Thread-1" prio=6 tid=0x0000000006621800 nid=0x2178 waiting for monitor entry [0x0000000006f8f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at SimpleDeadLock$Thread2.run(SimpleDeadLock.java:33)
        - waiting to lock <0x00000000ebc3c3e8> (a java.lang.Object)
        - locked <0x00000000ebc3c3f8> (a java.lang.Object)

"Thread-0" prio=6 tid=0x000000000661f000 nid=0x1f50 waiting for monitor entry [0x0000000006e8f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at SimpleDeadLock$Thread1.run(SimpleDeadLock.java:20)
        - waiting to lock <0x00000000ebc3c3f8> (a java.lang.Object)
        - locked <0x00000000ebc3c3e8> (a java.lang.Object)

"Low Memory Detector" daemon prio=6 tid=0x0000000006603000 nid=0x1118 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x0000000006600800 nid=0x1340 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00000000065ee000 nid=0x1e10 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Attach Listener" daemon prio=10 tid=0x00000000065a2800 nid=0xebc runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x000000000659d000 nid=0x18b4 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x000000000052d800 nid=0x1b6c in Object.wait() [0x000000000658f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000ebc01300> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x00000000ebc01300> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=10 tid=0x0000000000523800 nid=0x2054 in Object.wait() [0x000000000648f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000ebc011d8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
        - locked <0x00000000ebc011d8> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x000000000051b800 nid=0x1f44 runnable

"GC task thread#0 (ParallelGC)" prio=6 tid=0x0000000000476000 nid=0x25c runnable

"GC task thread#1 (ParallelGC)" prio=6 tid=0x0000000000478800 nid=0x1ef0 runnable

"GC task thread#2 (ParallelGC)" prio=6 tid=0x000000000047b000 nid=0x1d88 runnable

"GC task thread#3 (ParallelGC)" prio=6 tid=0x000000000047c800 nid=0x1e3c runnable

"VM Periodic Task Thread" prio=10 tid=0x000000000661c000 nid=0x1f40 waiting on condition

JNI global references: 882


Found one Java-level deadlock:
=============================
"Thread-1":
  waiting to lock monitor 0x000000000052abb0 (object 0x00000000ebc3c3e8, a java.lang.Object),
  which is held by "Thread-0"
"Thread-0":
  waiting to lock monitor 0x000000000052d460 (object 0x00000000ebc3c3f8, a java.lang.Object),
  which is held by "Thread-1"

Java stack information for the threads listed above:
===================================================
"Thread-1":
        at SimpleDeadLock$Thread2.run(SimpleDeadLock.java:33)
        - waiting to lock <0x00000000ebc3c3e8> (a java.lang.Object)
        - locked <0x00000000ebc3c3f8> (a java.lang.Object)
"Thread-0":
        at SimpleDeadLock$Thread1.run(SimpleDeadLock.java:20)
        - waiting to lock <0x00000000ebc3c3f8> (a java.lang.Object)
        - locked <0x00000000ebc3c3e8> (a java.lang.Object)

Found 1 deadlock.

Heap
 PSYoungGen      total 18176K, used 937K [0x00000000ebc00000, 0x00000000ed040000, 0x0000000100000000)
  eden space 15616K, 6% used [0x00000000ebc00000,0x00000000ebcea520,0x00000000ecb40000)
  from space 2560K, 0% used [0x00000000ecdc0000,0x00000000ecdc0000,0x00000000ed040000)
  to   space 2560K, 0% used [0x00000000ecb40000,0x00000000ecb40000,0x00000000ecdc0000)
 PSOldGen        total 41472K, used 0K [0x00000000c3400000, 0x00000000c5c80000, 0x00000000ebc00000)
  object space 41472K, 0% used [0x00000000c3400000,0x00000000c3400000,0x00000000c5c80000)
 PSPermGen       total 21248K, used 2930K [0x00000000be200000, 0x00000000bf6c0000, 0x00000000c3400000)
  object space 21248K, 13% used [0x00000000be200000,0x00000000be4dc9f8,0x00000000bf6c0000)
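
The interesting part of the dump is that "Thread-0" and "Thread-1" each hold one java.lang.Object monitor and wait for the one the other holds. The source of SimpleDeadLock is not shown here, so what follows is only a hypothetical reconstruction of that pattern, plus the usual fix of taking the locks in one agreed order. The class name SimpleDeadLockSketch, the method names and the latch used to make the deadlock deterministic are all assumptions made for this sketch.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch, not the original SimpleDeadLock source.
class SimpleDeadLockSketch {

    // Takes 'first', waits until all participants hold their first lock,
    // then tries to take 'second'.
    private static void lockBoth(Object first, Object second, CountDownLatch holdingFirst) {
        synchronized (first) {
            holdingFirst.countDown();
            try {
                holdingFirst.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // the real work would happen here
            }
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Opposite lock order: returns true once both threads are BLOCKED on
    // each other's monitor, false if that did not happen before the timeout.
    static boolean runOpposingOrder(long timeoutMillis) {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch holdingFirst = new CountDownLatch(2);
        Thread t0 = new Thread(() -> lockBoth(lockA, lockB, holdingFirst), "sketch-0");
        Thread t1 = new Thread(() -> lockBoth(lockB, lockA, holdingFirst), "sketch-1");
        t0.setDaemon(true); // daemon threads so the stuck pair cannot keep the JVM alive
        t1.setDaemon(true);
        t0.start();
        t1.start();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (t0.getState() == Thread.State.BLOCKED && t1.getState() == Thread.State.BLOCKED) {
                return true;
            }
            pause(10);
        }
        return false;
    }

    // Same lock order in both threads: returns true if both threads finish.
    static boolean runSameOrder(long timeoutMillis) {
        Object lockA = new Object();
        Object lockB = new Object();
        // A latch with count 0 never blocks, so no rendezvous happens here.
        CountDownLatch noRendezvous = new CountDownLatch(0);
        Thread t0 = new Thread(() -> lockBoth(lockA, lockB, noRendezvous), "ordered-0");
        Thread t1 = new Thread(() -> lockBoth(lockA, lockB, noRendezvous), "ordered-1");
        t0.setDaemon(true);
        t1.setDaemon(true);
        t0.start();
        t1.start();
        try {
            t0.join(timeoutMillis);
            t1.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t0.isAlive() && !t1.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("opposing order deadlocked: " + runOpposingOrder(5000));
        System.out.println("same order completed:      " + runSameOrder(5000));
    }
}
```

In a real application the two lock acquisitions are usually far apart in the code, which is why the "waiting to lock" and "locked" lines in the dump are the quickest way to see which order each thread used.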

Expert's Checklist

This covers the theory about Java stack traces, and you should now know what to look for next time you see one. To save yourself time, be sure to make full use of the JDC bug search to see if the problem you are having has already been reported.

To summarize, here are the steps to take when you next come across a problem Java program:

For hanging, deadlocked or frozen programs: If you think your program is hanging, generate a stack trace and examine the threads in states MW or CW. If the program is deadlocked then some of the system threads will probably show up as the current threads, because there is nothing else for the JVM to do.

For crashed, aborted programs: On UNIX look for a core file. You can analyze this file in a native debugging tool such as gdb or dbx. Look for threads that have called native methods. Because Java technology uses a safe memory model, any corruption probably occurred in the native code. Remember that the JVM also uses native code, so it may not necessarily be a bug in your application.

For busy programs: The best course of action you can take for busy programs is to generate frequent stack traces. This will narrow down the code path that is causing the errors, and you can then start your investigation from there.
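
The "frequent stack traces" advice can also be approximated from inside the process. The sketch below is an assumption-laden illustration and not part of the article: the class name StackSampler and the method sampleTopFrames are invented, and it only counts the top frame of threads that look RUNNABLE at sampling time, which is a crude form of profiling.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (name invented for this sketch): samples all thread
// stacks a few times and counts which top frame shows up most often.
class StackSampler {

    // Returns "top frame" -> "number of samples in which a RUNNABLE thread was there".
    static Map<String, Integer> sampleTopFrames(int samples, long intervalMillis) {
        Map<String, Integer> hits = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> entry
                    : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = entry.getValue();
                // The state is read slightly after the stack was captured,
                // so this filter is approximate.
                if (entry.getKey().getState() != Thread.State.RUNNABLE || stack.length == 0) {
                    continue;
                }
                hits.merge(stack[0].toString(), 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop sampling if the caller is interrupted
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : sampleTopFrames(20, 100).entrySet()) {
            System.out.println(e.getValue() + "  " + e.getKey());
        }
    }
}
```

Frames that keep appearing across samples are the code paths worth looking at first. Sending SIGQUIT several times a few seconds apart and comparing the dumps by eye gives the same kind of information without touching the application code.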

Good Luck and Happy Debugging

A nice official article indeed, so I wanted to share it.

The command for sending SIGQUIT differs a little between operating systems, but that is not the main concern here.

Clark Bao answered Sep 25 '22 16:09