I'm trying to understand a problem we're having with a Java process that is hanging. This process has been running in production for about 4 months and earlier this week it started hanging. When I look at a thread dump of the process all of the relevant threads (3) have stacks like the following:
"TxnParser_1" prio=6 tid=0x69bd3400 nid=0x2534 runnable [0x6aa2f000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at oracle.net.ns.Packet.receive(Unknown Source)
at oracle.net.ns.DataPacket.receive(Unknown Source)
at oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
at oracle.net.ns.NetInputStream.read(Unknown Source)
at oracle.net.ns.NetInputStream.read(Unknown Source)
at oracle.net.ns.NetInputStream.read(Unknown Source)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:478)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:207)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:790)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1039)
at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:830)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1132)
at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1687)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1653)
- locked <0x40e22f88> (a oracle.jdbc.driver.T4CStatement)
- locked <0x28f8d398> (a oracle.jdbc.driver.T4CConnection)
at com.gcg.data.LogParsingInfo.initFromDB(LogParsingInfo.java:262)
at com.gcg.om.OmQueueEntry.initParseInfoFromDB(OmQueueEntry.java:104)
at com.gcg.om.GenericQueueEntry.run(GenericQueueEntry.java:237)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
There are no threads waiting for locks so the process isn't deadlocked. These 3 threads that are doing the work are just blocked waiting for a response from Oracle, at least that is what it looks like to me.
Looking at Oracle, when I query v$session, it looks like one of the connections associated with these threads is currently executing a query, although I can't see the sql.
select ... from v$session where ...;
SQL_ADDRESS SQL_HASH_VALUE SQL_ID SQL_CHILD_NUMBER SQL_EXEC_START SQL_EXEC_ID PREV_SQL_ADDR PREV_HASH_VALUE PREV_SQL_ID PREV_CHILD_NUMBER PREV_EXEC_START PREV_EXEC_ID
---------------- -------------- ------------- ---------------- -------------- ----------- ---------------- --------------- ------------- ----------------- --------------- ------------
00 0 0000000239F59EE8 1483377872 fqr8pndc6p36h 5 26-JUL-12 32080545
00 0 0000000239F59EE8 1483377872 fqr8pndc6p36h 5 26-JUL-12 32080546
0000000148CABD88 1784444892 a16hxxtp5sxyw 0000000239F59EE8 1483377872 fqr8pndc6p36h 5 26-JUL-12 32080544
select * from v$sql where sql_id = 'a16hxxtp5sxyw';
no rows selected
My questions are:
Updated:
Based on the comment regarding looking in DBA_WAITERS and DBA_LOCKS
select * from dba_waiters;
no rows selected
select * from dba_locks where BLOCKING_OTHERS <> 'Not Blocking';
no rows selected
There were 98 rows in dba_locks but since all are 'Not Blocking' I don't think it is a locking issue? The process in question has been in this state for over 3 hours so any deadlock would have been detected by now.
I'm of the theory that the Oracle instance is not "healthy", but I'm at a loss as to what to look at. I have a request in to reboot the Oracle server, but that hasn't been done yet.
Follow-up question: Is it normal for the v$session to contain a sql_id that doesn't exist in v$sql and if so, under what conditions?
If the application console is available, then press Control+\ (on Oracle Solaris or Linux), or Control+Break (on Windows) to cause the HotSpot VM to print a thread dump. On Oracle Solaris and Linux operating systems the thread dump can also be obtained by sending a SIGQUIT to the process (command kill -QUIT pid).
In computing, a hang or freeze occurs when either a process or system ceases to respond to inputs. A typical example is when computer's graphical user interface (such as Microsoft Windows) no longer responds to the user typing on the keyboard or moving the mouse.
The Java ThreadPool does not have a mechanism for detecting hanging threads. Using a strategy like fixed threadpool ( Executors. newFixedThreadPool() ) will not work because if some tasks hang over time, all of the threads will eventually be in a hung state.
Description. The jstack command prints Java stack traces of Java threads for a specified Java process. For each Java frame, the full class name, method name, byte code index (BCI), and line number, when available, are printed. C++ mangled names aren't demangled.
The problem was solved and the answer was right in the v$session table. Apparently Oracle sessions can block for reasons other than just locking. Notice the column FINAL_BLOCKING_SESSION - it identifies the session that is the root cause of the blocking. We investigated session 845 and found that the client process (identified by MACHINE and PORT) no longer existed. The DBA killed session 845 and all returned to normal.
SID SERIAL# STATUS PROGRAM TYPE SQL_ID PREV_SQL_ID BLOCKING_SESSION_STATUS BLOCKING_INSTANCE BLOCKING_SESSION FINAL_BLOCKING_SESSION_STATUS FINAL_BLOCKING_INSTANCE FINAL_BLOCKING_SESSION EVENT
------- ------- --------- ---------------- ---- ------------- -------------- ----------------------- ----------------- ---------------- ----------------------------- ----------------------- ---------------------- ----------------------------
108 22447 ACTIVE Gcg log parser 1 USER fqr8pndc6p36h VALID 1 1581 VALID 1 845 library cache: mutex X
639 40147 ACTIVE Gcg log parser 3 USER fqr8pndc6p36h VALID 1 1581 VALID 1 845 library cache: mutex X
742 34683 ACTIVE Gcg log parser 2 USER a16hxxtp5sxyw fqr8pndc6p36h VALID 1 1581 VALID 1 845 library cache: mutex X
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With