 

Connection Pool Empty Hibernate 4, but Unable to Find the Culprit

I'm monitoring the SQL database for connections every 5 minutes. For days it'll hover around 5 connections (my idle minimum), then suddenly I'm at 50. Something is clearly wrong, because I can't see why I would jump from 5 to 50 within 5 minutes with zero traffic.

I'm using Hibernate 4 and Tomcat. I know of an issue in Hibernate that was patched in 4.3.2, but I'm on 4.3.5.

More details: the pool-empty event happens every day at exactly 7:13:20 PM... that sounds too automatic. I am using Quartz, and it runs every minute, but I can't see how they're related.

My properties:

jmxEnabled = true
initialSize = 5
maxActive = 50
minIdle = 5
maxIdle = 25
maxWait = 10000
maxAge = 10 * 60000
timeBetweenEvictionRunsMillis = 5000
minEvictableIdleTimeMillis = 60000
validationQuery = "SELECT 1"
validationQueryTimeout = 3
validationInterval = 15000
testOnBorrow = true
testWhileIdle = true
testOnReturn = false
jdbcInterceptors = "ConnectionState"
defaultTransactionIsolation = java.sql.Connection.TRANSACTION_READ_COMMITTED
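
(For reference, these property names correspond to Tomcat's own JDBC pool, org.apache.tomcat.jdbc.pool; assuming that is the pool in use, a rough programmatic equivalent is sketched below. Driver class, URL and credentials are omitted.)

    import org.apache.tomcat.jdbc.pool.DataSource;
    import org.apache.tomcat.jdbc.pool.PoolProperties;

    // Rough programmatic equivalent of the properties listed above, assuming
    // Tomcat's own JDBC pool is in use. Driver class, URL and credentials omitted.
    public static DataSource buildPool() {
        PoolProperties p = new PoolProperties();
        p.setJmxEnabled(true);
        p.setInitialSize(5);
        p.setMaxActive(50);
        p.setMinIdle(5);
        p.setMaxIdle(25);
        p.setMaxWait(10000);
        p.setMaxAge(10 * 60000);
        p.setTimeBetweenEvictionRunsMillis(5000);
        p.setMinEvictableIdleTimeMillis(60000);
        p.setValidationQuery("SELECT 1");
        p.setValidationQueryTimeout(3);
        p.setValidationInterval(15000);
        p.setTestOnBorrow(true);
        p.setTestWhileIdle(true);
        p.setTestOnReturn(false);
        p.setJdbcInterceptors("ConnectionState");
        p.setDefaultTransactionIsolation(java.sql.Connection.TRANSACTION_READ_COMMITTED);

        DataSource ds = new DataSource();
        ds.setPoolProperties(p);
        return ds;
    }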

Environment:

  • Tomcat 7.0.59
  • java 1.7.0 update 76
  • SQL Server 2012

More information: I reduced the Quartz job frequency to every 5 minutes. The event still took place when I loaded a page/view in the application, roughly at 7:14 PM. I'm on the verge of downgrading to Hibernate 3.

Update: Today I reloaded the application in Tomcat Manager at 6:50 PM, but the event still took place. Thread Dump

asked May 13 '15 by John Giotta


2 Answers

Ah, those kinds of bugs are fun. Obviously there is no way we can point you to the exact culprit (except if somebody digs up a known bug in the libraries you mentioned), so let's see how you can debug this. The steps below go roughly from easy to hard, although the details depend on your environment.

  1. You have very helpful information: the problem always happens at the same time. This hints at two options: either one of the jobs you run with Quartz eats up connections, or something (possibly external) happens at that time and causes your code to eat up connections. Obviously you should check your job configurations, cron jobs, jobs configured inside the database, and anything similar for potential culprits. Note that they might start quite some time earlier and only reach the critical state later on, so the job might start two hours earlier for all we know.

  2. Check your logs, the system logs, and the database logs for anything that happens at that time or some time earlier.

  3. Double-check that everything which obtains a connection also returns it, especially when exceptions get thrown. One classic way to fail at this is a construction like the following (Java-like pseudo code); a safer counterpart is sketched just after this list:

    Connection con = null;
    Statement stmnt = null;
    
    try {
        con = getConnection();
        stmnt = con.createStatement();
        // ...
    } finally {
        if (stmnt != null) stmnt.close();
        if (con != null) con.close(); // never reached if stmnt.close() throws an exception
    }
    
  4. Establish logging that lets you see exactly when connections don't get returned. Everything that starts anything in your application should go through some kind of wrapper (an AOP around-aspect, a servlet filter, or similar). That wrapper should create a unique id for the action (a UUID) and put it in the MDC of your logging framework; at the end of the action the id gets removed again. All other logging should include that id. Wrap your connection pool as well: keep track of when something requested a connection, including the timestamp, the id, and possibly the stack trace (by creating and storing an exception), and log that. Every time a connection gets returned, log how long it was used. Also, every time a connection gets requested, check whether any connection has been in use for longer than some threshold. A minimal sketch of the request-id part follows this list.

  5. Isolate things: set up a second server where you run the application. Does it have the same problem? Run some parts only on one of the two servers; do they still both have the problem? Continue to exclude candidates until only one is left.
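
As a side note to point 3: if the code base is on Java 7 or later, try-with-resources closes each resource even when an earlier close throws. A minimal sketch, with the query as a placeholder:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.sql.DataSource;

    // Safer counterpart to the pattern in point 3, assuming Java 7+.
    // The query is just a placeholder.
    static void pingDatabase(DataSource dataSource) throws SQLException {
        try (Connection con = dataSource.getConnection();
             Statement stmnt = con.createStatement();
             ResultSet rs = stmnt.executeQuery("SELECT 1")) {
            while (rs.next()) {
                // ... consume the result ...
            }
        } // con, stmnt and rs are all closed here, even if one close() throws
    }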
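
For point 4, a minimal sketch of the request-id part, assuming SLF4J's MDC and the Servlet API; the MDC key name is arbitrary:

    import java.io.IOException;
    import java.util.UUID;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import org.slf4j.MDC;

    // Sketch of the "unique id per action" idea from point 4, assuming SLF4J's MDC
    // and the Servlet API. The MDC key name "requestId" is arbitrary.
    public class RequestIdFilter implements Filter {

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            MDC.put("requestId", UUID.randomUUID().toString());
            try {
                chain.doFilter(req, res); // all logging on this thread can include %X{requestId}
            } finally {
                MDC.remove("requestId"); // always clean up, request threads are pooled
            }
        }

        @Override
        public void init(FilterConfig filterConfig) {}

        @Override
        public void destroy() {}
    }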

answered by Jens Schauder


If I were faced with such an issue, I would try my best to get a thread dump at the moment the number of active connections reaches the maxActive limit of 50. You could also try increasing the maxActive limit to check whether the application simply has a higher peak.

I would also configure Tomcat to use a connection pool provider such as c3p0, if one is not already being used. I would then create a custom hook class as described in the following section: http://www.mchange.com/projects/c3p0/#connection_customizers

With this custom class, keep a running counter of the connections being acquired and released. When this number is near or at the limit, trigger a thread dump programmatically. This can be done as described on the following page: http://crunchify.com/how-to-generate-java-thread-dump-programmatically/ Analyze the thread dump to find where the connections are being acquired.
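
A minimal sketch of such a hook, assuming c3p0's AbstractConnectionCustomizer; the class name and the threshold of 45 are arbitrary, and the dump here uses the JMX ThreadMXBean rather than the exact code from the linked article:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.sql.Connection;
    import java.util.concurrent.atomic.AtomicInteger;
    import com.mchange.v2.c3p0.AbstractConnectionCustomizer;

    // Sketch of a c3p0 hook that counts checked-out connections and dumps all
    // thread stacks when the count approaches the pool limit. Register it via
    // c3p0's connectionCustomizerClassName property.
    public class CheckoutTrackingCustomizer extends AbstractConnectionCustomizer {

        private static final AtomicInteger checkedOut = new AtomicInteger();
        private static final int DUMP_THRESHOLD = 45; // arbitrary, just below the limit of 50

        @Override
        public void onCheckOut(Connection c, String parentDataSourceIdentityToken) {
            if (checkedOut.incrementAndGet() >= DUMP_THRESHOLD) {
                dumpThreads();
            }
        }

        @Override
        public void onCheckIn(Connection c, String parentDataSourceIdentityToken) {
            checkedOut.decrementAndGet();
        }

        private static void dumpThreads() {
            ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(true, true);
            for (ThreadInfo info : infos) {
                System.err.print(info); // or hand this to your logging framework
            }
        }
    }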

This information will not only be useful for your current issue but also for troubleshooting future performance issues.

answered by codedabbler