 

Memory Leak in multiple apps

I have a memory leak in two apps on a Tomcat 6.0.35 server that appeared "out of nowhere". One app is Solr and the other is our own software. I'm hoping someone has seen this before, as it has been happening for the last few weeks and I have to keep restarting Tomcat in a production environment.

It appeared on our original server even though none of the code related to thread or DB connection handling had been touched. As the old server was due to be retired anyway, I migrated the site to a new server and a "cleaner" environment, hoping that would clear out any legacy issues. But it continues to happen.

Just before Tomcat shuts down, the catalina.out log fills with errors like:

2012-04-25 21:46:00,300 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [MultiThreadedHttpConnectionManager cleanup] but has failed to stop it. This is very likely to create a memory leak.

2012-04-25 21:46:00,339 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] appears to have started a thread named [com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread-#2] but has failed to stop it. This is very likely to create a memory leak.

2012-04-25 21:46:00,470 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/AppName] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation.

During the migration we also went from Solr 1.4 to Solr 3.6 in an attempt to fix the problem. When the errors above start filling the log, the Solr error below follows right behind, repeated 10-15 times. Then Tomcat stops responding, and I have to shut it down and start it up again to get it to respond.

2012-04-25 21:46:00,527 [main] ERROR org.apache.catalina.loader.WebappClassLoader- The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@1f1e90ac]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.

My research has turned up a lot of suggestions about changing the code that manages threads to make sure it cleans up pooled DB connections etc., but this code has not been changed in nearly 12 months. Also, the Solr application is crashing and that's third-party, so my thinking is that this is environmental (jar conflict, versioning, a fat-fingered config?).

My last change was updating MySQL Connector/J to the latest version, as some memory leak bugs around pooling existed in earlier releases, but the server crashed again only a few hours later.

One thing I just noticed is that I'm seeing thousands of sessions in the Tomcat web manager, but that could be a red herring.

If anyone has seen this, any help is very much appreciated.

[Edit]

I think I found the source of the problem. It wasn't a memory leak after all. I've taken over an application from another development team that uses c3p0 for database pooling via Hibernate. c3p0 has a bug/feature whereby, if you don't release DB connections, c3p0 can go into a waiting state once all the connections (capped by maxPoolSize, default 15) are in use. Because c3p0's checkoutTimeout defaults to 0 (no timeout), it will wait indefinitely for a connection to become available. Hence my stall.
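To see why unreturned connections produce a silent hang rather than an error, here is a minimal sketch in plain Java. It does not use c3p0: TinyPool is a hypothetical toy pool built on java.util.concurrent.Semaphore, where each permit stands for one pooled connection. The pool size of 15 mirrors c3p0's default maxPoolSize, and the 100 ms timeout is an arbitrary value chosen for the demo.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical toy pool for illustration only; this is NOT c3p0's API.
// Each permit stands for one pooled DB connection.
public class PoolExhaustionDemo {

    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int maxPoolSize) {
            this.permits = new Semaphore(maxPoolSize, true);
        }

        // Waits indefinitely for a free connection (a checkout with no timeout).
        void checkoutNoTimeout() throws InterruptedException {
            permits.acquire();
        }

        // Gives up after timeoutMillis instead of hanging forever.
        boolean checkout(long timeoutMillis) throws InterruptedException {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        }

        void release() {
            permits.release();
        }
    }

    // Leaks `leaks` connections (checked out, never released), then reports
    // whether one more checkout succeeds within 100 ms.
    static boolean canCheckoutAfterLeaks(int maxPoolSize, int leaks) {
        if (leaks > maxPoolSize) {
            // With no timeout, the extra checkouts would block forever: the stall.
            throw new IllegalArgumentException("this demo would hang");
        }
        TinyPool pool = new TinyPool(maxPoolSize);
        try {
            for (int i = 0; i < leaks; i++) {
                pool.checkoutNoTimeout(); // never released
            }
            boolean ok = pool.checkout(100);
            if (ok) {
                pool.release();
            }
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canCheckoutAfterLeaks(15, 14)); // true: one connection left
        System.out.println(canCheckoutAfterLeaks(15, 15)); // false: pool exhausted
    }
}
```

The timed checkout is the escape hatch. In c3p0 the corresponding setting is checkoutTimeout: with a positive value, a starved getConnection() call fails with an SQLException after that many milliseconds instead of waiting forever, which makes the exhaustion visible in the logs.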

I raised maxPoolSize first from 25 to 100, and my application ran for several days without a hang; then from 100 to 1000, and it has been running steadily ever since (over 2 weeks).

This isn't the complete solution, as I still need to find out why it's running out of pooled connections, so I also set c3p0's unreturnedConnectionTimeout to 4 hours. This enforces a 4-hour limit on how long any connection can stay checked out, regardless of whether it is still active. If it is still active, c3p0 closes it anyway and a new connection is opened in its place.

Not pretty, and the c3p0 docs don't recommend it, but it gives me some breathing space to find the source of the problem.

Note: when using c3p0 with Hibernate, the settings are stored in your persistence.xml file, but not all settings can be put there. Some settings (e.g. unreturnedConnectionTimeout) must go in c3p0.properties.
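As a small sketch of what goes into c3p0.properties for the 4-hour limit described above: keys in that file carry a "c3p0." prefix, and unreturnedConnectionTimeout is given in seconds, so 4 hours is 4 × 3600 = 14400. The class and method names below are hypothetical; the file itself is normally placed at the top level of the classpath (e.g. WEB-INF/classes in a webapp), and only the setting that could not go in persistence.xml is written here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: builds c3p0.properties content for a setting that
// could not be configured through persistence.xml.
public class C3p0PropertiesSketch {

    static Properties unreturnedTimeout(long hours) {
        Properties p = new Properties();
        // unreturnedConnectionTimeout is expressed in seconds: 4 h = 14400 s.
        p.setProperty("c3p0.unreturnedConnectionTimeout",
                Long.toString(TimeUnit.HOURS.toSeconds(hours)));
        return p;
    }

    // Serialises the properties in the standard .properties text format.
    static String render(Properties p) {
        StringWriter out = new StringWriter();
        try {
            p.store(out, "c3p0.properties");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Parses .properties text back, e.g. to double-check what was written.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // Prints two comment lines, then: c3p0.unreturnedConnectionTimeout=14400
        System.out.print(render(unreturnedTimeout(4)));
    }
}
```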

Greg Kennedy asked Apr 25 '12 21:04



1 Answer

You state that the sequence of events is:

  • errors appear
  • Tomcat stops responding
  • restart is required

However, the memory leak error messages only get reported when the web application is stopped. Therefore, something is triggering the web applications to stop (or reload). You need to figure out what is triggering this and stop it.
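Why only at stop time? Roughly, the check runs when the webapp is being stopped, and it looks for threads that are still tied to that webapp. The sketch below is a simplified, hypothetical illustration of that idea, not Tomcat's actual code: it treats a live thread as belonging to a webapp when its context class loader is the webapp's class loader, and uses a throwaway URLClassLoader as a stand-in for that loader. The thread name is made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Simplified illustration (not Tomcat's code) of a stop-time thread check:
// any live thread whose context class loader is the webapp's loader is reported.
public class StopTimeThreadCheck {

    static List<String> threadsTiedTo(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getContextClassLoader() == loader) {
                names.add(t.getName());
            }
        }
        return names;
    }

    // Returns {names found while the worker runs, names found after it is stopped}.
    static List<List<String>> scenario() {
        try (URLClassLoader webappLoader = new URLClassLoader(new URL[0])) {
            CountDownLatch stopSignal = new CountDownLatch(1);
            Thread worker = new Thread(() -> {
                try {
                    stopSignal.await(); // background work until told to stop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "hypothetical-cleanup-thread");
            worker.setContextClassLoader(webappLoader);
            worker.setDaemon(true);
            worker.start();

            List<List<String>> results = new ArrayList<>();
            results.add(threadsTiedTo(webappLoader)); // app "stopped", thread still running
            stopSignal.countDown();
            worker.join();
            results.add(threadsTiedTo(webappLoader)); // thread properly stopped
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(scenario()); // [[hypothetical-cleanup-thread], []]
    }
}
```

Nothing in this kind of check fires while the application is running normally, which is why seeing these messages at all means something stopped or reloaded the webapp.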

Regarding the actual leaks, you may find this useful:

http://people.apache.org/~markt/presentations/2010-11-04-Memory-Leaks-60mins.pdf

It looks like both your app and Solr have some leaks that need to be fixed. The presentation will give you some pointers. I would also consider upgrading to the latest 7.0.x: its memory leak detection has been improved, and not all of the improvements have made it into 6.0.x yet.
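As a hedged illustration of the two kinds of fix that match the log messages in the question (a thread left running, and a ThreadLocal left on a pooled thread), here is a minimal sketch using only the JDK. CleanShutdownSketch, formatOnWorker and stop are hypothetical names, and the ThreadLocal formatter only mimics the Solr one in spirit. In a real webapp, stop() would be called from ServletContextListener.contextDestroyed; for the c3p0 threads, the equivalent step is closing whatever owns the pool.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical component showing two common leak fixes: clear ThreadLocals on
// pooled threads, and shut down threads you started when the app stops.
public class CleanShutdownSketch {

    // Per-thread formatter, similar in spirit to the Solr ThreadLocal in the log.
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    String formatOnWorker(long epochMillis) {
        Future<String> result = workers.submit(() -> {
            try {
                return FORMAT.get().format(new Date(epochMillis));
            } finally {
                FORMAT.remove(); // leave nothing behind on the reused worker thread
            }
        });
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // In a real webapp this would be called from ServletContextListener.contextDestroyed.
    boolean stop() {
        workers.shutdown();
        try {
            return workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CleanShutdownSketch app = new CleanShutdownSketch();
        System.out.println(app.formatOnWorker(0L)); // 1970-01-01T00:00:00Z
        System.out.println(app.stop());             // true: no threads left running
    }
}
```

Note that ThreadLocal.remove() only clears the value for the calling thread, which is why the cleanup sits in a finally block on the worker itself rather than in stop().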

Mark Thomas answered Oct 19 '22 08:10
