We are experiencing a very tricky problem with our java servlet, running under Jetty. It works perfectly well when the load is low to moderate, but when the load reaches a certain level it will cease responding to requests after about 10-20 minutes.
If we trace the main java process while connecting to it using curl, we can see that it sets up the connection, receives the request, parses it and does what it normally does (query a Solr server, perform some MySQL queries etc), but the result is never sent back to the client.
When it hangs, it will hang indefinitely. No amount of time will make it "snap out of it", BUT if we kill any random thread under the main java/jetty process, the number of threads goes down and it starts responding to requests again.
Here’s how it looks:
# curl http://localhost:8080/some-servlet-url
(Does not respond at this point)
# ps -efL | grep qserv | wc -l
243
# ps -efL | grep qserv | wc -l
243
# ps -efL | grep qserv | wc -l
243
(Number of threads remains seemingly constant)
# kill 29760 <--- random thread under the main java/jetty process
# ps -efL | grep qserv | wc -l
26
(Number of threads immediately decreases sharply)
# curl http://localhost:8080/some-servlet-url
... HTTP response ...
(Responds to connections again)
The server is an m2.2xlarge Amazon EC2 instance running Ubuntu 12.04.1 LTS and Jetty 8.1.7.v20120910.
$ java -version
java version "1.7.0_07"
OpenJDK Runtime Environment (IcedTea7 2.3.2) (7u7-2.3.2-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
We’ve tried several different java and Jetty versions. We’ve also tried using Tomcat instead of Jetty - same problem.
Profiling the app using YourKit shows no apparent thread locking or excessive CPU usage.
Any ideas?
Edit: We were able to get a stack trace of the hanging java process, and it looks like all the http threads are in this state:
"http-bio-8080-exec-5" daemon prio=10 tid=0x00007fe518007800 nid=0x1fc5 in Object.wait() [0x00007fe57934f000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000076ee9d230> (a org.apache.commons.pool.impl.GenericObjectPool)
at java.lang.Object.wait(Object.java:503)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:810)
- locked <0x000000076ee9d230> (a org.apache.commons.pool.impl.GenericObjectPool)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:95)
at net.acmecorp.active.QueryResultXMLFormatter.selectBestHitsAndRunDocumentCompletion(QueryResultXMLFormatter.java:362)
at net.acmecorp.active.QueryResultXMLFormatter.queryResultToXMLRootElement(QueryResultXMLFormatter.java:167)
at net.acmecorp.active.QueryPrepareAndExecuter.prepareParametersAndExecuteQuery_AndInvokeFormatter(QueryPrepareAndExecuter.java:239)
at net.acmecorp.servlets.MultiQueryServlet.handle(MultiQueryServlet.java:470)
at net.acmecorp.servlets.MultiQueryServlet.doGet(MultiQueryServlet.java:85)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1002)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
- locked <0x0000000700dd70d0> (a org.apache.tomcat.util.net.SocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Locked ownable synchronizers:
- <0x000000076ed40990> (a java.util.concurrent.ThreadPoolExecutor$Worker)
I’m not really a java guy myself, so I’m not able to see exactly what is wrong here, but it certainly looks like they are all waiting for something...
You have a database pooling issue: your HTTP threads are all waiting for database connections to free up. This is a common problem. You either need to increase the number of connections in your pool so that exhaustion is no longer a problem, or you need to use a QoS mechanism such as Jetty's QoSFilter to limit how many requests are processed concurrently and so protect a small database pool.
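As a starting point, here is a minimal sketch (not your actual setup, and the numbers are placeholders) of how the commons-pool GenericObjectPool visible in your stack trace can be sized and given a borrow timeout, so a thread fails fast with an exception instead of waiting forever inside borrowObject():

import javax.sql.DataSource;
import org.apache.commons.dbcp.ConnectionFactory;
import org.apache.commons.dbcp.DriverManagerConnectionFactory;
import org.apache.commons.dbcp.PoolableConnectionFactory;
import org.apache.commons.dbcp.PoolingDataSource;
import org.apache.commons.pool.impl.GenericObjectPool;

public class PoolConfig {
    public static DataSource createDataSource(String jdbcUrl, String user, String pass) {
        GenericObjectPool connectionPool = new GenericObjectPool();
        // Allow more concurrent connections than the commons-pool default of 8
        // (tune this to what your MySQL server can actually handle).
        connectionPool.setMaxActive(50);
        // Block when the pool is exhausted, but only for 10 seconds instead of
        // indefinitely, so a request errors out rather than hanging the thread.
        connectionPool.setWhenExhaustedAction(GenericObjectPool.WHEN_EXHAUSTED_BLOCK);
        connectionPool.setMaxWait(10000L);

        ConnectionFactory connectionFactory =
                new DriverManagerConnectionFactory(jdbcUrl, user, pass);
        // Wires the factory into the pool as a side effect of construction.
        new PoolableConnectionFactory(connectionFactory, connectionPool, null, null, false, true);
        return new PoolingDataSource(connectionPool);
    }
}

With a maxWait set, exhaustion shows up as exceptions in your logs instead of silently hung requests, which also makes the underlying cause much easier to spot.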
Now, the reason your database pool has been exhausted could be any of several things: you might be leaking connections in the application (getting them from the pool, using them, and never returning them), you might simply have too many requests coming in concurrently, or there could be a bug in the pooling library itself; it happens.
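To illustrate the leak case, here is a hedged sketch (the DataSource and query are hypothetical placeholders, not your code) contrasting the leaking pattern with try-with-resources, which your Java 7 runtime supports:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class ConnectionUsage {

    // Leaks a connection whenever an exception is thrown before close():
    // the pool eventually runs dry and every new getConnection() call blocks,
    // which is exactly the WAITING state shown in your thread dump.
    static int countRowsLeaky(DataSource ds) throws SQLException {
        Connection con = ds.getConnection();
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM hits"); // hypothetical query
        rs.next();
        int n = rs.getInt(1);
        rs.close();
        st.close();
        con.close(); // never reached if anything above throws
        return n;
    }

    // Returns the connection to the pool on every code path.
    static int countRows(DataSource ds) throws SQLException {
        try (Connection con = ds.getConnection();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM hits")) { // hypothetical query
            rs.next();
            return rs.getInt(1);
        }
    }
}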
Either way, this is not a Jetty or Tomcat issue; it is resource contention between your application and the database. Requests come into Jetty or Tomcat, enter servlet land, and then wait in a servlet (or something it calls) for a database connection, simple as that.
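For completeness, a sketch of one way the QoSFilter mentioned above could be registered programmatically under Servlet 3.0; a plain filter entry in web.xml works just as well, and the maxRequests value is illustrative, not a recommendation. The initializer has to be declared in META-INF/services/javax.servlet.ServletContainerInitializer to be picked up.

import java.util.Set;
import javax.servlet.FilterRegistration;
import javax.servlet.ServletContainerInitializer;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;

public class QoSInitializer implements ServletContainerInitializer {
    @Override
    public void onStartup(Set<Class<?>> classes, ServletContext ctx) throws ServletException {
        FilterRegistration.Dynamic qos =
                ctx.addFilter("qosFilter", org.eclipse.jetty.servlets.QoSFilter.class);
        // Cap concurrent in-flight requests at roughly the size of the DB pool,
        // so excess requests queue in the filter instead of on the pool monitor.
        qos.setInitParameter("maxRequests", "40");
        qos.addMappingForUrlPatterns(null, false, "/*");
    }
}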