We recently upgraded our Jetty servers from version 6.1.25 to 9.0.4. They are deployed on Java 1.7.0_11 64-bit on a Windows 2008 server.
Other than the required configuration changes for Jetty (start.ini - very nice), we kept all the JVM flags the same as they were previously. Six days after deploying to the production environment, the server became unresponsive to HTTP requests. Internal 'heartbeat' processing continued to run as normal during this time, but the server was not servicing external requests. The service was restarted, and six days later it again became unresponsive.
During my initial review, I thought I was onto something with https://bugs.eclipse.org/bugs/show_bug.cgi?id=357318. However, the fix for that JVM issue was backported from Java 1.8.0_XX to Java 1.7.0_06, so it should already be in place on our version. This led me to review the thread processing instead.
I also thought it might be related to cases 400617/410550 on the Eclipse site, although the symptoms don't quite match the write-up, and those cases had apparently been resolved in Jetty 9.0.3.
Monitoring the application via JMX shows that the count of 'qtp' threads continues to grow over time, and I have been unable to find a resolution. The thread pool configuration is currently:
threads.min=10
threads.max=200
threads.timeout=60000
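The property names above are from our own start.ini; in terms of the standard QueuedThreadPool API, the same settings look roughly like the following embedded-style sketch (not our actual startup code):

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

public class ThreadPoolConfigSketch {
    public static Server buildServer() {
        QueuedThreadPool pool = new QueuedThreadPool();
        pool.setMinThreads(10);     // threads.min
        pool.setMaxThreads(200);    // threads.max
        pool.setIdleTimeout(60000); // threads.timeout, in milliseconds
        return new Server(pool);
    }
}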
The qtp threads are typically in the WAITING state with the following stack trace:
Name: qtp1805176801-285
State: WAITING on java.util.concurrent.Semaphore$NonfairSync@4bf4a3b0
Total blocked: 0 Total waited: 110
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source)
java.util.concurrent.Semaphore.acquire(Unknown Source)
org.eclipse.jetty.util.BlockingCallback.block(BlockingCallback.java:96)
org.eclipse.jetty.server.HttpConnection$Input.blockForContent(HttpConnection.java:457)
org.eclipse.jetty.server.HttpInput.consumeAll(HttpInput.java:282)
- locked org.eclipse.jetty.util.ArrayQueue@3273ba91
org.eclipse.jetty.server.HttpConnection.completed(HttpConnection.java:360)
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:340)
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:224)
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
java.lang.Thread.run(Unknown Source)
On closer inspection, this differs from the newest threads, which have the following state:
Name: qtp1805176801-734
State: TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@77b83b6e
Total blocked: 5 Total waited: 478
Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(Unknown Source)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:390)
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:509)
org.eclipse.jetty.util.thread.QueuedThreadPool.access$700(QueuedThreadPool.java:48)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:563)
java.lang.Thread.run(Unknown Source)
Based on the naming convention, some of the qtp threads are very old (qtp1805176801-206) while some are very new (qtp1805176801-6973). I find it interesting that the older threads aren't timing out based on the 60-second idle timeout. The application services customers during US business hours and is largely idle in the early morning hours, at which time I'd expect almost all of the pool to be cleaned up.
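For what it's worth, the check I've been using to quantify the growth is a simple count of pool threads by state, run inside the server JVM (e.g. from a diagnostic servlet); the 'qtp' prefix is just the pool's default thread naming:

import java.util.EnumMap;
import java.util.Map;

public class QtpStateCount {
    public static void main(String[] args) {
        // Count pool threads by state; a WAITING bucket that keeps growing and never
        // shrinks overnight is the symptom described above.
        Map<Thread.State, Integer> counts = new EnumMap<Thread.State, Integer>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith("qtp")) {
                Integer current = counts.get(t.getState());
                counts.put(t.getState(), current == null ? 1 : current + 1);
            }
        }
        System.out.println(counts);
    }
}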
I'm hoping someone can point me in the right direction for tracking this issue down. My experience with Jetty leads me to believe their code is very solid and that most issues are either programmatic in our implementation (been there) or JVM-related (done that). I'm also open to suggestions if you think I'm chasing a red herring with the threads.
NEW INFORMATION: Tracing the exceptions a little further, the problem appears to occur when GWT RPC calls time out while waiting for a response. The following stack trace shows an exception from the log file that is related to a thread left in the invalid state. I'm using this to look for other reports of Jetty/GWT interaction issues.
2013-09-03 08:41:49.249:WARN:/webapp:qtp488328684-414: Exception while dispatching incoming RPC call
java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 30015/30000 ms
at org.eclipse.jetty.util.BlockingCallback.block(BlockingCallback.java:103)
at org.eclipse.jetty.server.HttpConnection$Input.blockForContent(HttpConnection.java:457)
at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:130)
at java.io.InputStream.read(Unknown Source)
at com.google.gwt.user.server.rpc.RPCServletUtils.readContent(RPCServletUtils.java:175)
at com.google.gwt.user.server.rpc.RPCServletUtils.readContentAsGwtRpc(RPCServletUtils.java:205)
at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.readContent(AbstractRemoteServiceServlet.java:182)
at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(RemoteServiceServlet.java:239)
at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(AbstractRemoteServiceServlet.java:62)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1506)
at c.t.b.servlet.PipelineFilter.doFilter(PipelineFilter.java:56)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1494)
at c.v.servlet.SetRequestEncoding.doFilter(SetRequestEncoding.java:27)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1494)
at c.t.b.servlet.OutOfMemoryFilter.doFilter(OutOfMemoryFilter.java:39)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1094)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1028)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:445)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:267)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:224)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
at java.lang.Thread.run(Unknown Source)
Caused by:
java.util.concurrent.TimeoutException: Idle timeout expired: 30015/30000 ms
at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:153)
at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
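For context, the 30000 ms in that TimeoutException appears to be the connector idle timeout (the exception is raised from IdleTimeout.checkIdleTimeout). In embedded terms it corresponds to something like the following sketch; this is not our actual configuration, which is done through start.ini/jetty.xml:

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class ConnectorIdleTimeoutSketch {
    public static void main(String[] args) throws Exception {
        Server server = new Server();
        ServerConnector connector = new ServerConnector(server);
        connector.setPort(8080);
        // Connections idle for longer than this are failed by IdleTimeout.checkIdleTimeout,
        // which is where the "Idle timeout expired: 30015/30000 ms" exception above originates.
        connector.setIdleTimeout(30000);
        server.addConnector(connector);
        server.start();
        server.join();
    }
}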
I ended up posting the question on the Eclipse/Jetty bug tracker. The following link can be used to track any permanent fix:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=416477
The issue has to do with the Semaphore lock held by a QTP thread whose request has timed out during a GWT RPC call. The original request is timed, with a 30-second timeout, and it times out while waiting in Semaphore.acquire. As part of the clean-up of the request, HttpConnection attempts to consumeAll the remaining request content, which again calls Semaphore.acquire. This second acquire is not timed, so the lock remains in place until the thread is interrupted.
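Boiled down, the failure pattern looks roughly like this simplified illustration (it stands in for what BlockingCallback.block does; it is not Jetty's actual code, and the timeouts are shortened so the sketch finishes quickly):

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class UntimedAcquireSketch {
    public static void main(String[] args) throws Exception {
        // Stands in for the semaphore inside BlockingCallback; no permit is ever
        // released because the request content being waited for never arrives.
        final Semaphore content = new Semaphore(0);

        Thread qtp = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    // First wait: the timed request path (30 s idle timeout in the real server).
                    if (!content.tryAcquire(2, TimeUnit.SECONDS)) {
                        System.out.println("request timed out, connection clean-up begins");
                    }
                    // Clean-up path (consumeAll): an untimed acquire. With no permit and no
                    // timeout, the thread parks here indefinitely unless it is interrupted.
                    content.acquire();
                } catch (InterruptedException e) {
                    System.out.println("interrupted - thread released");
                }
            }
        }, "qtp-sketch");

        qtp.start();
        qtp.join(5000);   // still parked after the timed wait has expired
        qtp.interrupt();  // the workaround below: interrupt releases the stuck thread
        qtp.join();
    }
}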
The issue does appear to be very specific to our platform, as the Jetty team has not been able to reproduce it and I have not been able to find any other reports of it. Furthermore, it only occurs in one of our production environments. My guess is that there is some interaction between the GWT RPC code, Jetty, and the operating system. We have minor upgrades planned for the JDK, Jetty, and the GWT SDK.
Workaround: The initial workaround was to manually interrupt the locked threads a couple of times a day via the JMX console. Our longer-term solution was a clean-up mechanism that looks for these locked threads and calls interrupt() on them, along the lines of the sketch below.
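A rough sketch of that clean-up task (the class name and the WAITING/frame checks are ours, not a Jetty facility; a real version should also require the thread to have been waiting for some minimum time before interrupting it):

import java.util.Map;

public class StuckQtpThreadReaper implements Runnable {
    @Override
    public void run() {
        // Snapshot of all live threads and their current stack traces.
        Map<Thread, StackTraceElement[]> live = Thread.getAllStackTraces();

        for (Map.Entry<Thread, StackTraceElement[]> entry : live.entrySet()) {
            Thread t = entry.getKey();
            if (!t.getName().startsWith("qtp") || t.getState() != Thread.State.WAITING) {
                continue;
            }
            for (StackTraceElement frame : entry.getValue()) {
                // A WAITING qtp thread parked in BlockingCallback.block matches the stack
                // trace of the stuck threads above; interrupting it releases the Semaphore wait.
                if ("org.eclipse.jetty.util.BlockingCallback".equals(frame.getClassName())
                        && "block".equals(frame.getMethodName())) {
                    t.interrupt();
                    break;
                }
            }
        }
    }
}

We run something along these lines from a scheduled executor every few minutes.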