Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Play framework/Netty does not release socket

Have already asked similar question, but deleted it, since thought I fixed it, but I was wrong.

I'm using Play framework in production for one of my web projects. From time to time Play does not render main page or does not return some of the static content files.

First screenshot displays firebug console, loading of the site is stucked at the beginning, when serving home page. enter image description here Second screenshot display fiddler console, when 2 static resources are not loading.

enter image description here

Initially application runs fine, it has to work for 5-7 days and I can see this issue. It is hard to reproduce, it happens 1 of 15 time, I have to delete cache data and reload page. (pressing CRTL-F5 in FF). Issue can be reproduced in most of the browsers from different machines and OS. Initially, I was thinking that there are some problems with hosting provider. But I have changed it and issue has not gone.

Version of the play is 1.2.5. Tried 1.2.2 as well. Play is running as standalone server on CentOS-5-32 bits.

I suspect, that there are some problems with Netty which are used by Play framework. Netty 3.5.7 final jar is used by Play.

cd /proc/28761/fd
ls -l | wc -l
337

For few days number of opened file descriptor grows from 140 to 350. Note, that the the average load to website in the beginning and later is the same.

I can see a lot of sockets opened by process, which are not released later.

lrwx------ 1 root root 64 Nov 11 10:34 300 -> socket:[1079566]
lrwx------ 1 root root 64 Nov 11 10:34 301 -> socket:[1079568]
lrwx------ 1 root root 64 Nov 11 10:34 302 -> socket:[1149958]
lrwx------ 1 root root 64 Nov 11 10:34 303 -> socket:[1160807]
lrwx------ 1 root root 64 Nov 11 10:34 304 -> socket:[1160605]
lrwx------ 1 root root 64 Nov 11 10:34 305 -> socket:[1157435]
lrwx------ 1 root root 64 Nov 11 10:34 306 -> socket:[1160607]
lrwx------ 1 root root 64 Nov 11 10:34 307 -> socket:[1160609]
lrwx------ 1 root root 64 Nov 11 10:34 308 -> socket:[1155542]
lrwx------ 1 root root 64 Nov 11 10:34 309 -> socket:[1120231]

Update

Number of opened TCP connection on the start of application (few hours of running) is 63.

Total: 150 (kernel 181)
TCP:   63 (estab 38, closed 5, orphaned 0, synrecv 0, timewait 3/0), ports 44

Transport Total     IP        IPv6
*         181       -         -
RAW       0         0         0
UDP       7         4         3
TCP       58        9         49
INET      65        13        52
FRAG      0         0         0

After 2 days of running, number of open TCP connection is 490.

[root@82711-2 fd]# ss -s
Total: 459 (kernel 490)
TCP:   378 (estab 271, closed 23, orphaned 0, synrecv 0, timewait 9/0), ports 37

Transport Total     IP        IPv6
*         490       -         -
RAW       0         0         0
UDP       7         4         3
TCP       355       12        343
INET      362       16        346
FRAG      0         0         0

All of this opened TCP connection are http connection (not database or any others). Average load on the website is the same all time, but number of opened files descriptors and opened sockets is growing all the time until too many open files exception

Initially application is started with 9-15 New I/O Threads (Netty workers). All of Netty threads are in Running state most of the time. And ~16 play threads which are in Wait state.

After few days of running number of Netty workers became 27. I'm not Netty expert, not sure if it is normal behaviour.

Few threads went deadlock: 1 Play thread and 1 Netty thread. Also there is another Play thread which is locked by first Play thread. So 3 locked threads in total. I'm sure, that this deadlocks are not the root cause of the issue, but the root cause can be the same

Name: New I/O  worker #21
State: BLOCKED on org.jboss.netty.handler.stream.ChunkedWriteHandler@15e057 owned by: play-thread-2
Total blocked: 44  Total waited: 9

Stack trace: 
org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:188)
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:140)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:212)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:93)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:636)
org.jboss.netty.handler.codec.replay.ReplayingDecoder.channelClosed(ReplayingDecoder.java:533)
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:93)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:476)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:631)
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:109)
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:66)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:780)
org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:55)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:785)
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleDownstream(ChunkedWriteHandler.java:111)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
org.jboss.netty.channel.Channels.close(Channels.java:821)
org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:194)
org.jboss.netty.channel.ChannelFutureListener$1.operationComplete(ChannelFutureListener.java:41)
org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:399)
org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:385)
org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:334)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:493)
   - locked java.lang.Object@3b7e28
org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:431)
org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:364)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.processWriteTaskQueue(AbstractNioWorker.java:349)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:245)
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

Second thread:

Name: play-thread-2
State: BLOCKED on java.lang.Object@3b7e28 owned by: New I/O  worker #21
Total blocked: 23  Total waited: 34 778

Stack trace: 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:654)
org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:408)
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:127)
org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:66)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:780)
org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:63)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:785)
org.jboss.netty.channel.Channels.write(Channels.java:733)
org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:262)
   - locked org.jboss.netty.handler.stream.ChunkedWriteHandler@15e057
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleDownstream(ChunkedWriteHandler.java:121)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
org.jboss.netty.channel.Channels.write(Channels.java:712)
org.jboss.netty.channel.Channels.write(Channels.java:679)
org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:245)
play.server.PlayHandler.serveStatic(PlayHandler.java:886)
play.server.PlayHandler$NettyInvocation.init(PlayHandler.java:182)
play.Invoker$Invocation.run(Invoker.java:276)
play.server.PlayHandler$NettyInvocation.run(PlayHandler.java:229)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

Update

I deployed same Play application to same environment to Tomcat 7. 24 hours are passed and the problem has gone, number of opened TCP connections remains constant. Number of opened files descriptors is not exceeding ~70. This is same production hosts, same database and same users of the application.

like image 326
Anton Avatar asked Nov 11 '12 12:11

Anton


2 Answers

I actually encountered a similar bug not in play, but in the JVM (which play runs on) Whereby closed channels pointing to filehandles are not released until forced to by closing the JVM. Alas, I cannot remember how I found the bug report, or I'd link to it, but it is a known bug IN THE JVM. I ended up having to work around it. The best thing I can suggest is to rewrite your code to use the same channels/file handles as much as possible.

like image 78
dainichi Avatar answered Nov 15 '22 07:11

dainichi


There were several issues with deadlocks in the ChunkedWriteHandler. All of them seem to be resolved in your Netty version. Anyway, that piece of code seems to be attracting that kind of problems. I suggest you file an issue for the Netty guys.

https://issues.jboss.org/browse/NETTY-384

Also see "similar issues" to get an idea of how many issues there were regarding that class.

like image 38
sorencito Avatar answered Nov 15 '22 08:11

sorencito