 

Server sends RST to client when TCP connections exceed ~65000

I am working on a high-load TCP application built with Java Netty, which is expected to handle 300k concurrent TCP connections.
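
For context, the server is essentially a plain Netty NIO server. A minimal sketch of the kind of bootstrap involved looks like the following (written against the final Netty 4.0 API, which differs slightly from the 4.0.0.Beta3 build in the stack trace below; the port and the empty handler are placeholders, not the real application code):

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class HighLoadServer {
    public static void main(String[] args) throws Exception {
        EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts connections
        EventLoopGroup workers = new NioEventLoopGroup(); // handles socket I/O
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(boss, workers)
             .channel(NioServerSocketChannel.class)
             .option(ChannelOption.SO_BACKLOG, 4096) // accept-queue hint, capped by somaxconn
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     // application handlers would be added here
                 }
             });
            // port 80 ("http"), as in the capture below
            b.bind(80).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}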

It works perfectly on the test server, reaching 300k connections, but when deployed to the production server it can only support 65387 connections. After reaching that number, the client starts throwing "java.io.IOException: Connection reset by peer" exceptions. I have tried many times, and every time, once the count reaches 65387, the client can no longer create connections.

The network capture is below; 10.95.196.27 is the server and 10.95.196.29 is the client:

16822   12:26:12.480238 10.95.196.29    10.95.196.27    TCP 74  can-ferret > http [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=872641174 TSecr=0 WS=128
16823   12:26:12.480267 10.95.196.27    10.95.196.29    TCP 66  http > can-ferret [SYN, ACK] Seq=0 Ack=1 Win=2920 Len=0 MSS=1460 SACK_PERM=1 WS=1024
16824   12:26:12.480414 10.95.196.29    10.95.196.27    TCP 60  can-ferret > http [ACK] Seq=1 Ack=1 Win=14720 Len=0
16825   12:26:12.480612 10.95.196.27    10.95.196.29    TCP 54  http > can-ferret [FIN, ACK] Seq=1 Ack=1 Win=3072 Len=0
16826   12:26:12.480675 10.95.196.29    10.95.196.27    HTTP    94  Continuation or non-HTTP traffic
16827   12:26:12.480697 10.95.196.27    10.95.196.29    TCP 54  http > can-ferret [RST] Seq=1 Win=0 Len=0

The exception is caused by the server sending an RST packet to the client right after the client completes the three-way handshake, which breaks the new connection.
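
To pin down the exact count at which the RST starts, a blunt client-side loop like this sketch can help (the host, port, and one-byte probe write are placeholders; the client machine also needs a raised fd limit, and one client IP runs out of ephemeral ports at roughly 64k connections to a single server endpoint):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class ConnectionFlood {
    public static void main(String[] args) {
        List<Socket> open = new ArrayList<Socket>(); // keep references so sockets stay open
        try {
            while (true) {
                Socket s = new Socket();
                s.connect(new InetSocketAddress("10.95.196.27", 80), 5000);
                // connect() only proves the handshake finished; a server RST
                // usually surfaces on the first write or read afterwards
                s.getOutputStream().write('x');
                open.add(s);
            }
        } catch (IOException e) {
            System.out.println("Failed after " + open.size() + " connections: " + e);
        }
    }
}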

The client-side exception stack is below:

16:42:05.826 [nioEventLoopGroup-1-15] WARN  i.n.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the end of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_25]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.7.0_25]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225) ~[na:1.7.0_25]
    at sun.nio.ch.IOUtil.read(IOUtil.java:193) ~[na:1.7.0_25]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375) ~[na:1.7.0_25]
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:259) ~[netty-all-4.0.0.Beta3.jar:na]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:885) ~[netty-all-4.0.0.Beta3.jar:na]
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:226) ~[netty-all-4.0.0.Beta3.jar:na]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:72) ~[netty-all-4.0.0.Beta3.jar:na]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:460) ~[netty-all-4.0.0.Beta3.jar:na]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:424) ~[netty-all-4.0.0.Beta3.jar:na]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:360) ~[netty-all-4.0.0.Beta3.jar:na]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:103) ~[netty-all-4.0.0.Beta3.jar:na]
    at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]

The server side shows no exceptions.

I have tried tuning some sysctl items, as below, to support huge numbers of connections, but it was useless:

net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 4096 33554432
net.ipv4.tcp_wmem = 4096 4096 33554432
net.ipv4.tcp_mem = 786432 1048576 26777216
net.ipv4.tcp_max_tw_buckets = 360000
net.core.netdev_max_backlog = 4096
vm.min_free_kbytes = 65536
vm.swappiness = 0
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_max_syn_backlog = 4096
net.netfilter.nf_conntrack_max = 3000000
net.nf_conntrack_max = 3000000
net.core.somaxconn = 327680
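
Since two of the knobs above are conntrack limits, one cheap way to see whether the connection-tracking table (rather than the application) is the bottleneck is to watch the kernel counters while the test ramps up. A small sketch, assuming the nf_conntrack module is loaded so these /proc entries exist:

import java.io.BufferedReader;
import java.io.FileReader;

public class ConntrackWatch {
    public static void main(String[] args) throws Exception {
        while (true) {
            // current tracked connections vs. the configured maximum
            System.out.println("conntrack: " + read("/proc/sys/net/netfilter/nf_conntrack_count")
                    + " / " + read("/proc/sys/net/netfilter/nf_conntrack_max"));
            Thread.sleep(5000); // poll every five seconds
        }
    }

    private static String read(String path) throws Exception {
        BufferedReader r = new BufferedReader(new FileReader(path));
        try {
            return r.readLine();
        } finally {
            r.close();
        }
    }
}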

The max open file descriptor limit is already set to 999999:

linux-152k:~ # ulimit -n
999999
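
One thing worth double-checking is that the JVM process actually inherits this limit; ulimit -n in an interactive shell can differ from what a daemon started by an init script gets. A quick check from inside the JVM, using the Sun/Oracle-specific UnixOperatingSystemMXBean (the cast assumes a HotSpot JVM on Linux, as in the stack trace above):

import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdLimits {
    public static void main(String[] args) {
        // On HotSpot/Linux the platform MXBean implements the Unix variant
        UnixOperatingSystemMXBean os = (UnixOperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        System.out.println("max open fds: " + os.getMaxFileDescriptorCount());
        System.out.println("open fds now: " + os.getOpenFileDescriptorCount());
    }
}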

The OS release is SUSE Linux Enterprise Server 11 SP2 with 3.0.13 kernel:

linux-152k:~ # cat /etc/SuSE-release 
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2
linux-152k:~ # uname -a
Linux linux-152k 3.0.13-0.27-default #1 SMP Wed Feb 15 13:33:49 UTC 2012 (d73692b) x86_64 x86_64 x86_64 GNU/Linux

dmesg shows no error information, CPU and memory stay at low levels, and everything looks good; the server just resets connections from the client.

We have a test server running SUSE Linux Enterprise Server 11 SP1 with a 2.6.32 kernel; it works well and supports up to 300k connections.

I think some kernel or security limit may be causing this, but I can't find it. Any suggestions, or any way to get debug information on why the server sends the RST? Thanks.

asked Oct 25 '13 by Santal Li

2 Answers

Santal, I've just come across the following link, and it seems it can answer your question: What is the theoretical maximum number of open TCP connections that a modern Linux box can have. (In short: the familiar ~64k limit is per client address, since a connection is identified by its source/destination address and port 4-tuple; it is not a server-wide cap.)

answered Nov 09 '22 by Anton


Finally got the root cause. Simply put, it was a JDK bug; please refer to http://mail.openjdk.java.net/pipermail/nio-dev/2013-September/002284.html, which caused an NPE when fd > 64 * 1024. (That matches the numbers above: 64 * 1024 = 65536, and failures started at 65387 connections, with the remaining descriptors presumably already used by the process itself.)

After upgrading to JDK7_45, everything works great now.

answered Nov 09 '22 by Santal Li