I have a really weird problem that is driving me crazy.
I have a Ruby server and a Flash client (Action Script 3). It's a multiplayer game.
The problem is that everything is working perfect and then, suddenly, a random player stops receiving data. When the server closes the connection because of inactivity, about 20-60 seconds later, the client receives all the buffered data.
The client uses XMLsocket
for retrieving data, so the way the client receives data is not the problem.
socket.addEventListener(Event.CONNECT, connectHandler);
function connectHandler(event)
{
sendData(sess);
}
function sendData(dat)
{
trace("SEND: " + dat);
addDebugData("SEND: " + dat)
if (socket.connected) {
socket.send(dat);
} else {
addDebugData("SOCKET NOT CONNECTED")
}
}
socket.addEventListener(DataEvent.DATA, dataHandler);
function dataHandler(e:DataEvent) {
var data:String = e.data;
workData(data);
}
The server flushes data after every write, so is not a flushing problem:
sock.write(data + DATAEOF)
sock.flush()
DATAEOF
is null char, so the client parses the string.
When the server accepts a new socket, it sets sync
to true, to autoflush, and TCP_NODELAY
to true too:
newsock = serverSocket.accept
newsock.sync = true
newsock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, true)
This is my research:
Info: I was dumping netstat data to a file each second.
ESTABLISHED
.FIN_WAIT1
, as expected. Then, tcpflow shows that all buffered data is sent to the client, but the client don't receives data. some seconds after that, connection dissapears from netstat and tcpflow shows that the same data is sent again, but this time the client receives the data so starts sending data to the server and the server receives it. But it's too late... server has closed connection.I don't think it's an OS/network problem, because I've changed from a VPS located in Spain to Amazon EC2 located in Ireland and the problem still remains.
I don't think it's a client network problem too, because this occurs dozens of times per day, and the average quantity of online users is about 45-55, with about 400 unique users a day, so the ratio is extremely high.
EDIT: I've done more research. I've changed the server to C++.
When a client stops sending data, after a while the server receives a "Connection reset by peer" error. In that moment, tcpdump shows me that the client sent a RST packet, this could be because the client closed the connection and the server tried to read, but... why the client closed the connection? I think the answer is that the client is not the one closing the connection, is the kernel. Here is some info: http://scie.nti.st/2008/3/14/amazon-s3-and-connection-reset-by-peer
Basically, as I understand it, Linux kernels 2.6.17+ increased the maximum size of the TCP window/buffer, and this started to cause other gear to wig out, if it couldn’t handle sufficiently large TCP windows. The gear would reset the connection, and we see this as a “Connection reset by peer” message.
I've followed the steps and now it seems that the server is closing connections only when the client losses its connection to internet.
I'm going to add this as an answer so people know a bit mroe about this.
Description. Once a TCP connection has been established, that connection is defined to be valid until one side closes it. Once the connection has entered the connected state, it will remain connected indefinitely.
When a process exits - for whatever reason - the OS will close the TCP connections it had open. someone yanks out a network cable inbetween. the computer at the other end gets nuked.
A TCP socket that is connected should not be closed until the connection has been shut down.
As the receive buffer becomes full, new data cannot be accepted from the network for this socket and must be dropped, which indicates a congestion event to the transmitting node.
I think the answer is that the kernel is the one closing the connection. Here is some info: http://scie.nti.st/2008/3/14/amazon-s3-and-connection-reset-by-peer
Basically, as I understand it, Linux kernels 2.6.17+ increased the maximum size of the TCP window/buffer, and this started to cause other gear to wig out, if it couldn’t handle sufficiently large TCP windows. The gear would reset the connection, and we see this as a “Connection reset by peer” message.
I've followed the steps and now it seems that the server is closing connections only when the client losses its connection to internet.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With