 

What can cause TCP/IP to drop packets without dropping the connection?

I have a web-based application and a client, both written in Java. For what it's worth, the client and server are both on Windows. The client issues HTTP GETs via Apache HttpClient. The server blocks for up to a minute and if no messages have arrived for the client within that minute, the server returns HTTP 204 No Content. Otherwise, as soon as a message is ready for the client, it is returned with the body of an HTTP 200 OK.

Here is what has me puzzled: Intermittently for a specific subset of clients -- always clients with demonstrably flaky network connections -- the client issues a GET, the server receives and processes the GET, but the client sits forever. Enabling debugging logs for the client, I see that HttpClient is still waiting for the very first line of the response.

There is no Exception thrown on the server, at least nothing logged anywhere, not by Tomcat, not by my webapp. According to debugging logs, there is every sign that the server successfully responded to the client. However, the client shows no sign of having received anything. The client hangs indefinitely in HttpClient.executeMethod. This becomes obvious after the session times out and the client takes action that causes another Thread to issue an HTTP POST. Of course, the POST fails because the session has expired. In some cases, hours have elapsed between the session expiring and the client issuing a POST and discovering this fact. For this entire time, executeMethod is still waiting for the HTTP response line.

When I use Wireshark to see what is really going on at the wire level, this failure does not occur. That is, this failure will occur within a few hours for specific clients, but when Wireshark is running at both ends, these same clients will run overnight, 14 hours, without a failure.

Has anyone else encountered something like this? What in the world can cause it? I thought that TCP/IP guaranteed packet delivery even across short term network glitches. If I set an SO_TIMEOUT and immediately retry the request upon timeout, the retry always succeeds. (Of course, I first abort the timed-out request and release the connection to ensure that a new socket will be used.)

Thoughts? Ideas? Is there some TCP/IP setting available to Java or a registry setting in Windows that will enable more aggressive TCP/IP retries on lost packets?

Eddie asked Apr 24 '09 20:04



1 Answer

If you are using long-running GETs, you should time out on the client side at twice the server timeout, as you have discovered.

Consider a TCP connection where the client sends a message and expects a response. If the server were to crash and restart (for the sake of example), the client would still be waiting on the socket for a response from the server, yet the server is no longer listening on that socket.

The client will only discover that the socket is closed on the server end once it sends more data on that socket and the server rejects this new data (typically with a TCP reset), closing the connection.
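To make the point above concrete, here is a minimal sketch of a blocked read against a peer that never sends anything. The class name `HalfOpenDemo` and the method `readTimesOut` are made up for illustration, and a local `ServerSocket` that never writes stands in for the vanished server. Without `setSoTimeout`, the `read()` call would simply keep waiting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of the original application.
final class HalfOpenDemo {

    /**
     * Connects to a local peer that never writes a byte and attempts a read.
     * Returns true if the read was cut short by SO_TIMEOUT,
     * false if the peer sent data or closed the connection.
     */
    static boolean readTimesOut(int soTimeoutMs) throws IOException {
        // The ServerSocket completes the TCP handshake at the OS level
        // but nothing ever answers, much like a server that has gone away.
        try (ServerSocket silentPeer = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", silentPeer.getLocalPort())) {

            // Bound every blocking read on this socket.
            client.setSoTimeout(soTimeoutMs);
            // Ask the OS to probe an idle connection; the probe interval
            // is an operating-system setting, not something set here.
            client.setKeepAlive(true);

            InputStream in = client.getInputStream();
            try {
                in.read(); // with no SO_TIMEOUT this would block indefinitely
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

The reader gets no signal from TCP on its own here: nothing is being sent, so nothing can fail. Only the timeout returns control to the caller.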

This is why you should have client-side timeouts on requests.
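As a rough sketch of that advice, the following long-poll helper sets a read timeout and retries on a fresh connection when it fires. It uses the JDK's `java.net.HttpURLConnection` rather than the Apache HttpClient from the question, purely to stay self-contained. The class name `LongPollClient`, the method `poll`, and the 60-second server hold time are assumptions taken from or invented around the question, not a definitive implementation.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical helper class, not part of the original application.
final class LongPollClient {

    // Assumption: the server holds each GET for up to one minute.
    static final int SERVER_HOLD_MS = 60_000;
    // Client waits twice as long before giving up on a response.
    static final int READ_TIMEOUT_MS = 2 * SERVER_HOLD_MS;

    /**
     * Issues a GET and waits up to readTimeoutMs for the response.
     * On a read timeout the connection is dropped and the request is
     * retried on a new one. Returns the HTTP status code, or -1 if
     * every attempt timed out.
     */
    static int poll(URL url, int readTimeoutMs, int maxAttempts) throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(readTimeoutMs);
            try {
                // Blocks until the status line arrives or the timeout fires.
                // A real client would go on to read the body of a 200 here.
                return conn.getResponseCode();
            } catch (SocketTimeoutException e) {
                // No response in time: fall through and retry.
            } finally {
                // Release this connection so the retry does not reuse it.
                conn.disconnect();
            }
        }
        return -1;
    }
}
```

A caller would loop on something like `LongPollClient.poll(url, LongPollClient.READ_TIMEOUT_MS, 3)`, treating 204 as "no message yet" and -1 as a sign that the link is currently unusable.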

But your server is not crashing. If the server is multi-threaded and the socket for that client is closed while the client is having a connectivity outage (lasting minutes), then the closing handshake may be lost. As you are not sending any more data from the client to the server, your client is once again left hanging. This would tie in with your flaky-connection observation.

Simeon Pilgrim answered Nov 09 '22 06:11
