I have recently discovered that problems with intermittent failures for users running my application using Internet Explorer is due to a bug in Internet Explorer. The bug is in the HTTP stack, and should be affecting all applications using POST requests from IE. The result is a failure characterized by a request that seems to hang for about 5 minutes (depending on server type and configuration), then fail from the server end. The browser application will error out of the post request after the server gives up. I will explain the IE bug in detail below.
As far as I can tell this will happen with any application using XMLHttpRequest to send POST requests to the server, if the request is sent at the wrong moment. I have written a sample program that attempts to send the POSTS at just those times. It attempts to send continuous POSTs to the server at the precise moment the server closes the connections. The interval is derived from the Keep-Alive header sent by the server.
I am finding that when running from IE to a server with a bit of latency (i.e. not on the same LAN), the problem occurs after only a few POSTs. When it happens, IE locks up so hard that it has to be force closed. The ticking clock is an indication that the browser is still responding.
You can try it by browsing to: http://pubdev.hitech.com/test.post.php. Please take care that you don't have any important unsaved information in any IE session when you run it, because I am finding that it will crash IE.
The full source can be retrieved at: http://pubdev.hitech.com/test.post.php.txt. You can run it on any server that has php and is configured for persistent connections.
My questions are:
What are other people's experiences with this issue?
Is there a known strategy for working around this problem (other than "use another browser")?
Does Microsoft have better information about this issue than the article I found (see below)?
The problem is that web browsers and servers by default use persistent connections as described in RFC 2616 section 8.1 (see http://www.ietf.org/rfc/rfc2616.txt). This is very important for performance--especially for AJAX applications--and should not be disabled. There is however a small timing hole where the browser may start to send a POST on a previously used connection at the same time the server decides the connection is idle and decides to close it. The result is that the browser's HTTP stack will get a socket error because it is using a closed socket. RFC 2616 section 8.1.4 anticipates this situation, and states, "...clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction..."
Internet Explorer does resend the POST when this happens, but when it does it mangles the request. It sends the POST headers, including the Content-Length of the data as posted, but it does not send the data. This is an improper request, and the server will wait an unspecified amount of time for the promised data before failing the request with an error. I have been able to demonstrate this failure 100% of the time using a C program that simulates an HTTP server, which closes the socket of an incoming POST request without sending a response.
Microsoft seems to acknowledge this failure in http://support.microsoft.com/kb/895954. They say that it affects IE versions 6 through 9. That provide a hotfix for this problem, that has shipped with all versions of IE since IE 7. The hotfix does not seem satisfactory for the following reasons:
It is not enabled unless you use regedit to add a key called FEATURE_SKIP_POST_RETRY_ON_INTERNETWRITEFILE_KB895954 to the registry. This is not something I would expect my users to have to do.
The hotfix does not actually fix the broken POST. Instead, if the socket gets closed as anticipated by the RFC, it simply errors out immediately without trying to resent the POST. The application still fails--it just fails sooner.
The following example is a self contained php program that demonstrates the bug. It attempts to send continuous POSTs to the server at the precise moment the server closes the connections. The interval is derived from the Keep-Alive header sent by the server.
We've encountered this problem with IE on a regular basis. There is no good solution. The only solution that is guaranteed to solve the problem is to ensure that the web server keepalive timeout is higher than the browser keepalive timeout (by default with IE this is 60s). Any situation where the web server is set to a lower value can result in IE attempting to reuse the connection and sending a request that gets rejected with a TCP RST because the socket has been closed. If the web server keepalive timeout value is higher than IE's keepalive timeout then IE's reuse of the connections ensure that the socket won't be closed. With high latency connections you'll have to consider the latency time as the time spent in-transit could be an issue.
Keep in mind however, that increasing the keepalive on the server means that an idle connection is using server sockets for that much longer. So you may need to size the server to handle a large number of inactive idle connections. This can be a problem as it may result in a burst of load to the server that the server isn't able to handle.
Another thing to keep in mind. You note that the RFC section 8.1.4 states :"...clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction..."
You forgot a very important part. Here's the full text: Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2). Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s). Confirmation by user-agent software with semantic understanding of the application MAY substitute for user confirmation. The automatic retry SHOULD NOT be repeated if the second sequence of requests fails
An HTTP POST is non-idempotent as defined by 9.1.2. Thus the behavior of the registry hack is actually technically correct per the RFC.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With