Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

.NET WebSockets forcibly closed despite keep-alive and activity on the connection

Tags:

We have written a simple WebSocket client using System.Net.WebSockets. The KeepAliveInterval on the ClientWebSocket is set to 30 seconds.

The connection is opened successfully and traffic flows as expected in both directions, or if the connection is idle, the client sends Pong requests every 30 seconds to the server (visible in Wireshark).

But after 100 seconds the connection is abruptly terminated due to the TCP socket being closed at the client end (watching in Wireshark we see the client send a FIN). The server responds with a 1001 Going Away before closing the socket.

After a lot of digging we have tracked down the cause and found a rather heavy-handed workaround. Despite a lot of Google and Stack Overflow searching we have only seen a couple of other examples of people posting about the problem and nobody with an answer, so I'm posting this to save others the pain and in the hope that someone may be able to suggest a better workaround.

The source of the 100 second timeout is that the WebSocket uses a System.Net.ServicePoint, which has a MaxIdleTime property to allow idle sockets to be closed. On opening the WebSocket if there is an existing ServicePoint for the Uri it will use that, with whatever the MaxIdleTime property was set to on creation. If not, a new ServicePoint instance will be created, with MaxIdleTime set from the current value of the System.Net.ServicePointManager MaxServicePointIdleTime property (which defaults to 100,000 milliseconds).

The issue is that neither WebSocket traffic nor WebSocket keep-alives (Ping/Pong) appear to register as traffic as far as the ServicePoint idle timer is concerned. So exactly 100 seconds after opening the WebSocket it just gets torn down, despite traffic or keep-alives.

Our hunch is that this may be because the WebSocket starts life as an HTTP request which is then upgraded to a websocket. It appears that the idle timer is only looking for HTTP traffic. If that is indeed what is happening that seems like a major bug in the System.Net.WebSockets implementation.

The workaround we are using is to set the MaxIdleTime on the ServicePoint to int.MaxValue. This allows the WebSocket to stay open indefinitely. But the downside is that this value applies to any other connections for that ServicePoint. In our context (which is a Load test using Visual Studio Web and Load testing) we have other (HTTP) connections open for the same ServicePoint, and in fact there is already an active ServicePoint instance by the time that we open our WebSocket. This means that after we update the MaxIdleTime, all HTTP connections for the Load test will have no idle timeout. This doesn't feel quite comfortable, although in practice the web server should be closing idle connections anyway.

We also briefly explore whether we could create a new ServicePoint instance reserved just for our WebSocket connection, but couldn't see a clean way of doing that.

One other little twist which made this harder to track down is that although the System.Net.ServicePointManager MaxServicePointIdleTime property defaults to 100 seconds, Visual Studio is overriding this value and setting it to 120 seconds - which made it harder to search for.

like image 520
Richard Leeke Avatar asked Nov 09 '16 08:11

Richard Leeke


People also ask

What causes WebSocket to close?

The main reason is that the connection was closed abnormally (locally) by the browser implementation. It means that your browser cannot reach your remote code server. The main issue is about your https connection to the server.

Does WebSocket use keep alive?

Keepalive in websockets To avoid these problems, websockets runs a keepalive and heartbeat mechanism based on WebSocket Ping and Pong frames, which are designed for this purpose.


1 Answers

I ran into this issue this week. Your workaround got me pointed in the right direction, but I believe I've narrowed down the root cause.

If a "Content-Length: 0" header is included in the "101 Switching Protocols" response from a WebSocket server, WebSocketClient gets confused and schedules the connection for cleanup in 100 seconds.

Here's the offending code from the .Net Reference Source:

//if the returned contentlength is zero, preemptively invoke calldone on the stream. //this will wake up any pending reads. if (m_ContentLength == 0 && m_ConnectStream is ConnectStream) {     ((ConnectStream)m_ConnectStream).CallDone(); } 

According to RFC 7230 Section 3.3.2, Content-Length is prohibited in 1xx (Informational) messages, but I've found it mistakenly included in some server implementations.

For additional details, including some sample code for diagnosing ServicePoint issues, see this thread: https://github.com/ably/ably-dotnet/issues/107

like image 136
Doug Drinka Avatar answered Oct 13 '22 02:10

Doug Drinka