
What does it mean for TCP connections to churn?

In the context of webservices, I've seen the term "TCP connection churn" used. Specifically Twitter finagle has ways to avoid it happening. How does it happen? What does it mean?

asked Feb 02 '13 20:02 by eman


People also ask

What is connection churn?

A system is said to have high connection churn when both its rate of newly opened connections and its rate of closed connections are consistently high. This usually means that an application uses short-lived connections.

What is a TCP connection?

Transmission Control Protocol (TCP) is a standard that defines how to establish and maintain a network conversation by which applications can exchange data. TCP works with the Internet Protocol (IP), which defines how computers send packets of data to each other.

How is a TCP connection terminated with the four-way handshake?

First, one side of the connection (either the client or the server) sends a segment with the FIN flag set to request termination. Second, the side that receives the FIN replies with an ACK to acknowledge the close request. The receiving side then sends its own FIN, which the first side acknowledges in turn.

Does TCP heartbeat?

Interestingly, the TCP protocol doesn't provide heartbeats (there are optional keep-alives, which by default operate on a scale of hours, but these are not really useful for swift detection of network disruption).


1 Answer

There might be multiple uses for this term, but I've always seen it used in cases where many TCP connections are being made in a very short space of time, causing performance issues on the client and potentially the server as well.

This often occurs when client code is written to reconnect automatically on a TCP failure of any sort. If the failure happens before the connection is even established (or very early on in the protocol exchange) then the client can go into a near-busy loop, constantly making connections. This can cause performance issues on the client side in two ways. Firstly, there is a process in a very busy loop sucking up CPU cycles. Secondly, each connection attempt consumes a client-side port number - if this goes fast enough the port numbers can wrap around when the client hits the maximum port number (as a port is only a 16-bit number this certainly isn't impossible).
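To get a rough feel for how quickly port numbers can wrap, here is a back-of-the-envelope sketch. The function name is made up for illustration, and the default range of 32768-60999 is an assumption (it is a common Linux default for ephemeral ports, but your system may differ). It also assumes every attempt takes a fresh port, which ignores any port reuse by the OS.

```python
def seconds_until_port_wrap(attempts_per_second, low=32768, high=60999):
    """Rough time for a client to cycle once through its ephemeral port range.

    low/high are assumed defaults, not values taken from any particular system.
    """
    if attempts_per_second <= 0:
        raise ValueError("attempts_per_second must be positive")
    ports_available = high - low + 1  # inclusive range
    return ports_available / attempts_per_second


if __name__ == "__main__":
    for rate in (10, 1000, 10000):
        print(f"{rate} attempts/s -> {seconds_until_port_wrap(rate):.1f} s")
```

Even with the full 16-bit space (low=1, high=65535), a loop making a few thousand attempts per second gets through every port number in well under a minute.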

While writing robust code is a worthy aim, this simple "automatic retry" approach is a little too naive. You can see similar problems in other contexts - e.g. a parent process continually restarting a child process which immediately crashes. One common mechanism to avoid it is some sort of increasing back-off. So, when the first connection fails you immediately reconnect. If it fails again within a short time (e.g. 30 seconds) then you wait, say, 2 seconds before reconnecting. If it fails again within 30 seconds, you wait 4 seconds, and so on. Read the Wikipedia article on exponential backoff (or this blog post might be more appropriate for this application) for more background on this technique.
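The scheme described above might be sketched roughly like this. The class and parameter names are hypothetical, the 2-second and 30-second values are just the example figures from the text, and "fails again within 30 seconds" is interpreted here as 30 seconds after the previous reconnect was due, which is one possible reading.

```python
import time


class ReconnectBackoff:
    """Decides how long to wait before the next reconnect attempt."""

    def __init__(self, first_delay=2.0, quick_failure_window=30.0,
                 clock=time.monotonic):
        self.first_delay = first_delay
        self.quick_failure_window = quick_failure_window
        self.clock = clock            # injectable so it can be tested
        self.retry_at = None          # when the previous reconnect was due
        self.next_delay = first_delay

    def on_failure(self):
        """Record a failure; return the seconds to wait before reconnecting."""
        now = self.clock()
        quick = (self.retry_at is not None
                 and now - self.retry_at <= self.quick_failure_window)
        if quick:
            # Failed again soon after reconnecting: wait, then double the wait.
            delay = self.next_delay
            self.next_delay *= 2
        else:
            # First failure, or the connection survived a while: retry at once.
            delay = 0.0
            self.next_delay = self.first_delay
        self.retry_at = now + delay
        return delay
```

A client would call on_failure() each time a connection attempt or an established connection fails, sleep for the returned number of seconds, then try again. Note that this sketch has no upper bound on the delay.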

This approach has the advantage that it doesn't overwhelm the client or server, while still letting the client recover without manual intervention (which is especially crucial for software on an unattended server, for example, or in large clusters).

In cases where recovery time is critical, simple rate-limiting of TCP connection creation is also quite possible - perhaps no more than one per second. If there are many clients per server, however, this more simplistic approach can still leave the server swamped by the load of accepting and then closing connections at a high rate.
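As a minimal sketch of that kind of rate limit, something like the following would enforce a minimum gap between connection attempts. The class name and its parameters are invented for illustration, and the one-second interval is just the figure mentioned above.

```python
import time


class ConnectRateLimiter:
    """Allows at most one connection attempt per min_interval seconds."""

    def __init__(self, min_interval=1.0, clock=time.monotonic,
                 sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock    # injectable for testing
        self.sleep = sleep    # injectable for testing
        self._next_allowed = None

    def wait_for_slot(self):
        """Block until an attempt is allowed; return the seconds waited."""
        now = self.clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self.sleep(waited)
            now = max(self.clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return waited
```

The client calls wait_for_slot() immediately before every connect. Unlike a back-off, the wait never grows, so recovery stays fast, but the per-client rate also never drops below one attempt per interval.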

One thing to note if you plan to employ exponential backoff - I suggest imposing a maximum wait time, or you might find that prolonged failures leave a client taking too long to recover once the server end does start accepting connections again. I would suggest something like 5 minutes as a reasonable maximum in most circumstances, but of course it depends on the application.
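To illustrate the effect of a maximum wait, here is a small sketch of a capped delay schedule. The function name is hypothetical, the 2-second base is an assumed starting delay, and the 300-second cap is the 5-minute suggestion above.

```python
def capped_backoff_delay(retry_number, base=2.0, cap=300.0):
    """Delay in seconds before the n-th delayed retry (1-based), never above cap."""
    if retry_number < 1:
        raise ValueError("retry_number must be at least 1")
    # Bound the exponent so very long outages cannot produce huge numbers.
    exponent = min(retry_number - 1, 32)
    return min(cap, base * 2 ** exponent)


if __name__ == "__main__":
    print([capped_backoff_delay(n) for n in range(1, 11)])
    # [2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 300.0, 300.0]
```

With these values the cap is reached on the ninth delayed retry, so after a long outage the client is never more than about five minutes away from its next attempt.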

answered Sep 21 '22 07:09 by Cartroo