TCP keep-alive parameters not being honoured

Tags:

linux

networking

tcp

sockets

I am experimenting with TCP keep alive on my Linux box, and have written the following small server:

#include <iostream>
#include <cstring>

#include <sys/socket.h>     // socket, bind, listen, accept, setsockopt, recv
#include <unistd.h>         // close
#include <netinet/in.h>
#include <arpa/inet.h>      // inet_pton
#include <netinet/tcp.h>    // TCP_KEEPIDLE, TCP_KEEPCNT, TCP_KEEPINTVL
#include <netdb.h>          // addrinfo stuff

using namespace std;

typedef int SOCKET;

int main(int argc, char *argv []) 
{
    struct sockaddr_in sockaddr_IPv4;
    memset(&sockaddr_IPv4, 0, sizeof(struct sockaddr_in));
    sockaddr_IPv4.sin_family = AF_INET;
    sockaddr_IPv4.sin_port = htons(58080);

    if (inet_pton(AF_INET, "10.6.186.24", &sockaddr_IPv4.sin_addr) != 1)
        return -1;

    SOCKET serverSock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (serverSock == -1)
        return -1;

    if (bind(serverSock, (sockaddr*)&sockaddr_IPv4, sizeof(sockaddr_IPv4)) != 0 || listen(serverSock, SOMAXCONN) != 0) 
    { 
        cout << "Failed to setup listening socket!\n";
        close(serverSock);
        return -1;
    }

    SOCKET clientSock = accept(serverSock, 0, 0);
    if (clientSock == -1) 
        return -1;

    // Enable keep-alive on the client socket
    const int nVal = 1;
    if (setsockopt(clientSock, SOL_SOCKET, SO_KEEPALIVE, &nVal, sizeof(nVal)) < 0)
    {
        cout << "Failed to set keep-alive!\n";
        return -1;
    }

    // Get the keep-alive options that will be used on the client socket
    // (the system-wide sysctl values unless overridden per socket)

    int nProbes = 0, nTime = 0, nInterval = 0;
    socklen_t nOptLen = sizeof(int);
    bool bError = false;

    if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPIDLE, &nTime, &nOptLen) < 0) { bError = true; }
    nOptLen = sizeof(int);

    if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPCNT, &nProbes, &nOptLen) < 0) { bError = true; }
    nOptLen = sizeof(int);

    if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPINTVL, &nInterval, &nOptLen) < 0) { bError = true; }

    if (bError) 
    {
        // Failed to retrieve values
        cout << "Failed to get keep-alive options!\n";
        return -1;
    }

    cout << "Keep alive settings are: time: " << nTime << ", interval: " << nInterval << ", number of probes: " << nProbes << "\n";

    // Block until the peer closes the connection (recv returns 0) or the
    // connection fails (recv returns -1, e.g. ETIMEDOUT once keep-alive gives up)
    ssize_t nRead = 0;
    char buf[128];
    do 
    {
        nRead = recv(clientSock, buf, sizeof(buf), 0);
    } while (nRead > 0);

    close(clientSock);
    close(serverSock);
    return 0;
}

I then adjusted the system-wide TCP keep alive settings to be as follows:

# cat /proc/sys/net/ipv4/tcp_keepalive_time
20
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
30
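The same system-wide defaults can be read from a program instead of with `cat`. A minimal sketch, assuming the procfs paths shown above; `read_sysctl_long` is a hypothetical helper name:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: read a single integer from a procfs/sysctl file such as
// /proc/sys/net/ipv4/tcp_keepalive_time. Returns -1 if the file cannot be
// opened or does not start with an integer.
long read_sysctl_long(const std::string& path) {
    std::ifstream in(path);
    long value = -1;
    if (!(in >> value))
        return -1;  // a failed extraction leaves value as 0, so report -1 explicitly
    return value;
}
```

With the first configuration above, `read_sysctl_long("/proc/sys/net/ipv4/tcp_keepalive_intvl")` would return 30.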

I then connected to my server from Windows, and ran a Wireshark trace to see the keep-alive packets. The image below shows the result.

[Image: Packets 1]

This confused me, since I now understand the keep-alive interval to come into play only if no ACK is received in response to the original keep-alive packet (see my other question here). So I would expect every keep-alive packet, not just the first one, to be sent at 20-second intervals (not 30, which is what we see).

I then adjusted the system-wide settings as follows:

# cat /proc/sys/net/ipv4/tcp_keepalive_time
30
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
20

This time when I connect, I see the following in my Wireshark trace:

[Image: Packets2]

Now we see that the first keep-alive packet is sent after 30 seconds, but each one thereafter is also sent at 30 seconds, not the 20 as would be suggested by the previous run!

Can someone please explain this inconsistent behaviour?


asked Mar 15 '17 17:03

Wad

1 Answer

Roughly speaking, it is supposed to work like this: a keepalive message is sent whenever the connection has been idle for tcp_keepalive_time seconds. If an ACK is not received, it will then probe every tcp_keepalive_intvl seconds. If an ACK has still not been received after tcp_keepalive_probes probes, the connection is aborted. Thus, a connection will be aborted after at most

    tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl

seconds without a response. See this kernel documentation.
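As a quick worked example of that bound (`max_keepalive_detection` is a hypothetical helper name; 7200 s, 75 s and 9 probes are the documented Linux defaults for the three sysctls):

```cpp
// Hypothetical helper: upper bound, in seconds since anything was last
// received, before keepalive aborts the connection:
//   tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl
constexpr long max_keepalive_detection(long time_s, long intvl_s, long probes) {
    return time_s + probes * intvl_s;
}

// Linux defaults: 7200 s idle, 75 s interval, 9 probes -> 7875 s (just over 2 h).
static_assert(max_keepalive_detection(7200, 75, 9) == 7875, "Linux defaults");
// 5 s idle, 1 s interval, 4 probes -> 9 s.
static_assert(max_keepalive_detection(5, 1, 4) == 9, "short settings");
```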

We can easily watch this work using netcat keepalive, a version of netcat that allows us to set TCP keepalive parameters (the sysctl keepalive parameters are the defaults, but they can be overridden on a per-socket basis in the tcp_sock struct).
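Those per-socket values are the ones the question's getsockopt() calls read back: on Linux, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT correspond to tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes. A minimal sketch of overriding all three on one socket, assuming Linux; `set_socket_keepalive` is a hypothetical helper name, not part of any library:

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>   // TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux)
#include <sys/socket.h>

// Enable keepalive on fd and override the system-wide defaults for this
// socket only. idle_s and intvl_s are in seconds.
// Returns true only if every setsockopt() call succeeded.
bool set_socket_keepalive(int fd, int idle_s, int intvl_s, int probes) {
    const int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return false;
    // Per-socket replacement for net.ipv4.tcp_keepalive_time
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return false;
    // Per-socket replacement for net.ipv4.tcp_keepalive_intvl
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return false;
    // Per-socket replacement for net.ipv4.tcp_keepalive_probes
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
        return false;
    return true;
}
```

Linux rejects values below 1 for these options. Calling `set_socket_keepalive(clientSock, 5, 1, 4)` in the question's server would apply the same numbers as the first experiment below, without touching the sysctls.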

First, start up a server listening on port 8888 with the keepalive time (-O) set to 5 seconds, the keepalive interval (-I) set to 1 second, and the keepalive probe count (-P) set to 4.

    $ ./nckl-linux -K -O 5 -I 1 -P 4 -l 8888 >/dev/null &

Next, let's use iptables to introduce loss for ACK packets sent to the server:

    $ sudo iptables -A OUTPUT -p tcp --dport 8888 \
    >   --tcp-flags SYN,ACK,RST,FIN ACK \
    >   -m statistic --mode random --probability 0.5 \
    >   -j DROP

This will cause packets that are sent to TCP port 8888 with just the ACK flag set to be dropped with probability 0.5.

Now let's connect and watch with the vanilla netcat (which will use the sysctl keepalive values):

    $ nc localhost 8888

Here is the capture:

[Image: TCP keepalive capture]

As you can see, it waits 5 seconds after receiving an ACK before sending another keepalive message. If it doesn't receive an ACK within 1 second, it sends another probe, and if it doesn't receive an ACK after 4 probes, it aborts the connection. This is exactly how keepalive is supposed to work.

So let's try to reproduce what you were seeing. Let's delete the iptables rule (no loss), start a new server with tcp_keepalive_time set to 1 second, and tcp_keepalive_intvl set to 5 seconds, and then connect with a client. Here is the result:

[Image: Capture with keepalive_time < keepalive_intvl, no loss]

Interestingly, we see the same behavior you did: after the first ACK, it waits 1 second to send a keepalive message, and thereafter every 5 seconds.

Let's add the iptables rule back in to introduce loss and see how long it actually waits to send another probe if it doesn't get an ACK (using -K -O 1 -I 5 -P 4 on the server):

[Image: Capture with keepalive_time < keepalive_intvl, with loss]

Again, it waits 1 second from the first ACK to send a keepalive message, but thereafter it waits 5 seconds whether it sees an ACK or not, as if keepalive_time and keepalive_intvl are both set to 5.

To understand this behavior, we need to look at the Linux kernel TCP implementation. Let's first look at tcp_finish_connect:

 if (sock_flag(sk, SOCK_KEEPOPEN))
        inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));

When the TCP connection is established, the keepalive timer is effectively set to tcp_keepalive_time, which is 1 second in our case.
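keepalive_time_when() itself is not quoted here. In the kernel sources it (like keepalive_intvl_when() and keepalive_probes()) returns the per-socket value if one has been set and otherwise falls back to the sysctl; the kernel also keeps the times in jiffies rather than seconds. A hypothetical user-space model of that fallback, in seconds (`effective_keepalive_value` is an illustrative name, not a kernel function):

```cpp
// Hypothetical model of the fallback in keepalive_time_when(),
// keepalive_intvl_when() and keepalive_probes(): a per-socket value of 0
// means "not set", so the system-wide sysctl value is used instead.
constexpr int effective_keepalive_value(int per_socket_value, int sysctl_value) {
    return per_socket_value != 0 ? per_socket_value : sysctl_value;
}

// No per-socket value: the sysctl (20 s in the question's first run) applies.
static_assert(effective_keepalive_value(0, 20) == 20, "sysctl fallback");
// A per-socket value of 1 s takes precedence over the sysctl.
static_assert(effective_keepalive_value(1, 20) == 1, "per-socket override");
```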

Next, let's take a look at how the timer is processed in tcp_keepalive_timer:

  elapsed = keepalive_time_elapsed(tp);

  if (elapsed >= keepalive_time_when(tp)) {
          /* If the TCP_USER_TIMEOUT option is enabled, use that
           * to determine when to timeout instead.
           */
          if ((icsk->icsk_user_timeout != 0 &&
              elapsed >= icsk->icsk_user_timeout &&
              icsk->icsk_probes_out > 0) ||
              (icsk->icsk_user_timeout == 0 &&
              icsk->icsk_probes_out >= keepalive_probes(tp))) {
                  tcp_send_active_reset(sk, GFP_ATOMIC);
                  tcp_write_err(sk);
                  goto out;
          }
          if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
                  icsk->icsk_probes_out++;
                  elapsed = keepalive_intvl_when(tp);
          } else {
                  /* If keepalive was lost due to local congestion,
                   * try harder.
                   */
                  elapsed = TCP_RESOURCE_PROBE_INTERVAL;
          }
  } else {
          /* It is tp->rcv_tstamp + keepalive_time_when(tp) */
          elapsed = keepalive_time_when(tp) - elapsed;
  }

  sk_mem_reclaim(sk);

resched:
  inet_csk_reset_keepalive_timer (sk, elapsed);
  goto out;

When keepalive_time_when is greater than keepalive_intvl_when, this code works as expected. When it is not, you get the behavior you observed.

When the initial timer (armed when the TCP connection is established) expires after 1 second, elapsed is compared against keepalive_time_when; if something has been received more recently than that, the else branch pushes the timer back until elapsed reaches keepalive_time_when. At that point we send a probe and set the timer to keepalive_intvl_when, which is 5 seconds. When this timer expires, if nothing has been received for the last 1 second (keepalive_time_when), we send another probe, set the timer to keepalive_intvl_when again, wake up in another 5 seconds, and so on.

However, if we have received something within keepalive_time_when when the timer expires, it will use keepalive_time_when to reschedule the timer for 1 second since the last time we received anything.
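The timer logic above can be replayed outside the kernel. The sketch below is a simplified, hypothetical model (`simulate_keepalive` and `KeepaliveTrace` are illustrative names, not kernel APIs). It assumes whole-second time counted from connection establishment, a peer whose ACK arrives at the same instant as the probe (zero RTT), that receiving that ACK refreshes the receive timestamp and resets the outstanding-probe counter, and it ignores TCP_USER_TIMEOUT and local congestion.

```cpp
#include <vector>

// Hypothetical, simplified model of the rescheduling in tcp_keepalive_timer().
// keepalive_time and keepalive_intvl are assumed to be >= 1 (so time advances).
struct KeepaliveTrace {
    std::vector<int> probe_times;  // seconds at which probes were sent
    int abort_time = -1;           // second at which the connection was reset, or -1
};

KeepaliveTrace simulate_keepalive(int keepalive_time, int keepalive_intvl,
                                  int keepalive_probes, bool peer_acks,
                                  int horizon) {
    KeepaliveTrace trace;
    int last_rx = 0;     // stands in for the last-receive timestamp
    int probes_out = 0;  // stands in for icsk->icsk_probes_out
    // tcp_finish_connect() arms the timer keepalive_time after establishment
    int now = keepalive_time;

    while (now <= horizon) {
        int elapsed = now - last_rx;
        int next;
        if (elapsed >= keepalive_time) {
            if (probes_out >= keepalive_probes) {
                trace.abort_time = now;  // models tcp_send_active_reset + tcp_write_err
                break;
            }
            trace.probe_times.push_back(now);
            ++probes_out;
            if (peer_acks) {
                last_rx = now;    // the ACK counts as "something received"
                probes_out = 0;   // and clears the outstanding-probe count
            }
            next = keepalive_intvl;           // elapsed = keepalive_intvl_when(tp)
        } else {
            next = keepalive_time - elapsed;  // fire keepalive_time after last receive
        }
        now += next;
    }
    return trace;
}
```

Under these assumptions, `simulate_keepalive(20, 30, 9, true, 100)` sends probes at 20, 50 and 80 seconds and `simulate_keepalive(30, 20, 9, true, 100)` at 30, 60 and 90 seconds, the same spacing as the question's two Wireshark traces. With no ACKs at all, `simulate_keepalive(5, 1, 4, false, 100)` probes at 5, 6, 7 and 8 seconds and aborts at 9.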

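So, to answer your question, the Linux implementation of TCP keepalive assumes that keepalive_intvl is less than keepalive_time, but nevertheless works "sensibly." In both of your runs, after the first probe and while the peer keeps ACKing, successive probes are spaced max(keepalive_time, keepalive_intvl) seconds apart, which is 30 seconds each time.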


answered Sep 20 '22 14:09

Jim D.
