SO_KEEPALIVE does not work during a call to write()?

Tags:

I'm developing a socket application, which must must be to be robust to network failures.

The application has 2 running threads, one waiting messages from the socket (a read() loop) and the other send messages to the socket (a write() loop).

I'm currently trying to use SO_KEEPALIVE to handle the network failures. It works ok if I'm only blocked on read(). A few seconds after the connection is lost (network cable removed), read() will fail with the message 'Connection timed out'.

But, if I try to wrte() after the network is disconnected (and before the timeout ends), both write() and read() will block forever, without error.

This is a stripped sample code which directs stdin/stdout to the socket. It listens on port 5656:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/types.h> 
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int socket_fd;

void error(const char *msg) {
    perror(msg);
    exit(1);
}

//Read from stdin and write to socket
void* write_daemon (void* _arg) {
    while (1) {
        char c;
        int ret = scanf("%c", &c);
        if (ret <= 0) error("read from stdin");
        int ret2 = write(socket_fd, &c, sizeof(c));
        if (ret2 <= 0) error("write to socket");
    }
    return NULL;
}

//Read from socket and write to stdout
void* read_daemon (void* _arg) {
    while (1) {
        char c;
        int ret = read(socket_fd, &c, sizeof(c));
        if (ret <= 0) error("read from socket");
        int ret2 = printf("%c", c);
        if (ret2 <= 0) error("write to stdout");
    }
    return NULL;
}


//Enable and configure KEEPALIVE - To detect network problems quickly
void config_socket() {
    int enable_no_delay   = 1;
    int enable_keep_alive = 1;
    int keepalive_idle     =1; //Very short interval. Just for testing
    int keepalive_count    =1;
    int keepalive_interval =1;
    int result;

    //=> http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#setsockopt
    result = setsockopt(socket_fd, SOL_SOCKET, SO_KEEPALIVE, &enable_keep_alive, sizeof(int));
    if (result < 0)
        error("SO_KEEPALIVE");

    result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPIDLE, &keepalive_idle, sizeof(int));
    if (result < 0) 
        error("TCP_KEEPIDLE");

    result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPINTVL, &keepalive_interval, sizeof(int));
    if (result < 0) 
        error("TCP_KEEPINTVL");

    result = setsockopt(socket_fd, SOL_TCP, TCP_KEEPCNT, &keepalive_count, sizeof(int));
    if (result < 0) 
        error("TCP_KEEPCNT");
}

int main(int argc, char *argv[]) {
    //Create Server socket, bound to port 5656
    int listen_socket_fd;
    int tr=1;
    struct sockaddr_in serv_addr, cli_addr;
    socklen_t clilen = sizeof(cli_addr);
    pthread_t write_thread, read_thread;

    listen_socket_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_socket_fd < 0)
        error("socket()");

    if (setsockopt(listen_socket_fd,SOL_SOCKET,SO_REUSEADDR,&tr,sizeof(int)) < 0)
        error("SO_REUSEADDR");

    bzero((char *) &serv_addr, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(5656);
    if (bind(listen_socket_fd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("bind()");

    //Wait for client socket
    listen(listen_socket_fd,5);
    socket_fd = accept(listen_socket_fd, (struct sockaddr *) &cli_addr, &clilen);
    config_socket();
    pthread_create(&write_thread, NULL, write_daemon, NULL);
    pthread_create(&read_thread , NULL, read_daemon , NULL);
    close(listen_socket_fd);
    pthread_exit(NULL);
}

To reproduce the error, use telnet 5656. If will exit after a couple os seconds after the connection is lost, unless I try to write something in the terminal. In this case, it will block forever.

So, the questions are: what's wrong? how to fix it? Are there other alternatives?

Thanks!

I've tried using Wireshark to inspect the network connection. If I don't call write(), I can see TCP keep-alive packages being sent and the connection is close after a few seconds.

If, instead, I try to write(), it stops sending the Keep-Alive packets, and starts sending TCP retransmissions instead (It seems OK to me). The problem is, the time between the retransmissions grows bigger and bigger after each failure, and it seems to never give-up and close the socket.

Is there a way to set the maximum number of retransmissions, or anything similar? Thanks

641

asked Oct 14 '11 14:10

Paulo Costa

2 Answers

I have found the TCP_USER_TIMEOUT socket option (rfc5482), which closes the connection if the sent data is not ACK'ed after the specified interval.

It works fine for me =)

//defined in include/uapi/linux/tcp.h (since Linux 2.6.37)
#define TCP_USER_TIMEOUT 18

int tcp_timeout        =10000; //10 seconds before aborting a write()

result = setsockopt(socket_fd, SOL_TCP, TCP_USER_TIMEOUT, &tcp_timeout, sizeof(int));
if (result < 0) 
    error("TCP_USER_TIMEOUT");

Yet, I feel I shouldn't have to use both SO_KEEP_ALIVE and TCP_USER_TIMEOUT. Maybe it's bug somewhere?

answered Sep 28 '22 08:09

Paulo Costa

Not sure if someone else will give you a better alternative, but in several projects I've been involved with, we've run into very similar situations.

For us the solution was to simply take control into your own hands and not rely on underlying OS/drivers to tell you when connection dies. If you control both client and server sides, you could introduce your own ping messages which bounce between the client and the server. This way you can a) control your own connection timeouts and b) easily keep a record indicating the health of the connection.

In the most recent application, we've hid these pings as in-band control messages within the communication library itself so as far as actual client/server application code was concerned, connection timeouts just worked.

answered Sep 28 '22 09:09

DXM

Related questions
                            
                                C++ standards, am a little confused?
                            
                                Linux pipe : Capturing realtime output of ping via popen
                            
                                Binary tree implementation in C question as found in K&R
                            
                                How can I install and use compilers for embedded C on an external server?
                            
                                Can a C implementation implicitly include standard headers when including a different header?
                            
                                Segmentation fault using OpenMp and SSE
                            
                                redirecting standard output in c then resetting standard output
                            
                                How to use pthreads barrier?
                            
                                Assignment or memcpy? What is the preferred approach to setting an array member variable?
                            
                                Is it possible to get pointer to the 'this' structure, when using designated initializer?
                            
                                Memory Protection without MMU
                            
                                Most respected language and free compiler for creating hobby operating systems? [closed]
                            
                                Safe maximum number of records read by fread
                            
                                Does anyone have a working GMP + MINGW installation?
                            
                                Replacement for the ioctl() function
                            
                                ASCII character from VK_Code
                            
                                Hash Table lookup - with perfect hash, in C
                            
                                How are function arguments passed in C?
                            
                                How do I ensure that two types have the same size?
                            
                                GCC fastcall function definition

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

SO_KEEPALIVE does not work during a call to write()?

Tags:

c

sockets

keep-alive

Paulo Costa

People also ask

2 Answers

Paulo Costa

DXM

Recent Activity

Donate For Us