I use the TCP keep-alive option to detect dead connections. It works well for connections whose sockets are only read from:
setsockopt(mysock, ...)  // set various keep-alive options
epoll_ctl(ep, mysock, {EPOLLIN|EPOLLERR|EPOLLHUP})
epoll_wait -> (returns several seconds after the remote host's cable is disconnected)
epoll_wait returns with EPOLLIN|EPOLLHUP on the socket without a problem.
However, if I write to the socket until I get EAGAIN and then poll for both reading and writing, I don't get an error when I disconnect the cable:
setsockopt(mysock, ...)  // set various keep-alive options
while (!(send(...) == -1 && errno == EAGAIN))  // fill the send buffer
    ;
epoll_ctl(ep, mysock, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP})
epoll_wait -> never returns, even when the remote host's cable is disconnected
Edit: Additional Information
When I monitor the traffic with Wireshark, in the first (read-only) case I see a packet requesting an ACK once every several seconds. In the second case I don't see any at all.
The SO_KEEPALIVE socket option is valid only for protocols that support the notion of keep-alive (connection-oriented protocols). For TCP, the default keep-alive timeout is 2 hours and the keep-alive interval is 1 second. The default number of keep-alive probes varies based on the version of Windows.
In TCP, a keepalive is an administrative packet sent to detect a stale connection. In HTTP, keep-alive means a persistent connection. The TCP host requirements (RFC 1122) state: keep-alive packets MUST only be sent when no data or acknowledgement packets have been received for the connection within an interval.
They are used to detect window size updates. Wireshark labels them as keep-alive packets only because they look like keep-alive packets. A TCP keep-alive packet is simply an ACK with the sequence number set to one less than the current sequence number for the connection.
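As a rough illustration of the "one less than the current sequence number" rule, the check below flags a segment that matches it. The function name and parameters are made up for this sketch, and the payload condition (zero or one byte) is an assumption about what a probe carries. A real dissector would also look at the TCP flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when a segment has the shape of a
 * keep-alive probe, i.e. its sequence number is exactly one less than the
 * next sequence number expected on the connection.
 * Assumption: a probe carries at most one byte of payload. */
bool looks_like_keepalive(uint32_t seq, uint32_t next_expected_seq,
                          uint32_t payload_len)
{
    /* Unsigned subtraction wraps modulo 2^32, like TCP sequence numbers. */
    return payload_len <= 1 && (uint32_t)(next_expected_seq - seq) == 1u;
}
```

Because any segment with this shape passes the check, a packet sent for another purpose can be labelled as a keep-alive as well.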
Linux has built-in support for keepalive. You need to enable TCP/IP networking in order to use it. You also need procfs support and sysctl support to be able to configure the kernel parameters at runtime.
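Besides the system-wide defaults under /proc/sys/net/ipv4 (tcp_keepalive_time, tcp_keepalive_intvl, tcp_keepalive_probes), Linux lets you override the values per socket. A minimal sketch, assuming a Linux system; the function name enable_keepalive is my own, not part of any API:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive on a TCP socket and override the
 * system-wide defaults for this socket only (Linux-specific options).
 *   idle_s     - idle seconds before the first probe   (TCP_KEEPIDLE)
 *   interval_s - seconds between unanswered probes     (TCP_KEEPINTVL)
 *   probes     - unanswered probes before giving up    (TCP_KEEPCNT)
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

With enable_keepalive(fd, 10, 5, 3), for example, an idle connection to a vanished peer should be reported dead after roughly 10 + 3 * 5 = 25 seconds.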
If you pull the network connection before all the data is transmitted, then the connection is not idle and thus in some implementations the keepalive timer does not start. (Keep in mind that keepalive is NOT part of the TCP specification and as a result it is implemented inconsistently if at all.) In general, because of the combination of exponential backoff and large number of retries (tcp_retries2 defaults to 15) it can take up to 30 minutes for transmission retries to time out before the keepalive timer starts.
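To get a feel for the backoff arithmetic, here is a small sketch that sums the retransmission timer over all attempts. It assumes the timer starts at 200 ms and is capped at 120 s (the Linux minimum and maximum RTO). On a real connection the starting value depends on the measured round-trip time, so treat the result as a lower bound, not a prediction. The function name is made up for this example.

```c
/* Hypothetical helper: total time spent waiting before TCP gives up.
 * There are (retries + 1) timeouts: one for the original transmission and
 * one for each retransmission. The timer doubles after every timeout and
 * is capped at rto_max. All times are in seconds. */
double retransmit_timeout_seconds(int retries, double rto_min, double rto_max)
{
    double total = 0.0;
    double rto = rto_min;

    for (int i = 0; i <= retries; i++) {
        total += rto;
        rto *= 2.0;
        if (rto > rto_max)
            rto = rto_max;
    }
    return total;
}
```

retransmit_timeout_seconds(15, 0.2, 120.0) comes to about 924.6 seconds, a bit over 15 minutes: ten doubling steps from 0.2 s to 102.4 s (204.6 s in total) plus six waits at the 120 s cap.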
The workaround, if there is one, depends on the particular TCP implementation you are using. Linux kernel 2.6.37 (released 4 January 2011) and later implement TCP_USER_TIMEOUT.
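Setting the option is a single setsockopt call. A minimal sketch, assuming Linux 2.6.37 or later and headers that define TCP_USER_TIMEOUT; the wrapper name set_user_timeout is my own:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to give up on the connection when
 * transmitted data stays unacknowledged for more than timeout_ms
 * milliseconds. Returns 0 on success, -1 on failure (errno is set). */
int set_user_timeout(int fd, unsigned int timeout_ms)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                      &timeout_ms, sizeof timeout_ms);
}
```

For the case in the question, something like set_user_timeout(mysock, 30000) before filling the send buffer should make the stalled connection fail with ETIMEDOUT after about 30 seconds instead of waiting for the retransmission limit.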
The usual recommendation is to implement communication timeouts at the application level rather than use TCP-based keepalive anyway. See, for example, HTTP Keep-Alive.
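One simple form of an application-level timeout is to give epoll_wait a deadline and treat silence as a dead peer. The sketch below is only an illustration: the name wait_readable is hypothetical, and a real program would keep one long-lived epoll instance rather than creating one per call.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if fd is readable (or has hung up / is in error),
 *         0 if nothing happened within the timeout,
 *        -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(ep);
        return -1;
    }

    struct epoll_event out;
    int n;
    do {
        /* Note: a retry after EINTR restarts the full timeout. */
        n = epoll_wait(ep, &out, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    close(ep);
    return n;
}
```

If the protocol has the peer send a heartbeat every few seconds, a return value of 0 from a call with a timeout of two or three heartbeat intervals is a reasonable signal to close the connection.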