Where does the '{tcp_error, Socket, etimedout}' message of an active socket come from?

Our (Linux) server used the {active, once} option on its sockets, and {tcp_error, Socket, etimedout} messages kept popping up. I know this can be caused by bad network conditions, but something about it was strange.
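For context, here is a minimal sketch of how a socket in {active, once} mode is typically consumed. The module name `active_once_loop`, the function `loop/2` and the caller-supplied `Handler` fun are hypothetical, not taken from the original server. The `{tcp_error, Socket, Reason}` clause is where an asynchronous error such as `etimedout` is delivered.

```erlang
-module(active_once_loop).
-export([loop/2]).

%% Hypothetical receive loop for a gen_tcp socket opened with {active, once}.
%% Handler is an assumed caller-supplied fun(Data) -> ok.
loop(Socket, Handler) ->
    receive
        {tcp, Socket, Data} ->
            ok = Handler(Data),
            %% {active, once} delivers a single message, so re-arm the socket.
            ok = inet:setopts(Socket, [{active, once}]),
            loop(Socket, Handler);
        {tcp_closed, Socket} ->
            %% The connection was closed by the peer.
            closed;
        {tcp_error, Socket, Reason} ->
            %% Error reported by the driver outside of any call we made,
            %% e.g. Reason =:= etimedout.
            gen_tcp:close(Socket),
            {error, Reason}
    end.
```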

TCP keepalive was enabled system-wide on our machine, and the actual option values were:

net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75

This means, I believe, that an idle socket would not time out for at least 20 minutes. Strangely, though, our processes received {tcp_error, Socket, etimedout} in less than 10 seconds.
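For reference, assuming the sockets were actually opened with keepalive turned on (in Erlang, the {keepalive, true} socket option, since these sysctls only supply the timing), the earliest point at which an idle connection to a silent peer would be dropped by keepalive works out as:

$$
t_{\text{drop}} = \texttt{tcp\_keepalive\_time} + \texttt{tcp\_keepalive\_probes} \times \texttt{tcp\_keepalive\_intvl} = 1200 + 9 \times 75 = 1875\ \text{s} \approx 31\ \text{min}
$$

That is, 20 minutes of idle time before the first probe, plus 9 unanswered probes sent 75 s apart.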

I wondered whether it could be triggered by the gen_tcp:send(...) calls, but then ruled that out: the send operations were all synchronous, so they would have failed immediately.

So my question is: where did the etimedout message come from, or what exactly triggered it? I dug through the C source of the Erlang VM, especially inet_drv.c, but haven't reached a conclusion yet.

Thanks.

asked Nov 20 '13 10:11 by l04m33


1 Answer

A tcpdump capture showed that it was a timeout caused by TCP retransmissions.

Our server machine had /proc/sys/net/ipv4/tcp_retries2 set to 5, which leads to disconnection after 5 retransmissions, while our developer machines used the default value of 15. That is why we couldn't reproduce the problem locally.
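To get a feel for why this setting matters so much, here is a rough model, assuming the formula described for tcp_retries2 in the Linux kernel's ip-sysctl documentation: the retransmission timeout (RTO) starts at an assumed 200 ms minimum, doubles after each retransmission, and is capped at 120 s. The module `tcp_retries` and the function `hypothetical_timeout_ms/1` are illustrative names. The real figure also depends on the connection's measured RTO and the kernel version, so treat the results as orders of magnitude.

```erlang
-module(tcp_retries).
-export([hypothetical_timeout_ms/1]).

%% Assumed constants, mirroring Linux's TCP_RTO_MIN and TCP_RTO_MAX.
-define(RTO_MIN_MS, 200).
-define(RTO_MAX_MS, 120000).

%% Hypothetical time (ms) spent retransmitting before the connection is
%% aborted: sum of the exponentially backed-off, capped RTO intervals
%% for retransmissions 0..Retries2.
hypothetical_timeout_ms(Retries2) when is_integer(Retries2), Retries2 >= 0 ->
    lists:sum([min(?RTO_MIN_MS bsl N, ?RTO_MAX_MS)
               || N <- lists:seq(0, Retries2)]).
```

Under these assumptions, `tcp_retries:hypothetical_timeout_ms(5)` gives 12600 (about 12.6 s), while the default of 15 gives 924600 (about 15.4 minutes), which matches the 924.6 s figure quoted in the kernel documentation. A setting of 5 therefore turns a stalled connection into an error within seconds rather than minutes.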

A successful return from gen_tcp:send(...) (or the equivalent API in other languages) only means that the data was accepted by the local TCP stack. There is no guarantee that it will reach the peer, and a delivery failure may surface later, while the process is blocked on some other operation (for an active socket, as a {tcp_error, Socket, etimedout} message).
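A loopback sketch of the same effect follows. A real retransmission timeout is hard to trigger locally, so a peer that has already closed its end stands in for an unreachable one; the module `send_then_fail` and the function `demo/0` are hypothetical. The point it illustrates is only that `gen_tcp:send/2` reports `ok` and the failure shows up on a later call.

```erlang
-module(send_then_fail).
-export([demo/0]).

%% Returns {SendResult, RecvResult} for a send to a peer that has
%% already closed the connection.
demo() ->
    {ok, Listen} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(Listen),
    {ok, Client} = gen_tcp:connect({127, 0, 0, 1}, Port, [binary, {active, false}]),
    {ok, Server} = gen_tcp:accept(Listen),
    %% The peer goes away before we send anything.
    ok = gen_tcp:close(Server),
    %% The local TCP stack accepts the data, so no error is reported here.
    SendResult = gen_tcp:send(Client, <<"hello">>),
    %% The problem only surfaces on the next operation.
    RecvResult = gen_tcp:recv(Client, 0, 1000),
    gen_tcp:close(Client),
    gen_tcp:close(Listen),
    {SendResult, RecvResult}.
```

Calling `send_then_fail:demo()` should return `{ok, {error, Reason}}`, where the exact `Reason` may vary with timing.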

I found a brief description of TCP retransmissions here.

answered Oct 15 '22 09:10 by l04m33