
High number of TIME_WAITs in HAProxy

We run HAProxy 1.3.26 on a CentOS 5.9 machine with a 2.13 GHz Intel Xeon processor. It acts as an HTTP and TCP load balancer for numerous services, serving a peak throughput of ~2000 requests/second. It has been running fine for 2 years, but both traffic and the number of services have been growing gradually.

Of late we've observed that the old haproxy process remains even after a reload. On further investigation we found that the old process has numerous connections in the TIME_WAIT state. We also saw that netstat and lsof were taking a very long time to run. Following http://agiletesting.blogspot.in/2013/07/the-mystery-of-stale-haproxy-processes.html we introduced option forceclose, but it interfered with various monitoring services, so we reverted it.

On further digging we realised that /proc/net/sockstat shows close to 200K sockets in the tw (TIME_WAIT) state, which is surprising because /etc/haproxy/haproxy.cfg specifies maxconn as 31000 and ulimit-n as 64000. We had timeout server and timeout client set to 300s and changed them to 30s, but that did not help much.

Our questions are:

  • Is such a high number of TIME_WAITs acceptable? If yes, what is the number beyond which we should be worried? Judging by "What is the cost of many TIME_WAIT on the server side?" and "Setting TIME_WAIT TCP", it seems there shouldn't be any issue.
  • How can we decrease these TIME_WAITs?
  • Are there any alternatives to netstat and lsof that perform well even with a very high number of TIME_WAITs?
pseudonym asked Jan 12 '23 18:01


1 Answer

Note: The quotes in this answer are all from an email by Willy Tarreau (the main author of HAProxy) to the HAProxy mailing list.

Connections in the TIME_WAIT state are harmless and don't really consume any resources anymore. The kernel keeps them around for some time for the rare event that it still receives a packet after the connection was closed. Nominally a closed connection is held in that state for twice the maximum segment lifetime (2 × MSL); the exact duration depends on the operating system, and Linux uses a fixed 60 seconds.

TIME_WAIT are harmless on the server side. You can easily reach millions without any issues.

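If you still want the kernel to release closed connections earlier, you can lower tcp_fin_timeout. Strictly speaking, on Linux this setting governs how long orphaned connections stay in FIN_WAIT_2, while the length of the TIME_WAIT period itself is a fixed kernel constant, so it may not reduce the TIME_WAIT count by much. For example, to set it to 30 seconds, run: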

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
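
The same setting can also be inspected and changed with the sysctl tool. The following is only a minimal sketch, assuming root privileges and that your distribution reads /etc/sysctl.conf at boot; a value written with echo as above is lost on reboot.

```shell
# Show the current value (in seconds).
sysctl net.ipv4.tcp_fin_timeout

# Change it at runtime (requires root); equivalent to the echo above.
sysctl -w net.ipv4.tcp_fin_timeout=30

# To keep the value across reboots, add this line to /etc/sysctl.conf
# (assumed location) and reload the file with `sysctl -p`:
#   net.ipv4.tcp_fin_timeout = 30
```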

If you have many connections (whether in TIME_WAIT or not), netstat, lsof, and ipcs perform very poorly and actually slow the whole system down. To quote Willy again:

There are two commands that you must absolutely never use in a monitoring system :

  • netstat -a
  • ipcs -a

Both of them will saturate the system and considerably slow it down when something starts to go wrong. For the sockets you should use what's in /proc/net/sockstat. You have all the numbers you want. If you need more details, use ss -a instead of netstat -a, it uses the netlink interface and is several orders of magnitude faster.

On Debian and Ubuntu systems, ss is available in the iproute or iproute2 package (depending on the version of your distribution).
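
As a rough illustration of the advice to read /proc/net/sockstat, the sketch below pulls out the tw counter with awk. The function name tw_count and its optional file argument are made up for this example; it assumes the usual "TCP: inuse N orphan N tw N ..." line layout.

```shell
#!/bin/sh
# Print the TIME_WAIT ("tw") counter from a sockstat-style file.
# tw_count and its optional path argument are hypothetical helpers for
# illustration; with no argument it reads /proc/net/sockstat.
tw_count() {
    awk '$1 == "TCP:" {
        # Fields come in name/value pairs after the "TCP:" label.
        for (i = 2; i < NF; i++)
            if ($i == "tw") { print $(i + 1); exit }
    }' "${1:-/proc/net/sockstat}"
}
```

Calling tw_count with no argument prints the live counter, which is cheap enough to poll from a monitoring script. When per-socket detail is needed, ss -s prints a summary and ss -tan state time-wait lists the individual TIME_WAIT sockets.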

Holger Just answered Jan 16 '23 01:01