I have a Hibernate, Spring, Debian, Tomcat, MySQL stack on a Linode server in production with some clients. It's a Spring multitenant application that hosts webpages for about 30 clients.
The application starts fine, but after a while I get this error:
java.net.SocketException: Too many open files
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
at java.net.ServerSocket.implAccept(ServerSocket.java:453)
at java.net.ServerSocket.accept(ServerSocket.java:421)
at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
at java.lang.Thread.run(Thread.java:662)
Before this error is thrown, however, Nagios alerts me that pings to the server stop responding.
Previously, I had nginx as a proxy and was getting these nginx errors per request instead, and I had to restart Tomcat anyway:
2014/04/21 12:31:28 [error] 2259#0: *2441630 no live upstreams while connecting to upstream, client: 66.249.64.115, server: abril, request: "GET /catalog.do?op=requestPage&selectedPage=-195&category=2&offSet=-197&page=-193&searchBox= HTTP/1.1", upstream: "http://appcluster/catalog.do?op=requestPage&selectedPage=-195&category=2&offSet=-197&page=-193&searchBox=", host: "www.anabocafe.com"
2014/04/21 12:31:40 [error] 2259#0: *2441641 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 200.74.195.61, server: abril, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "www.oli-med.com"
This is my server.xml Connector configuration:
<Connector port="80" protocol="HTTP/1.1"
maxHttpHeaderSize="8192"
maxThreads="500" minSpareThreads="250"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true"
acceptorThreadCount="2" />
I tried changing the ulimit using this tutorial and was able to raise the hard limit on open file descriptors for the user running Tomcat, but it did not fix the problem; the application still hangs.
The last time I had to restart the server, it had been running for about 3 hours and I had this many open socket connections:
lsof -p TOMCAT_PID | wc -l
632 (more or less; I did not write down the exact number)
This number suddenly starts growing.
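For reference, a quick way to watch how that count evolves and which TCP states dominate (just a sketch; TOMCAT_PID again stands for the actual process id):

# Refresh the open-descriptor count for the Tomcat process every 5 seconds
watch -n 5 "lsof -p TOMCAT_PID | wc -l"

# Break the open sockets down by TCP state to spot CLOSE_WAIT pile-ups
lsof -a -p TOMCAT_PID -i TCP | awk '{print $NF}' | sort | uniq -c | sort -rn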
I have some applications very similar to this one on other servers; the difference is that they are standalone installations while this one is a multitenant architecture. In this app I am getting these kinds of socket connections, which don't occur in the standalone version of any of the other installations:
java 11506 root 646u IPv6 136862 0t0 TCP lixxx-xxx.members.linode.com:www->180.76.6.16:49545 (ESTABLISHED)
java 11506 root 647u IPv6 136873 0t0 TCP lixxx-xxx.members.linode.com:www->50.31.164.139:37734 (CLOSE_WAIT)
java 11506 root 648u IPv6 135889 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-54-247-188-179.eu-west-1.compute.amazonaws.com:28335 (CLOSE_WAIT)
java 11506 root 649u IPv6 136882 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-54-251-34-67.ap-southeast-1.compute.amazonaws.com:19023 (CLOSE_WAIT)
java 11506 root 650u IPv6 136884 0t0 TCP lixxx-xxx.members.linode.com:www->crawl-66-249-75-113.googlebot.com:39665 (ESTABLISHED)
java 11506 root 651u IPv6 136886 0t0 TCP lixxx-xxx.members.linode.com:www->190.97.240.116.viginet.com.ve:1391 (ESTABLISHED)
java 11506 root 652u IPv6 136887 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-50-112-95-211.us-west-2.compute.amazonaws.com:19345 (ESTABLISHED)
java 11506 root 653u IPv6 136889 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-54-248-250-232.ap-northeast-1.compute.amazonaws.com:51153 (ESTABLISHED)
java 11506 root 654u IPv6 136897 0t0 TCP lixxx-xxx.members.linode.com:www->baiduspider-180-76-5-149.crawl.baidu.com:31768 (ESTABLISHED)
java 11506 root 655u IPv6 136898 0t0 TCP lixxx-xxx.members.linode.com:www->msnbot-157-55-32-60.search.msn.com:35100 (ESTABLISHED)
java 11506 root 656u IPv6 136900 0t0 TCP lixxx-xxx.members.linode.com:www->50.31.164.139:47511 (ESTABLISHED)
java 11506 root 657u IPv6 135924 0t0 TCP lixxx-xxx.members.linode.com:www->ec2-184-73-237-85.compute-1.amazonaws.com:28206 (ESTABLISHED)
I guess they are some kind of automated connections (crawlers, judging by the googlebot, baiduspider and msnbot hostnames).
So my question is:
How can I determine whether the problem is caused by my code, the server configuration, or some kind of attack, and which approach would you recommend to figure this out?
Thank you in advance :)
OK, it turns out that the problem was the JDBC connection pool settings: maxActive was set to 20 connections. I changed the limit to 200 and the problem stopped.
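For reference, these settings live in the <Resource> definition of the data source (in context.xml or server.xml). A minimal sketch, assuming the Tomcat JDBC pool; the JNDI name, database URL and credentials are placeholders, not my exact configuration, and the numbers should be tuned to your own load:

<!-- maxActive caps how many connections the pool will hand out concurrently;
     removeAbandoned/logAbandoned flag connections that are never returned to the pool -->
<Resource name="jdbc/MyDataSource"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="com.mysql.jdbc.Driver"
          url="jdbc:mysql://localhost:3306/mydb"
          username="dbuser" password="dbpass"
          maxActive="200"
          maxIdle="50"
          minIdle="10"
          maxWait="10000"
          removeAbandoned="true"
          removeAbandonedTimeout="60"
          logAbandoned="true"/>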
The way I figured out that this was the problem was thanks to appdynamics.com's wonderful tool, which lets you check a great deal of metrics in the Application Infrastructure Performance section.
I also found this wonderful article about the subject, which helped me tune my app:
http://www.tomcatexpert.com/blog/2010/04/01/configuring-jdbc-pool-high-concurrency
The official documentation also helped:
https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html
I guess that the arriving connections started queries which first overwhelmed the server's ability to respond, and afterwards filled the OS socket limits; in Linux, open sockets are open files. I hope this helps someone!
EDIT
This solution solved the issue in the short term, but another error regarding the JDBC connections appeared later: the application was not closing the connections. I opened and solved a ticket regarding that issue here.
Have you checked your ulimit for the user that is running Tomcat?
Linux has a limit of 1024 open files per process by default.
More on this: How do I change the number of open files limit in Linux?
There is a possibility that you have too many connections in the configs, or that for some reason you are not properly closing some I/O streams (highly unlikely).
I would approach it by increasing the ulimit and then running some load tests to see what is spiking the file-descriptor usage.
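For example (a sketch; the tomcat7 user name and the 65535 value are assumptions you should adjust for your setup):

# See the limit the running Tomcat process actually got
grep "open files" /proc/TOMCAT_PID/limits

# Raise it persistently by adding these two lines to /etc/security/limits.conf
tomcat7  soft  nofile  65535
tomcat7  hard  nofile  65535

# If Tomcat is started from an init script, the limit may also need a
# "ulimit -n 65535" in that script; restart Tomcat and re-check the limits file.
grep "open files" /proc/TOMCAT_PID/limits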