 

java.net.SocketException: Too many open files Spring Hibernate Tomcat

I have a Hibernate, Spring, Debian, Tomcat, MySQL stack on a Linode server in production with some clients. It's a Spring multi-tenant application that hosts webpages for about 30 clients.

The application starts fine, then after a while I get this error:

java.net.SocketException: Too many open files
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
at java.net.ServerSocket.implAccept(ServerSocket.java:453)
at java.net.ServerSocket.accept(ServerSocket.java:421)
at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
at java.lang.Thread.run(Thread.java:662)

Before this error is thrown, however, Nagios alerts me that the server has stopped responding to pings.

Previously I had nginx as a proxy and was getting these nginx errors per request instead, and I had to restart Tomcat anyway:

2014/04/21 12:31:28 [error] 2259#0: *2441630 no live upstreams while connecting to upstream, client: 66.249.64.115, server: abril, request: "GET /catalog.do?op=requestPage&selectedPage=-195&category=2&offSet=-197&page=-193&searchBox= HTTP/1.1", upstream: "http://appcluster/catalog.do?op=requestPage&selectedPage=-195&category=2&offSet=-197&page=-193&searchBox=", host: "www.anabocafe.com"
2014/04/21 12:31:40 [error] 2259#0: *2441641 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 200.74.195.61, server: abril, request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8080/", host: "www.oli-med.com"

This is my server.xml Connector configuration:

<Connector port="80" protocol="HTTP/1.1"
           maxHttpHeaderSize="8192"
           maxThreads="500" minSpareThreads="250"
           enableLookups="false" redirectPort="8443" acceptCount="100"
           connectionTimeout="20000" disableUploadTimeout="true"
           acceptorThreadCount="2" />

I tried changing the ulimit following this tutorial. I was able to raise the hard limit on open file descriptors for the user running Tomcat, but it did not fix the problem; the application still hangs.
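For reference, this is roughly what that kind of change looks like (the user name "tomcat" and the values below are placeholders, not my exact setup):

# /etc/security/limits.conf (needs pam_limits and a re-login or service restart to take effect)
tomcat soft nofile 8192
tomcat hard nofile 8192

# verify what the running process actually got
grep 'open files' /proc/TOMCAT_PID/limits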

The last time I had to restart the server, it had been running for about 3 hours and I had this count of open socket connections:

lsof -p TOMCAT_PID | wc -l
632 (more or less!! I did not write down the exact number)

This number suddenly starts growing.
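A rough way to keep an eye on this growth over time (TOMCAT_PID being a placeholder for the actual process id):

# open descriptor count for the Tomcat process, refreshed every 60 seconds
watch -n 60 'lsof -p TOMCAT_PID | wc -l'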

I have some applications very similar to this one on other servers; the difference is that they are stand-alone versions and this one is a multi-tenant architecture. I notice that in this app I'm getting these kinds of socket connections, which don't occur in the stand-alone version of any of the other installations:

java    11506 root  646u  IPv6 136862      0t0     TCP lixxx-xxx.members.linode.com:www->180.76.6.16:49545 (ESTABLISHED)
java    11506 root  647u  IPv6 136873      0t0     TCP lixxx-xxx.members.linode.com:www->50.31.164.139:37734 (CLOSE_WAIT)
java    11506 root  648u  IPv6 135889      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-54-247-188-179.eu-west-1.compute.amazonaws.com:28335 (CLOSE_WAIT)
java    11506 root  649u  IPv6 136882      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-54-251-34-67.ap-southeast-1.compute.amazonaws.com:19023 (CLOSE_WAIT)
java    11506 root  650u  IPv6 136884      0t0     TCP lixxx-xxx.members.linode.com:www->crawl-66-249-75-113.googlebot.com:39665 (ESTABLISHED)
java    11506 root  651u  IPv6 136886      0t0     TCP lixxx-xxx.members.linode.com:www->190.97.240.116.viginet.com.ve:1391 (ESTABLISHED)
java    11506 root  652u  IPv6 136887      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-50-112-95-211.us-west-2.compute.amazonaws.com:19345 (ESTABLISHED)
java    11506 root  653u  IPv6 136889      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-54-248-250-232.ap-northeast-1.compute.amazonaws.com:51153 (ESTABLISHED)
java    11506 root  654u  IPv6 136897      0t0     TCP lixxx-xxx.members.linode.com:www->baiduspider-180-76-5-149.crawl.baidu.com:31768 (ESTABLISHED)
java    11506 root  655u  IPv6 136898      0t0     TCP lixxx-xxx.members.linode.com:www->msnbot-157-55-32-60.search.msn.com:35100 (ESTABLISHED)
java    11506 root  656u  IPv6 136900      0t0     TCP lixxx-xxx.members.linode.com:www->50.31.164.139:47511 (ESTABLISHED)
java    11506 root  657u  IPv6 135924      0t0     TCP lixxx-xxx.members.linode.com:www->ec2-184-73-237-85.compute-1.amazonaws.com:28206 (ESTABLISHED)

I guess they are some kind of automated connections.
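Since the CLOSE_WAIT entries are the suspicious ones (the remote side has closed, but the application has not yet closed its end), a rough way to group these connections by TCP state (again with TOMCAT_PID as a placeholder):

# TCP connections held by the Tomcat process, grouped by state
lsof -a -p TOMCAT_PID -iTCP | awk '{print $NF}' | sort | uniq -c | sort -rn

# system-wide count of sockets stuck in CLOSE_WAIT (needs iproute2's ss)
ss -tan state close-wait | wc -l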

So my question is:

How can I determine whether the problem is caused by my code, the server, or some kind of attack, and which approach would you recommend to figure this out?

Thank you in advance :)

Asked Apr 23 '14 by Ernest


2 Answers

OK, it turns out that the problem was the JDBC connection pool settings: maxActive was set to 20 connections. I changed the limit to 200 and the problem stopped.

The way I figured out that this was the problem was thanks to appdynamics.com's wonderful tool, which lets you check a great number of metrics under Application Infrastructure Performance.

I also found this wonderful article about the subject, which helped me tune my app:

http://www.tomcatexpert.com/blog/2010/04/01/configuring-jdbc-pool-high-concurrency

The official documentation also helped:

https://tomcat.apache.org/tomcat-7.0-doc/jdbc-pool.html
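For anyone who just needs the knob names, a pool definition along these lines is what those references describe (the resource name, database URL and credentials below are placeholders, not my actual configuration; the same attribute names apply if you configure the pool through Spring instead of context.xml):

<Resource name="jdbc/MyDB"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="com.mysql.jdbc.Driver"
          url="jdbc:mysql://localhost:3306/mydb"
          username="dbuser"
          password="dbpass"
          initialSize="10"
          maxActive="200"
          maxIdle="50"
          minIdle="10"
          maxWait="10000"
          removeAbandoned="true"
          removeAbandonedTimeout="60"
          logAbandoned="true" />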

I guess that the arriving connections started queries which first overwhelmed the server's ability to respond, and afterwards filled the OS socket limits; in Linux, open sockets are open files. I hope this helps someone!

EDIT

Hello! This solution solved the issue in the short term, but another error regarding the JDBC connections appeared: the application was not closing the connections. I opened and solved a ticket regarding that issue here.
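For anyone running into the same leak, the usual culprit is code that borrows a connection and does not return it on every path. Below is a minimal sketch of the try-with-resources pattern that avoids this on Java 7+ (on Java 6 the same thing is done with a finally block); the class, query and parameter are made up for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class CatalogDao {

    private final DataSource dataSource; // the pooled data source, e.g. injected by Spring

    public CatalogDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public int countProducts(int categoryId) throws SQLException {
        String sql = "SELECT COUNT(*) FROM product WHERE category_id = ?";
        // try-with-resources closes the ResultSet, PreparedStatement and Connection
        // (returning the connection to the pool) even if an exception is thrown
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, categoryId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }
}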

Answered by Ernest


Have you checked your ulimit for the user that is running Tomcat? Linux has a limit of 1024 open files per process by default. More on that: How do I change the number of open files limit in Linux?
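A quick way to check what the Tomcat process is actually allowed and currently using (TOMCAT_PID being a placeholder for the real process id):

# soft limit of the current shell, often 1024 by default
ulimit -n

# limits that actually apply to the running Tomcat process
grep 'open files' /proc/TOMCAT_PID/limits

# how many descriptors the process holds right now
ls /proc/TOMCAT_PID/fd | wc -l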

There is a possibility that you have too many connections configured, or that for some reason you are not properly closing some I/O streams (highly unlikely).

I would approach it by increasing the ulimit and then running some load tests to see what is spiking the file descriptor usage.
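For the load test, something as simple as ApacheBench is usually enough to reproduce the spike; the URL and numbers below are only an example:

# 5000 requests, 200 concurrent, against one of the heavier pages
ab -n 5000 -c 200 "http://www.example.com/catalog.do?op=requestPage&page=1"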

Answered by Mite Mitreski