Too many open files error but lsof shows a legal number of open files

Tags: java, linux

My Java program is failing with

Caused by: java.io.IOException: Too many open files
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:883)...

Here are the key lines from /etc/security/limits.conf. They set the maximum number of open files per user to 500,000:

root                     soft    nofile          500000
root                     hard    nofile          500000
*                        soft    nofile          500000
*                        hard    nofile          500000

I ran lsof to count the number of open files, both globally and by the JVM process. I also examined the counters in /proc/sys/fs. All seems OK. My process has only 4301 files open, and the limit is 500k:

:~# lsof | wc -l
5526
:~# lsof -uusername | wc -l
4301
:~# cat /proc/sys/fs/file-max
744363
:~# cat /proc/sys/fs/file-nr
4736    0       744363

This is an Ubuntu 11.04 server. I have even rebooted so I am positive these parameters are being used.

I don't know if it's relevant, but the process is started by an upstart script, which starts the process using setuidgid, like this:

exec setuidgid username java $JAVA_OPTS -jar myprogram.jar

What am I missing?

Asked by hughw, Jan 25 '12 23:01


1 Answer

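It turns out the problem was that my program was started by an upstart job, and the exec stanza runs the command directly, without opening a login session. The settings in limits.conf are applied by the pam_limits module when a PAM session is opened (login, su, sudo and so on), so a process launched straight from upstart never picks them up; it inherits its limits from init instead.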

I verified this by changing the exec stanza to

exec sudo -u username java $JAVA_OPTS -jar program.jar

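which starts java through sudo. As far as I can tell, sudo opens a PAM session for username, and that is where the limits.conf settings get applied (assuming pam_limits is enabled for sudo, as it appears to be on this machine). That allowed the program to open as many files as it needs.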

I have seen it mentioned that you can also call ulimit -n prior to invoking the command; for an upstart script I think you would use a script stanza instead.
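A minimal sketch of that approach, assuming bash or dash: set the limit in a subshell, then exec the real command so that it inherits the new limit. The helper name run_with_nofile is hypothetical, and the upstart fragment in the trailing comment is an untested assumption about how the job from the question could be adapted. Note that ulimit would have to run while the job is still root, before setuidgid drops privileges, because only root can raise the hard limit.

```shell
# Hypothetical helper: run a command with a specific open-file limit.
# Usage: run_with_nofile LIMIT COMMAND [ARGS...]
run_with_nofile() {
    _nofile=$1
    shift
    (
        # In bash and dash, ulimit -n without -S or -H sets both the
        # soft and the hard limit. Raising the hard limit needs root.
        ulimit -n "$_nofile" || exit 1
        # Replace the subshell with the command; it inherits the limit.
        exec "$@"
    )
}

# Inside an upstart job this might look like (untested assumption):
#   script
#       ulimit -n 500000
#       exec setuidgid username java $JAVA_OPTS -jar myprogram.jar
#   end script
```

For a quick check, run_with_nofile 256 sh -c 'ulimit -n' should report 256 on a typical system. Upstart also documents a limit stanza (for example, limit nofile 500000 500000), which may be simpler than a script stanza if it is available in your version.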

I found a better diagnostic than lsof to be ls /proc/{pid}/fd | wc -l, which gives a precise count of the process's open file descriptors. By monitoring that, I could see that the failures occurred right at 4096 open fds. I don't know where that 4096 comes from; it's not in /etc anywhere, so I guess it's compiled into the kernel.
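The per-process check above can be wrapped in two small helpers. This is only a sketch, assuming a Linux /proc filesystem; the function names are hypothetical. The second helper reads /proc/PID/limits, which reports the soft and hard limits actually in effect for a running process, so it should show directly whether the limits.conf values were applied. lsof output also lists entries that are not file descriptors (current directory, program text, memory-mapped files), which is one reason its line count differs from the fd count.

```shell
# Count the open file descriptors of a process (hypothetical helper).
# Usage: count_fds PID
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Print the soft and hard "Max open files" limits in effect for a process.
# Usage: show_nofile_limit PID
show_nofile_limit() {
    grep 'Max open files' "/proc/$1/limits"
}

# Example: inspect the current shell.
count_fds $$
show_nofile_limit $$
```

Inspecting a process owned by another user generally requires root.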

Answered by hughw, Oct 20 '22 07:10