I have a Java program that throws a 'Too many open files' error after running for about 3 minutes. Increasing the open file limit doesn't help: the program still exhausts the limit, just more slowly. So something is wrong with my program and I need to find out what.
Here is what I did (10970 is the pid). I ran
ls -l /proc/10970/fd
and found that most of the descriptors are pipes. Then I ran
lsof -p 10970 | grep FIFO
to list all the pipes, and found about 450 of them:
java 10970 service 1w FIFO 0,8 0t0 5890 pipe
java 10970 service 2w FIFO 0,8 0t0 5890 pipe
java 10970 service 169r FIFO 0,8 0t0 2450696 pipe
java 10970 service 201r FIFO 0,8 0t0 2450708 pipe
But I don't know how to continue. 0,8 in the output above means device numbers. How can I find devices with these numbers?
Update
The program is a TCP server that receives socket connections from clients and processes messages. I have two environments. In the Production environment it works fine, but the Test environment has recently started showing this issue. In the Production environment I don't see so many pipes. The code and infrastructure of the two environments are the same, and both are managed by Chef.
But I don't know how to continue.
What you need to do is to identify the place or places in your Java code where you are opening these pipes ... and make sure that they are always closed when you are done with them.
The best way to ensure that the pipes are closed is to explicitly close them when you are done with them. For example (using input streams instead of sockets ...):
InputStream is = new FileInputStream("somefile.txt");
try {
    // Use file
} finally {
    is.close();
}
In Java 7 or later, you can write that more succinctly as:
try (InputStream is = new FileInputStream("somefile.txt")) {
    // Use file
}
In the latter, the InputStream object is automatically closed when the try completes ... in an implicit finally block.
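To make the automatic closing concrete, here is a minimal sketch. `TrackedResource` and `TryWithResourcesDemo` are hypothetical names invented for this illustration; the resource stands in for a stream, socket, or pipe and only records when `close()` is called. It shows that every resource declared in the try header is closed, in reverse order of declaration, whether or not the body throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource that records when it is closed. It stands in for a
// stream, socket, or pipe in this sketch.
class TrackedResource implements AutoCloseable {
    static final List<String> closed = new ArrayList<>();
    private final String name;

    TrackedResource(String name) {
        this.name = name;
    }

    // Simulates work on the resource, optionally failing.
    void use(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated failure in " + name);
        }
    }

    @Override
    public void close() {
        closed.add(name);
    }
}

class TryWithResourcesDemo {
    // Returns the names of the resources in the order they were closed.
    static List<String> run(boolean fail) {
        TrackedResource.closed.clear();
        try (TrackedResource in = new TrackedResource("in");
             TrackedResource out = new TrackedResource("out")) {
            in.use(fail);
            out.use(false);
        } catch (IllegalStateException e) {
            // Both resources have already been closed when this handler runs.
        }
        return new ArrayList<>(TrackedResource.closed);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [out, in]
        System.out.println(run(true));  // prints [out, in]
    }
}
```

The same pattern applies to anything that implements `AutoCloseable`, which includes the standard stream and socket classes.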
0,8 in the output above means device numbers. How can I find devices with these numbers?
That is probably irrelevant to solving the problem. Focus on why the file descriptors are not being closed. Knowing what the device numbers mean doesn't help.
In Production environment I don't see so many pipes.
That's probably a red herring too. It could be caused by the GC running more frequently, and closing the orphaned file descriptors before they become a problem.
(But forcing the GC to run is not a solution. You should not rely on the GC to close file descriptors. It is inefficient and unreliable.)
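One way to narrow down which code path leaks, and to check that a fix really stops the growth, is to log the process's own descriptor count around suspect operations. The sketch below is an assumption-laden illustration, not part of the original program: `FdCounter` and `openFdCount` are hypothetical names, and it relies on the Linux `/proc/self/fd` directory (the same information as `/proc/<pid>/fd` in the question), so it returns -1 on systems without it.

```java
import java.io.File;

// Hypothetical helper for watching the descriptor count of the running JVM.
class FdCounter {
    // Counts the entries in /proc/self/fd, that is, the file descriptors the
    // current process holds open. Returns -1 if the directory cannot be
    // listed (for example on a non-Linux system).
    // The number is approximate: it may include the descriptor used to read
    // the directory itself.
    static int openFdCount() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    public static void main(String[] args) {
        // Call this before and after a suspect operation, or periodically
        // from a background thread, and look for a count that only grows.
        System.out.println("open file descriptors: " + openFdCount());
    }
}
```

If the count climbs steadily while the server handles connections and never drops back, the code exercised in that window is the place to look for streams or sockets that are not closed.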