My java process's file descriptors going "bad" and I have no idea why

Tags: java, linux, lucene

I have a Java webapp built with Lucene, and I keep getting various "file already closed" exceptions, depending on which Directory implementation I use. I've been able to get "java.io.IOException: Bad file descriptor" and "java.nio.channels.ClosedChannelException" out of Lucene, usually wrapped around an AlreadyClosedException for the IndexReader.

The funny thing is, I haven't closed the IndexReader, and it seems the file descriptors are going stale on their own. I'm using the latest version of Lucene 3.0 (haven't had time to upgrade out of the 3.0 series), the latest Oracle JDK 6, the latest Tomcat 6 and the latest CentOS. I can replicate the bug with the same software on other Linux systems, but not on Windows systems, and I don't have an OS X machine to test with. The Linux servers are virtualized with QEMU, if that matters at all.

This also seems to be load related: how frequently it happens corresponds to the number of requests per second that Tomcat is serving (to this particular webapp). For example, on one server every request completes as expected until load reaches about 2 req/s; then roughly 10% of requests have their file descriptors closed out from under them mid-request (the code checks for a valid IndexReader object and creates one at the beginning of processing each request, as sketched below). Once load reaches about 3 req/s, all of the requests start failing with bad file descriptors.
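For context, the per-request check is along these lines (an illustrative sketch only, not the actual code; the class name, field names and index path are placeholders), using the Lucene 3.0 API:

import java.io.File;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

// Illustrative sketch of the per-request check: a shared IndexReader is
// reused across requests and (re)opened lazily if it does not exist yet.
public class SearchService {
    private final File indexPath = new File("/path/to/index"); // placeholder
    private IndexReader reader;

    synchronized IndexReader reader() throws IOException {
        if (reader == null) {
            reader = IndexReader.open(FSDirectory.open(indexPath), true); // read-only
        }
        return reader;
    }
}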

My best guess is that somehow there's resource starvation at the OS level and the OS is cleaning up fds... but that's only because I've eliminated every other idea I've had. I've already checked the ulimits and the filesystem fd limits, and the number of open descriptors is well below either limit (example output from sysctl fs.file-nr: 1020 0 203404; ulimit -n: 10240).
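For reference, one Linux-only way to watch the JVM's own descriptor count during a load test is to list /proc/self/fd from inside the process. This is a minimal sketch (not part of the original webapp; the class name is illustrative), e.g. callable from a monitoring servlet or a timer:

import java.io.File;

// Minimal sketch: count this JVM's open file descriptors by listing
// /proc/self/fd (Linux-only), to compare against ulimit -n under load.
public class FdCount {
    public static int openFds() {
        String[] fds = new File("/proc/self/fd").list();
        return fds == null ? -1 : fds.length; // -1 if /proc is unavailable
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFds());
    }
}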

I'm almost completely out of things to test and I'm no closer to solving this than the day that I found out about it. Has anyone experienced anything similar?

EDIT 07/12/2011: I found an OS X machine to use for some testing and have confirmed that this also happens on OS X. I've also tested on physical Linux boxes and replicated the issue, so the only OS on which I've been unable to replicate it is Windows. I'm guessing this has something to do with POSIX handling of file descriptors, because that seems to be the only relevant difference between the test systems (JDK version, Tomcat version and webapp were identical across all platforms).

asked Jul 11 '11 by oorza


1 Answer

The reason you probably don't see this happening on Windows might be that its FSDirectory.open defaults to using SimpleFSDirectory.

Check out the warnings at the top of FSDirectory and NIOFSDirectory, i.e. the text in red at http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/store/NIOFSDirectory.html:

NOTE: Accessing this class either directly or indirectly from a thread while it's interrupted can close the underlying file descriptor immediately if at the same time the thread is blocked on IO. The file descriptor will remain closed and subsequent access to NIOFSDirectory will throw a ClosedChannelException. If your application uses either Thread.interrupt() or Future.cancel(boolean) you should use SimpleFSDirectory in favor of NIOFSDirectory

https://issues.apache.org/jira/browse/LUCENE-2239
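If the webapp (or anything in the container) ever interrupts request threads, directly via Thread.interrupt() or via Future.cancel(true), the workaround is to construct SimpleFSDirectory explicitly rather than relying on FSDirectory.open. A minimal sketch against the Lucene 3.x API (the class and method names are illustrative, and the index path comes from the caller):

import java.io.File;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.SimpleFSDirectory;

public class OpenWithSimpleFS {
    // Sketch: SimpleFSDirectory reads through RandomAccessFile, so an
    // interrupted thread does not close the underlying channel out from
    // under other threads the way NIOFSDirectory's FileChannel can.
    static IndexReader openReader(File indexPath) throws IOException {
        SimpleFSDirectory dir = new SimpleFSDirectory(indexPath);
        return IndexReader.open(dir, true); // read-only reader
    }
}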

answered Sep 22 '22 by Robert Muir