We have an application that is widely deployed (several hundred workstations run it). At one site - and only one site, out of the many environments the product is deployed to - we randomly get the following error:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
The operating system is Windows 7 64-bit. We are running a 32-bit JVM (1.7.0_45).
Using Windows Task Manager, I can see that the process has 39 native threads (not very many), so we don't have a thread leak in our app... There are no other processes consuming lots of threads (Explorer has 35, jvisualvm has 24, iexplore has 20, ... I don't have an exact count, but we are probably looking at maybe 300 threads for the user total).
I have attempted to attach JVisualVM, but it fails to connect to the process (probably because of thread exhaustion). From the metrics I can obtain in JVisualVM, the number of Java threads is about 22 live and 11 daemon.
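One possible workaround for the attach failure, sketched below under the assumption that a watchdog or worker thread already exists to run it (the class and method names are my own invention): dump the thread list from inside the process, since code running on an existing thread doesn't need any new native resources.

    import java.util.Map;

    public class InProcessThreadDump {
        // Hypothetical helper: call from a thread that already exists (e.g. from the
        // catch block around Thread.start()) once external tools can no longer attach.
        public static void dump() {
            Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
            System.err.println("Live threads: " + all.size());
            for (Map.Entry<Thread, StackTraceElement[]> e : all.entrySet()) {
                Thread t = e.getKey();
                System.err.println(t.getName() + " (daemon=" + t.isDaemon()
                        + ", state=" + t.getState() + ")");
                for (StackTraceElement frame : e.getValue()) {
                    System.err.println("    at " + frame);
                }
            }
        }
    }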
The Java heap is well behaved: it is 500MB, with 250MB actually used.
The process is launched with -Xmx512m
Our process is showing Memory usage (in Task Manager) of 597,744K.
The workstation has 8GB RAM, of which only 3.8-4.0GB are used (I know, a 32 bit process won't access all of that, but there's still plenty)
Using VMMap, I can see the stack is 49,920KB in size, with 2,284KB committed.
The process shows 5,358KB free, and the largest allocatable block in the free list is 1,024KB.
Resource Monitor shows Commit (KB) of 630,428, Working Set (KB) of 676,996, Shareable (KB) of 79,252, and Private (KB) of 597,744.
I am at a complete loss as to what is going on here. I've read a ton of articles on this, and it sounds like on some Linux systems, there is a per-user thread limit that can cause problems (but this is not Linux, and the problems described in other articles usually talk about needing thousands of threads - definitely not our case here).
If our heap was really big, I could see that eating into space available for threads, but 500MB seems like a very reasonable and small heap (esp for a workstation with 8GB RAM).
So I've pretty much exhausted everything I know to do - does anyone have any additional pointers about what might be going on here?
EDIT 1:
I found this interesting article: Eclipse crashes with "Unable to create new native thread" - any ideas? (my settings and info inside)
They are suggesting that stack size could be the problem.
This article: where to find default XSS value for Sun/Oracle JVM? - gives a link to Oracle documentation saying that the default stack size is 512KB. So if my app has about 40 threads, we are looking at roughly 20MB of stack, plus the 500MB heap. This all seems well within normal bounds for a 32-bit Java process.
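Rather than trusting documented defaults, the effective stack size can also be read back from the running VM. A minimal sketch (HotSpot-specific; a ThreadStackSize value of 0 just means the platform default is in use):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class StackAndThreadInfo {
        public static void main(String[] args) {
            // HotSpot-only VM option: -XX:ThreadStackSize, reported in KB.
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            System.out.println("ThreadStackSize = "
                    + hotspot.getVMOption("ThreadStackSize").getValue() + " KB (0 = default)");

            // Live/daemon/peak counts give a quick check against a thread leak.
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            System.out.println("live=" + threads.getThreadCount()
                    + " daemon=" + threads.getDaemonThreadCount()
                    + " peak=" + threads.getPeakThreadCount());
        }
    }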
So that leaves me with a couple of possibilities, the main one I can articulate being memory fragmentation of the native address space.
So, are there any pointers about how to check for memory fragmentation?
EDIT 2:
The article linked to by @mikhael (http://blog.egilh.com/2006/06/2811aspx.html) gives some rough calculations for the allowed number of threads on a 32-bit JVM.
I'm going to assume:
OS process space limit: 2GB
Modern JVM requires 250MB (this is a big assumption - I just doubled what was in the linked article)
Stack size (default Oracle): 512KB
Heap: 512MB
PermGen: (can't remember exactly, but it was certainly less than 100MB, so let's just use that)
So I have a worst case scenario of: (2GB - 0.25GB - 0.5GB - 0.1GB) / 0.0005GB ≈ 2300 threads - far more than the ~40 threads we actually have.
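If anyone wants to sanity-check that estimate on an affected box, a throwaway probe like the following (class name is made up; run it only on a test machine, since it deliberately drives the process into the same error) reports how many threads can actually be started before the OutOfMemoryError appears:

    import java.util.concurrent.CountDownLatch;

    public class ThreadLimitProbe {
        public static void main(String[] args) {
            final CountDownLatch hold = new CountDownLatch(1);
            int created = 0;
            try {
                while (true) {
                    Thread t = new Thread(new Runnable() {
                        public void run() {
                            try {
                                hold.await(); // park so the thread stays alive
                            } catch (InterruptedException ignored) {
                            }
                        }
                    });
                    t.setDaemon(true);
                    t.start();
                    created++;
                }
            } catch (OutOfMemoryError e) {
                // "unable to create new native thread" lands here
                System.out.println("Created " + created + " threads before: " + e.getMessage());
                hold.countDown(); // release the parked threads
            }
        }
    }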
EDIT 3:
Info I should have included originally: The application runs fine for a good while (like 24 to 48 hours) before this problem happens. The application does continuous background processing, so has very little idle time. Not sure if that's important or not...
EDIT 4:
More info: Looking at VMMap from another failure, I'm seeing native heap exhaustion.
The native Heap (as VMMap categorizes it) is 1.2GB in size, with only 59.8MB committed.
Either the Java runtime is the problem here, or maybe some issue with native resources not being released properly? Like maybe a memory mapped file that isn't getting released?
We do use memory mapped files, so I'll put my focus on those.
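The reason memory-mapped files are a plausible suspect: a mapping is only released when its MappedByteBuffer is garbage collected (there is no public unmap call in this JRE), so the reserved address space never shows up in the Java heap numbers. A small sketch of that behaviour, with the file name and size made up:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappingLifetime {
        public static void main(String[] args) throws Exception {
            // Hypothetical 64MB mapping; the file name is only for illustration.
            RandomAccessFile raf = new RandomAccessFile("mapping-demo.bin", "rw");
            FileChannel ch = raf.getChannel();
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 64L * 1024 * 1024);
            buf.putInt(0, 42);

            ch.close();
            raf.close();
            // Even with the channel and file closed, the 64MB of address space stays
            // reserved until 'buf' becomes unreachable and is garbage collected -
            // native usage that -Xmx and heap dumps never account for.
        }
    }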
EDIT 5:
I think that I've tracked the problem down to an exception that happens as follows:
java.lang.OutOfMemoryError
at java.util.zip.Deflater.init(Native Method)
at java.util.zip.Deflater.<init>(Unknown Source)
at java.util.zip.Deflater.<init>(Unknown Source)
at java.util.zip.DeflaterOutputStream.<init>(Unknown Source)
at java.util.zip.DeflaterOutputStream.<init>(Unknown Source)
at ....
On a very small handful of the streams we are deflating (I have 4 examples now), the above happens. And when it happens, VMMap shows the heap of the process (not the JVM heap, but the actual native heap) spiking up to 2GB. Once that happens, everything falls apart. This is now very repeatable (running the same stream into the deflater results in the memory spiking).
So, are we maybe looking at a problem with the JRE's zip library? Seems crazy to think that would be it, but I'm really at a loss.
If I take the exact same stream and run it on a different system (even running the same JRE - 32 bit, Java 7u45), we don't get the problem. I have completely uninstalled the JRE and reinstalled it without any change in behavior.
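For context on why the Java heap looks healthy while this happens: Deflater keeps its compression state in native memory (via zlib), outside the Java heap, and that memory is only returned when end() is called or the finalizer eventually runs. A rough sketch that makes the difference visible (class name and counts are made up):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.Deflater;

    public class DeflaterNativeMemorySketch {
        public static void main(String[] args) {
            List<Deflater> live = new ArrayList<Deflater>();
            for (int i = 0; i < 5000; i++) {
                Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
                d.setInput(new byte[8192]);
                d.finish();
                live.add(d); // never calling d.end(): native zlib buffers stay allocated
            }
            Runtime rt = Runtime.getRuntime();
            // The Java heap barely moves, but Task Manager / VMMap will show the
            // process's private bytes climbing - the growth is all native memory.
            System.out.println("Java heap used: "
                    + (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024) + " MB");
            System.out.println("Live deflaters holding native state: " + live.size());
        }
    }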
This type of OutOfMemoryError is generated when an application isn't able to create new threads. The error can surface for two reasons: there is no room in memory to accommodate new threads, or the number of threads exceeds the operating system limit.
In some JVM server environments, each JVM server can have a maximum of 256 threads to run Java applications.
"Native threads" refers to a threading model in which the Java virtual machine creates and manages Java threads using the operating system's threads library - named libthread on UnixWare - and each Java thread is mapped to one threads-library thread.
Finally figured this out.
There were a couple of data streams that we processed (4 out of 10 million at this site) that wound up creating a ton of DeflaterOutputStream objects. A 3rd party library we were using was calling finish() on the stream instead of close(). The underlying Deflater finalizer was cleaning things up, so as long as the load wasn't too high, there were no problems. But past a tipping point, we started running into this:
http://jira.pentaho.com/browse/PRD-3211
which led us to this:
http://bugs.sun.com/view_bug.do?bug_id=4797189
Several hours after that happened, the system finally got itself into a corner that it couldn't get out of and was unable to create a native thread when we needed one.
The fix was to get the 3rd party library to close the DeflaterOutputStream.
So, definitely a native resource leak. If anyone else ever hits something like this, the VMMap tool was indispensable for eventually tracking down which data streams were causing the problem.
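For anyone facing the same pattern, the difference between the broken and the fixed usage boils down to something like this (a sketch, assuming the library constructs the DeflaterOutputStream itself so that close() also ends the internal Deflater; method names are illustrative):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.DeflaterOutputStream;

    public class DeflaterUsage {
        // Broken pattern (what the 3rd party library was doing): finish() flushes the
        // compressed data but leaves the underlying Deflater's native zlib buffers
        // allocated until the finalizer happens to run.
        static void compressLeaky(byte[] data, OutputStream sink) throws IOException {
            DeflaterOutputStream out = new DeflaterOutputStream(sink);
            out.write(data);
            out.finish(); // output is complete, but native memory is still held
        }

        // Fixed pattern: close() (here via try-with-resources) finishes the stream and
        // calls end() on the Deflater it created, releasing the native memory deterministically.
        static void compressProperly(byte[] data, OutputStream sink) throws IOException {
            try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
                out.write(data);
            }
        }
    }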