
Tomcat process killed by Linux kernel after running out of swap space; don't get any JVM OutOfMemory error

I was performing load testing against a Tomcat server. The server has 10G of physical memory and 2G of swap space. The heap size (Xms and Xmx) was previously set to 3G, and the server worked fine. Since I still saw a lot of free memory left and the performance was not good, I increased the heap size to 7G and ran the load test again. This time I observed that physical memory was eaten up very quickly, and the system started consuming swap space. Later, Tomcat crashed after running out of swap space. I included -XX:+HeapDumpOnOutOfMemoryError when starting Tomcat, but I didn't get any heap dump. When I checked /var/log/messages, I saw kernel: Out of memory: Kill process 2259 (java) score 634 or sacrifice child.

To provide more info, here's what I saw from the Linux top command with the heap size set to 3G and to 7G.

xms&xmx = 3G (which worked fine):

  • Before starting tomcat:

    Mem:  10129972k total,  1135388k used,  8994584k free,    19832k buffers
    Swap:  2097144k total,        0k used,  2097144k free,    56008k cached
    
  • After starting tomcat:

    Mem:  10129972k total,  3468208k used,  6661764k free,    21528k buffers
    Swap:  2097144k total,        0k used,  2097144k free,   143428k cached
    PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    2257 tomcat    20   0 5991m 1.9g  19m S 352.9 19.2   3:09.64 java
    
  • After starting load for 10 min:

    Mem:  10129972k total,  6354756k used,  3775216k free,    21960k buffers
    Swap:  2097144k total,        0k used,  2097144k free,   144016k cached
    PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    2257 tomcat    20   0 6549m 3.3g  10m S 332.1 34.6  16:46.87 java
    

xms&xmx = 7G (which caused tomcat crash):

  • Before starting tomcat:

    Mem:  10129972k total,  1270348k used,  8859624k free,    98504k buffers
    Swap:  2097144k total,        0k used,  2097144k free,    74656k cached
    
  • After starting tomcat:

    Mem:  10129972k total,  6415932k used,  3714040k free,    98816k buffers
    Swap:  2097144k total,        0k used,  2097144k free,   144008k cached
    PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    2310 tomcat    20   0  9.9g 3.5g  10m S  0.3 36.1   3:01.66 java
    
  • After starting load for 10 min (right before tomcat was killed):

    Mem:  10129972k total,  9960256k used,   169716k free,      164k buffers
    Swap:  2097144k total,  2095056k used,     2088k free,     3284k cached
    PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    2310 tomcat    20   0 10.4g 5.3g  776 S  9.8 54.6  14:42.56 java
    

Java and JVM Version:

Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

Tomcat Version:

6.0.36

Linux Server:

Red Hat Enterprise Linux Server release 6.4 (Santiago)

So my questions are:

  1. Why would this issue happen? When the JVM runs out of memory, why is there no OutOfMemoryError thrown? And why does the system go straight to using swap?
  2. Why does top show only 5.3G in RES for the java process when much more memory is being consumed?

I have been investigating and searching for a while, but still cannot find the root cause of this issue. Thanks a lot!

asked Jun 19 '13 by baggiowen

1 Answer

Why would this issue happen? When the JVM runs out of memory why is there no OutOfMemoryError thrown?

It is not the JVM that has run out of memory. It is the Host Operating System that has run out of memory-related resources, and is taking drastic action. The OS has no way of knowing that the process (in this case the JVM) is capable of shutting down in an orderly fashion when told "No" in response to a request for more memory. It HAS to hard-kill something or else there is a serious risk of the entire OS hanging.

Anyway, the reason you are not seeing OOMEs is that this is not an OOME situation. In reality, the JVM has already been given too much memory by the OS, and there is no way to take it back. That's the problem the OS has to deal with by hard-killing processes.
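
To make the distinction concrete, here is a minimal sketch (the class name is hypothetical, not from the original post). Run it with a small heap, for example -Xmx256m, plus -XX:+HeapDumpOnOutOfMemoryError, and you get a genuine Java-heap OutOfMemoryError and a heap dump, because the JVM itself detects the exhaustion. A kernel OOM kill, by contrast, is a SIGKILL delivered from outside the JVM, so none of that machinery ever runs.

    // A minimal sketch, not from the original post. With a small -Xmx, the loop
    // hits the Java heap limit and the JVM itself raises OutOfMemoryError, at
    // which point -XX:+HeapDumpOnOutOfMemoryError can write a dump. A kernel
    // OOM kill never reaches this code path.
    import java.util.ArrayList;
    import java.util.List;

    public class HeapExhaustionDemo {
        public static void main(String[] args) {
            List<byte[]> hoard = new ArrayList<byte[]>();   // keep allocations reachable
            try {
                while (true) {
                    hoard.add(new byte[1024 * 1024]);       // allocate 1 MB per iteration
                }
            } catch (OutOfMemoryError e) {
                // This only happens when the Java heap (-Xmx) is exhausted.
                System.err.println("Java heap exhausted: " + e);
            }
        }
    }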

And why does it go straight to using swap?

It uses swap because the total virtual memory demand of the entire system won't fit in physical memory. This is NORMAL behaviour for a UNIX / Linux operating system.

Why does top show only 5.3G in RES for the java process when much more memory is being consumed?

The RES numbers can be a little misleading. What they refer to is the amount of physical memory that the process is currently using ... excluding stuff that is shared or shareable with other processes. The VIRT number is more relevant to your problem. It says your JVM is using 10.4g of virtual memory ... which is more than the available physical memory on your system.
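
If you want to see how the JVM's own accounting relates to what top reports, a small sketch like the following (hypothetical class name) prints heap and non-heap usage via the standard MemoryMXBean. The gap between these figures and RES/VIRT comes from thread stacks, the permanent generation, the JIT code cache and other native allocations.

    // A minimal sketch: print the JVM's own view of heap and non-heap memory.
    // Comparing this with top's RES/VIRT shows that the Java heap (-Xmx) is
    // only part of the process footprint; thread stacks, permgen, the code
    // cache and other native allocations account for the rest.
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class JvmMemoryReport {
        public static void main(String[] args) {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            System.out.println("Heap:     " + mem.getHeapMemoryUsage());
            System.out.println("Non-heap: " + mem.getNonHeapMemoryUsage());
            System.out.println("Max heap (-Xmx): "
                    + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        }
    }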


As the other answer says, the fact that you are concerned about not getting an OOME is itself concerning. Even if you did get one, it would be unwise to do anything with it. An OOME is liable to do collateral damage to your application / container that is hard to detect and harder to recover from. That's why OOME is an Error, not an Exception.
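
For illustration only, here is a hedged sketch (hypothetical class name) of what "doing something with" an OOME looks like. It compiles and runs, but as explained above it is generally a bad idea: OutOfMemoryError extends Error rather than Exception, so a blanket catch (Exception e) never sees it, and by the time you catch it explicitly the application may already be in an inconsistent state.

    // A minimal sketch, not a recommendation: catching OutOfMemoryError is
    // possible but risky, because the failed allocation may have left the
    // application in a half-updated state.
    public class CatchOomeSketch {
        public static void main(String[] args) {
            try {
                byte[] huge = new byte[Integer.MAX_VALUE];   // almost certain to fail
                System.out.println("Allocated " + huge.length + " bytes");
            } catch (OutOfMemoryError e) {
                System.err.println("Caught " + e + " -- application state is suspect");
            }
        }
    }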


Recommendations:

  • Don't try to use significantly more virtual memory than you have physical memory, especially with Java. When a JVM runs a full garbage collection, it will touch most of its VM pages, multiple times, in random order. If you have over-allocated your memory significantly, this is liable to cause thrashing, which kills performance for the entire system. (A sanity-check sketch follows this list.)

  • Do increase your system's swap space. (But that might not help ...)

  • Don't try to recover from OOMEs.
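
As a rough way to act on the first recommendation, here is a hedged sketch (hypothetical class; it assumes a HotSpot JVM where the com.sun.management.OperatingSystemMXBean extension is available) that compares the configured max heap with the machine's physical memory at startup and warns when the headroom looks thin.

    // A minimal sketch, assuming a HotSpot JVM: warn if -Xmx is a large
    // fraction of physical RAM, leaving little room for non-heap JVM memory,
    // other processes and the OS page cache. The 70% threshold is an
    // arbitrary, illustrative choice, not a hard rule.
    import java.lang.management.ManagementFactory;

    public class HeapSizingCheck {
        public static void main(String[] args) {
            long maxHeap = Runtime.getRuntime().maxMemory();
            com.sun.management.OperatingSystemMXBean os =
                    (com.sun.management.OperatingSystemMXBean)
                            ManagementFactory.getOperatingSystemMXBean();
            long physical = os.getTotalPhysicalMemorySize();
            double ratio = (double) maxHeap / physical;
            System.out.printf("Max heap %d MB of %d MB physical (%.0f%%)%n",
                    maxHeap >> 20, physical >> 20, ratio * 100);
            if (ratio > 0.7) {
                System.err.println("Warning: little headroom left for non-heap memory.");
            }
        }
    }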

answered by Stephen C