
How to debug leak in native memory on JVM?

Tags:

memory

jvm

We have a Java application running on Mule. We have -Xmx set to 6144M, but the process's overall memory usage routinely climbs well past that. It was close to 20 GB the other day before we proactively restarted it.

Thu Jun 30 03:05:57 CDT 2016
top - 03:05:58 up 149 days,  6:19,  0 users,  load average: 0.04, 0.04, 0.00
Tasks: 164 total,   1 running, 163 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.2%us,  1.7%sy,  0.0%ni, 93.9%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  24600552k total, 21654876k used,  2945676k free,   440828k buffers
Swap:  2097144k total,    84256k used,  2012888k free,  1047316k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3840 myuser  20   0 23.9g  18g  53m S  0.0 79.9 375:30.02 java

The jps command shows:

10671 Jps
3840 MuleContainerBootstrap

The jstat command shows:

 S0C    S1C    S0U    S1U      EC       EU        OC         OU       PC     PU    YGC     YGCT    FGC    FGCT     GCT
37376.0 36864.0 16160.0  0.0   2022912.0 1941418.4 4194304.0   445432.2  78336.0 66776.7    232    7.044  17     17.403   24.447

The startup arguments are (sensitive bits have been changed):

3840 MuleContainerBootstrap -Dmule.home=/mule -Dmule.base=/mule -Djava.net.preferIPv4Stack=TRUE -XX:MaxPermSize=256m -Djava.endorsed.dirs=/mule/lib/endorsed -XX:+HeapDumpOnOutOfMemoryError -Dmyapp.lib.path=/datalake/app/ext_lib/ -DTARGET_ENV=prod -Djava.library.path=/opt/mapr/lib -DksPass=mypass -DsecretKey=aeskey -DencryptMode=AES -Dkeystore=/mule/myStore -DkeystoreInstance=JCEKS -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dmule.mmc.bind.port=1521 -Xms6144m -Xmx6144m -Djava.library.path=%LD_LIBRARY_PATH%:/mule/lib/boot -Dwrapper.key=a_guid -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=10744 -Dwrapper.version=3.5.19-st -Dwrapper.native_library=wrapper -Dwrapper.arch=x86 -Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1 -Dwrapper.lang.domain=wrapper -Dwrapper.lang.folder=../lang

Adding up the "capacity" columns from jstat (S0C, S1C, EC, OC) shows that only my 6144m is being used for Java heap. Where the heck is the rest of the memory being used? Stack memory? Native heap? I'm not even sure how to proceed.
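The sum can be checked mechanically. The following is a minimal sketch, assuming the jstat header and row shown above; the class and method names are made up for illustration. It adds the survivor, eden and old generation capacity columns, which jstat reports in KB.

```java
public class JstatCapacity {
    /** Sums the heap capacity columns (S0C, S1C, EC, OC) of one jstat row, in KB. */
    public static double heapCapacityKb(String header, String row) {
        String[] names = header.trim().split("\\s+");
        String[] values = row.trim().split("\\s+");
        double total = 0;
        for (int i = 0; i < names.length && i < values.length; i++) {
            String n = names[i];
            // PC (perm gen capacity) is deliberately excluded: it is not part of -Xmx.
            if (n.equals("S0C") || n.equals("S1C") || n.equals("EC") || n.equals("OC")) {
                total += Double.parseDouble(values[i]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Header and values copied from the jstat output in the question.
        String header = "S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT";
        String row = "37376.0 36864.0 16160.0 0.0 2022912.0 1941418.4 4194304.0 "
                   + "445432.2 78336.0 66776.7 232 7.044 17 17.403 24.447";
        double kb = heapCapacityKb(header, row);
        // Prints: heap capacity: 6291456 KB = 6144 MB
        System.out.printf("heap capacity: %.0f KB = %.0f MB%n", kb, kb / 1024);
    }
}
```

The result, 6291456 KB, is exactly 6144 MB, so the Java heap matches -Xmx and the remaining resident memory has to come from outside it.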

If left to continue growing, it will consume all memory on the system and we will eventually see the system freeze up throwing swap space errors.

I have another process that is starting to grow. Currently at about 11g resident memory.

pmap 10746 > pmap_10746.txt
cat pmap_10746.txt | grep anon | cut -c18-25 | sort -h | uniq -c | sort -rn | less

Top 10 entries by count:
    119     12K
    112   1016K
     56      4K
     38 131072K
     20  65532K
     15 131068K
     14  65536K
     10    132K
      8  65404K
      7    128K


Top 10 entries by allocation size:
      1 6291456K
      1  205816K
      1  155648K
     38  131072K
     15  131068K
      1  108772K
      1   71680K
     14   65536K
     20   65532K
      1   65512K

And top 10 by total size:
Count   Size       Aggregate
    1   6291456K   6291456K
   38    131072K   4980736K
   15    131068K   1966020K
   20     65532K   1310640K
   14     65536K    917504K
    8     65404K    523232K
    1    205816K    205816K
    1    155648K    155648K
  112      1016K    113792K

This seems to be telling me that because the Xmx and Xms are set to the same value, there is a single allocation of 6291456K for the java heap. Other allocations are NOT java heap memory. What are they? They are getting allocated in rather large chunks.
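The count, size and aggregate columns above can be produced in one pass instead of by hand. This is a rough sketch, assuming the usual pmap line layout of address, size ending in K, permissions, mapping name; the class name and output format are hypothetical, not part of any tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PmapAnonSummary {
    /**
     * Groups anonymous mappings by size.
     * Each returned row is {sizeKb, count, aggregateKb}, largest aggregate first.
     */
    public static List<long[]> summarize(List<String> pmapLines) {
        Map<Long, Long> countBySize = new TreeMap<>();
        for (String line : pmapLines) {
            if (!line.contains("anon")) continue;          // same filter as `grep anon`
            String[] f = line.trim().split("\\s+");
            if (f.length < 2 || !f[1].endsWith("K")) continue; // assumed: 2nd field is the size
            long sizeKb = Long.parseLong(f[1].substring(0, f[1].length() - 1));
            countBySize.merge(sizeKb, 1L, Long::sum);
        }
        List<long[]> rows = new ArrayList<>();
        for (Map.Entry<Long, Long> e : countBySize.entrySet()) {
            rows.add(new long[] {e.getKey(), e.getValue(), e.getKey() * e.getValue()});
        }
        rows.sort(Comparator.comparingLong((long[] r) -> r[2]).reversed());
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PmapAnonSummary <pmap-output-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("Count   Size(K)   Aggregate(K)");
        for (long[] r : summarize(lines)) {
            System.out.printf("%5d %9d %14d%n", r[1], r[0], r[2]);
        }
    }
}
```

Run against a saved dump, for example `java PmapAnonSummary pmap_10746.txt`. Running it on dumps taken some time apart should show which size class is actually growing.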

Galuvian asked Jul 01 '16 17:07


1 Answer

Expanding on Peter's answer with a bit more detail.

You can take a binary heap dump from within VisualVM: right-click the process in the left-hand list and choose Heap Dump. The dump appears under the process shortly afterwards. If you can't attach VisualVM to your JVM, you can also generate the dump with this:

jmap -dump:format=b,file=heap.hprof $PID

Then copy the file and open it with VisualVM (File, Load, select type heap dump, find the file).

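As Peter notes, a likely cause of the leak is uncollected DirectByteBuffers (e.g. some instance of another class is not properly dereferencing buffers, so they are never GC'd). The memory backing a direct buffer is allocated outside the Java heap and is released only once the DirectByteBuffer object itself is collected, so these buffers barely register in the heap figures while the process's resident size keeps growing.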
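Before digging through a heap dump, it can help to confirm that direct buffers account for the growth at all. Below is a minimal sketch, assuming Java 7 or later, where the platform exposes buffer pool statistics through BufferPoolMXBean; the pool name "direct" is what HotSpot uses, and the class name is made up for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectBufferUsage {
    /** Returns bytes currently used by the "direct" buffer pool, or -1 if it is not reported. */
    public static long directMemoryUsedBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directMemoryUsedBytes();
        // Allocate 16 MB off-heap; this does not count against -Xmx.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = directMemoryUsedBytes();
        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes");
        System.out.println("buffer capacity:    " + buffer.capacity() + " bytes");
    }
}
```

This snippet reads the value from inside its own JVM. For an already running process, the same numbers are published over JMX under the MBean name java.nio:type=BufferPool,name=direct. If that figure stays small while resident memory grows, direct buffers are probably not the culprit and the leak is more likely in native code.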

To identify where these references are coming from, you can use VisualVM to examine the heap and find all instances of DirectByteBuffer in the "Classes" tab. Find the DirectByteBuffer class, right-click it, and go to the instances view.

This will give you a list of instances. You can click on one and see what is keeping a reference to it:

[Image: VisualVM - Instances view]

Note the bottom pane: we have a "referent" of type Cleaner and two "mybuffer" entries. The latter are fields in other classes that reference the DirectByteBuffer instance we drilled into (it should be OK to ignore the Cleaner and focus on the others).

From this point on you need to proceed based on your application.

An equivalent way to get the list of DirectByteBuffer instances is from the OQL tab. This query:

select x from java.nio.DirectByteBuffer x

gives us the same list as before. The benefit of using OQL is that you can execute more complex queries. For example, this gets all the instances that are keeping a reference to a DirectByteBuffer:

select referrers(x) from java.nio.DirectByteBuffer x
Galo Navarro answered Sep 28 '22 10:09