Why is Direct ByteBuffer ever increasing on HornetQ server leading to OOM?

Configuration

I have set up a standalone HornetQ (2.4.7-Final) cluster on Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-29-generic x86_64). The instance has 16GB of RAM and 2 cores, and I have allocated -Xms5G -Xmx10G to the JVM.

Following is the address setting in the HornetQ configuration:

   <address-settings>
      <address-setting match="jms.queue.pollingQueue">
         <dead-letter-address>jms.queue.DLQ</dead-letter-address>
         <expiry-address>jms.queue.ExpiryQueue</expiry-address>
         <redelivery-delay>86400000</redelivery-delay>
         <max-delivery-attempts>10</max-delivery-attempts>
         <max-size-bytes>1048576000</max-size-bytes>
         <page-size-bytes>10485760</page-size-bytes>
         <address-full-policy>PAGE</address-full-policy>
         <message-counter-history-day-limit>10</message-counter-history-day-limit>
      </address-setting>
      <address-setting match="jms.queue.offerQueue">
         <dead-letter-address>jms.queue.DLQ</dead-letter-address>
         <expiry-address>jms.queue.ExpiryQueue</expiry-address>
         <redelivery-delay>3600000</redelivery-delay>
         <max-delivery-attempts>25</max-delivery-attempts>
         <max-size-bytes>1048576000</max-size-bytes>
         <page-size-bytes>10485760</page-size-bytes>
         <address-full-policy>PAGE</address-full-policy>
         <message-counter-history-day-limit>10</message-counter-history-day-limit>
      </address-setting>
      <address-setting match="jms.queue.smsQueue">
         <dead-letter-address>jms.queue.DLQ</dead-letter-address>
         <expiry-address>jms.queue.ExpiryQueue</expiry-address>
         <redelivery-delay>3600000</redelivery-delay>
         <max-delivery-attempts>25</max-delivery-attempts>
         <max-size-bytes>1048576000</max-size-bytes>
         <page-size-bytes>10485760</page-size-bytes>
         <address-full-policy>PAGE</address-full-policy>
         <message-counter-history-day-limit>10</message-counter-history-day-limit>
      </address-setting>
      <!--default for catch all-->
      <!-- delay redelivery of messages for 1hr -->
      <address-setting match="#">
         <dead-letter-address>jms.queue.DLQ</dead-letter-address>
         <expiry-address>jms.queue.ExpiryQueue</expiry-address>
         <redelivery-delay>3600000</redelivery-delay>
         <max-delivery-attempts>25</max-delivery-attempts>
         <max-size-bytes>1048576000</max-size-bytes>
         <page-size-bytes>10485760</page-size-bytes>
         <address-full-policy>PAGE</address-full-policy>
         <message-counter-history-day-limit>10</message-counter-history-day-limit>
      </address-setting>
   </address-settings>

There are 10 other queues covered by the default wildcard ("#") address setting.

Problem

Over time the direct ByteBuffer memory gradually grows, eventually spilling into swap and finally causing an OutOfMemoryError ("Direct buffer memory").

I have tried a lot of JVM and JMS tuning, to no avail. Even adding -XX:MaxDirectMemorySize=4G to the JVM only made the OutOfMemoryError appear sooner, for the same reason. It seems that either the ByteBuffers are never released, or the GC never reclaims the unreferenced ones.
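
For reference, the direct-buffer pool can be watched over JMX (the MBean java.nio:type=BufferPool,name=direct in jconsole/VisualVM). The snippet below is only a minimal in-process sketch of the same check, not part of the HornetQ setup itself; the class name, polling interval and log format are arbitrary:

   import java.lang.management.BufferPoolMXBean;
   import java.lang.management.ManagementFactory;

   public class DirectBufferWatcher {
       public static void main(String[] args) throws InterruptedException {
           while (true) {
               // The platform exposes one BufferPoolMXBean per pool ("direct" and "mapped").
               for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                   if ("direct".equals(pool.getName())) {
                       // Count / total capacity / memory used reflect the native memory
                       // backing direct ByteBuffers; values that only ever grow point at
                       // buffers that are never released.
                       System.out.printf("direct buffers: count=%d, capacity=%d bytes, used=%d bytes%n",
                               pool.getCount(), pool.getTotalCapacity(), pool.getMemoryUsed());
                   }
               }
               Thread.sleep(60_000); // poll once a minute (interval is arbitrary)
           }
       }
   }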

Has anybody faced the same issue before?

Any suggestions are welcome and thanks in advance.

asked Jan 24 '16 by Tushu

1 Answer

I don't know anything about HornetQ's internals, so this answer only covers direct ByteBuffers (DBBs) in general:

  • It's an ordinary leak: the DBB objects are simply still reachable and therefore never freed. This can arise either from a bug in the application or from incorrect usage of it.
    The usual approach here is to take a heap dump and determine what keeps the objects alive (see the heap-dump sketch after this list).

  • The buffers become unreachable, but the garbage collector performs old-gen collections so rarely that it takes a long time until they are actually collected and their native memory is freed. If the server runs with -XX:+DisableExplicitGC, that also suppresses the last-ditch full GC attempted when the MaxDirectMemorySize limit is reached.
    Tuning the GC to run more frequently, so the DBBs are released in a timely fashion, could solve that case; GC logs (e.g. -verbose:gc -XX:+PrintGCDetails) will show how rarely old-gen collections actually occur.
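
For the first case, here is a minimal sketch of triggering such a dump programmatically (assuming a HotSpot JVM; the class name and dump path are arbitrary). Note that it dumps the heap of the JVM it runs in, so in practice it is usually simpler to run jmap -dump:live,format=b,file=heap.hprof <pid> against the broker process; either way, open the resulting .hprof in Eclipse MAT or VisualVM and look at what still references the java.nio.DirectByteBuffer instances:

   import com.sun.management.HotSpotDiagnosticMXBean;

   import java.lang.management.ManagementFactory;

   public class HeapDumper {
       public static void main(String[] args) throws Exception {
           // HotSpot-specific diagnostic MXBean (not available on every JVM).
           HotSpotDiagnosticMXBean diag =
                   ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
           // live=true runs a GC first, so only reachable objects end up in the dump;
           // whatever still references the DirectByteBuffers there is the leak path.
           diag.dumpHeap("/tmp/hornetq-heap.hprof", true);
       }
   }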

answered Sep 17 '22 by the8472