How to estimate if the JVM has enough free memory for a particular data structure?

I have the following situation: several machines form a cluster. Clients can load datasets, and we need to select the node on which each dataset will be loaded, and refuse to load it (avoiding an OOM error) if no single machine can fit the dataset.

What we do currently: we know the entry count in the dataset and estimate the memory to be used as entry count * empirical factor (determined manually). We then check whether this is lower than the free memory (obtained from Runtime.getRuntime().freeMemory()) and, if so, load it (otherwise we redo the process on other nodes or report that there is no free capacity).

The problems with this approach are:

  • the empirical factor needs to be revisited and updated manually
  • freeMemory() may sometimes underreport because of garbage that has not been collected yet (this could be avoided by calling System.gc() before each such call, but that would slow down the server and could also lead to premature promotion)
  • an alternative would be to "just try to load the dataset" (and back out if an OOM is thrown); however, once an OOM is thrown, you have potentially corrupted other threads running in the same JVM, and there is no graceful way of recovering from it.

Are there better solutions to this problem?

Grey Panther asked Apr 14 '16 12:04



4 Answers

The empirical factor can be calculated as a build step and placed in a properties file.
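As a rough illustration of reading such a factor at runtime, here is a minimal sketch. The class name MemoryFactor and the property key dataset.bytesPerEntry are made up for this example; the only assumption about the build step is that it writes a single positive number under that key.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: reads the measured bytes-per-entry factor from a properties source. */
final class MemoryFactor {
    // Assumed key name; use whatever key your build step actually writes.
    static final String KEY = "dataset.bytesPerEntry";

    private MemoryFactor() {}

    /** Loads the factor, failing fast if it is missing or not a positive number. */
    static double load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(KEY);
        if (raw == null) {
            throw new IllegalStateException("Missing property " + KEY);
        }
        double factor = Double.parseDouble(raw.trim());
        if (!(factor > 0)) {
            throw new IllegalStateException(KEY + " must be positive, got " + raw);
        }
        return factor;
    }

    /** Estimated heap footprint: entry count * empirical factor, rounded up. */
    static long estimateBytes(long entryCount, double bytesPerEntry) {
        return (long) Math.ceil(entryCount * bytesPerEntry);
    }
}
```

In a real deployment the Reader would typically come from a file or classpath resource that the build produces; a StringReader is enough to try it out.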

While freeMemory() is almost always less than the amount that would be free after a GC, you can check it first to see whether enough memory is already available, and call System.gc() only if it is not and maxMemory() indicates there might be plenty.

NOTE: Using System.gc() in production only makes sense in very rare situations. In general it is often used incorrectly, resulting in reduced performance and obscuring the real problem.

I would avoid triggering an OOME unless you are running in a JVM you can restart as required.
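One way to read that check order, as a minimal sketch: the class name HeapHeadroom and the exact sequence of checks are my own interpretation, not a standard API. It only uses Runtime.freeMemory(), totalMemory(), maxMemory() and System.gc(), and keeps in mind that System.gc() is just a hint to the JVM.

```java
/** Hypothetical helper: decides whether a dataset of a given estimated size should be admitted. */
final class HeapHeadroom {
    private HeapHeadroom() {}

    /** Bytes the heap can still provide: max heap minus what is currently occupied (garbage included). */
    static long availableBytes(Runtime rt) {
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    static boolean canFit(long requiredBytes) {
        Runtime rt = Runtime.getRuntime();
        // Cheap checks first: free space in the current heap, then room for the heap to grow.
        if (rt.freeMemory() >= requiredBytes || availableBytes(rt) >= requiredBytes) {
            return true;
        }
        // The heap could never hold the dataset, so a GC cannot help.
        if (rt.maxMemory() < requiredBytes) {
            return false;
        }
        // Last resort: uncollected garbage may be hiding enough room.
        // System.gc() is only a request and can be expensive, so it runs at most once here.
        System.gc();
        return availableBytes(rt) >= requiredBytes;
    }
}
```

The result is only a snapshot: other threads can allocate between the check and the load, so it makes sense to keep some safety margin on top of the estimate.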

Peter Lawrey answered Oct 29 '22 21:10


My solution:

  1. Set Xmx to 90%-95% of the physical machine's RAM if no process other than your program is running on it. For a machine with 32 GB of RAM, set Xmx to about 27 GB - 28 GB (for example -Xmx28g), which sits slightly below that range and leaves some headroom for the OS and the JVM's non-heap memory.

  2. Use one of the good GC algorithms, CMS or G1GC, and fine-tune the relevant parameters. I prefer G1GC if you need more than 4 GB of RAM for your application. Refer to these questions if you choose G1GC:

    Agressive garbage collector strategy

    Reducing JVM pause time > 1 second using UseConcMarkSweepGC

  3. Calculate a cap on memory usage yourself instead of checking free memory. Add the memory already in use to the memory about to be allocated, and subtract the sum from your own cap, such as 90% of Xmx. If the result is still positive, grant the memory allocation request.
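Step 3 could be sketched roughly as below. MemoryBudget is a hypothetical class, and it interprets "used memory" as the sum of the estimates already granted to loaded datasets (my assumption); everything else on the heap has to fit in the margin between the cap and Xmx.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical admission control: tracks granted estimates against a self-imposed cap. */
final class MemoryBudget {
    private final long capBytes;
    private final AtomicLong reservedBytes = new AtomicLong();

    MemoryBudget(long capBytes) {
        if (capBytes < 0) {
            throw new IllegalArgumentException("cap must be non-negative");
        }
        this.capBytes = capBytes;
    }

    /** Cap expressed as a fraction of the maximum heap, e.g. 0.9 for 90% of Xmx. */
    static MemoryBudget ofHeapFraction(double fraction) {
        return new MemoryBudget((long) (Runtime.getRuntime().maxMemory() * fraction));
    }

    /** Grants the request only if reserved + requested stays within the cap. */
    boolean tryReserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must be non-negative");
        }
        while (true) {
            long current = reservedBytes.get();
            if (bytes > capBytes - current) {
                return false; // would exceed the cap: refuse, or try another node
            }
            if (reservedBytes.compareAndSet(current, current + bytes)) {
                return true;
            }
            // Another thread changed the total concurrently; retry with the fresh value.
        }
    }

    /** Returns a reservation once the dataset has been unloaded. */
    void release(long bytes) {
        reservedBytes.addAndGet(-bytes);
    }

    long remainingBytes() {
        return capBytes - reservedBytes.get();
    }
}
```

A loader would call tryReserve(estimate) before loading and release(estimate) after unloading. The accounting is only as good as the estimates, so the gap between the cap and Xmx has to absorb estimation errors as well.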

Ravindra babu answered Oct 29 '22 22:10


An alternative approach is to isolate each data-load in its own JVM. You just predefine each JVM's max-heap-size and so on, and set the number of JVMs per host in such a way that each JVM can take up its full max-heap-size. This will use a bit more resources — it means you can't make use of every last byte of memory by cramming in more low-memory data-loads — but it massively simplifies the problem (and reduces the risk of getting it wrong), it makes it feasible to tell when/whether you need to add new hosts, and most importantly, it reduces the impact that any one client can have on all other clients.

With this approach, a given JVM is either "busy" or "available".

After any given data-load completes, the relevant JVM can either declare itself available for a new data-load, or it can just close. (Either way, you'll want to have a separate process to monitor the JVMs and make sure that the right number are always running.)
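To make this a bit more concrete, here is a minimal sketch of sizing and starting such worker JVMs. The class DataLoadJvmLauncher, the per-JVM overhead parameter and the idea of passing a dataset id as the only argument are all assumptions for illustration; the monitoring process mentioned above is not shown.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher: one child JVM with a fixed max heap per data-load. */
final class DataLoadJvmLauncher {
    private DataLoadJvmLauncher() {}

    /**
     * How many worker JVMs fit on a host so that each can use its full max heap.
     * perJvmOverheadMb is an assumed allowance for non-heap memory (metaspace, thread stacks, ...).
     */
    static int jvmsPerHost(long hostRamMb, long reservedForOsMb, int maxHeapMb, int perJvmOverheadMb) {
        long usable = hostRamMb - reservedForOsMb;
        long perJvm = (long) maxHeapMb + perJvmOverheadMb;
        return usable <= 0 ? 0 : (int) (usable / perJvm);
    }

    /** Command line for a child JVM that reuses the parent's Java binary and classpath. */
    static List<String> buildCommand(int maxHeapMb, String mainClass, String datasetId) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx" + maxHeapMb + "m");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        cmd.add(datasetId);
        return cmd;
    }

    /** Starts the child; an OOM there ends only that process, not the other data-loads. */
    static Process launch(int maxHeapMb, String mainClass, String datasetId) throws IOException {
        return new ProcessBuilder(buildCommand(maxHeapMb, mainClass, datasetId))
                .inheritIO()
                .start();
    }
}
```

With these made-up numbers, a 32 GB host that keeps 2 GB for the OS and runs 4 GB heaps with 512 MB of overhead each has room for six such JVMs.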

ruakh answered Oct 29 '22 20:10


Quoting the question: an alternative would be to "just try to load the dataset" (and back out if an OOM is thrown) however once an OOM is thrown, you potentially corrupted other threads running in the same JVM and there is no graceful way of recovering from it.

There are no good ways to handle and recover from an OOME in the JVM, but there is a way to react before the OOM happens. Java has java.lang.ref.SoftReference, whose referent is guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. This fact can be used for early prediction of an OOM: for example, a data load can be aborted if the prediction triggers.

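    import java.lang.ref.ReferenceQueue;
    import java.lang.ref.SoftReference;

    ReferenceQueue<Object> q = new ReferenceQueue<>();
    // Keep the SoftReference itself strongly reachable, or it may never be enqueued
    SoftReference<Object> reference = new SoftReference<>(new Object(), q);
    q.remove(); // blocks (may throw InterruptedException) until the GC clears the referent
    // referent cleared - stop data load immediately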

Sensitivity can be tuned with the -XX:SoftRefLRUPolicyMSPerMB flag (for the Oracle JVM). The solution is not ideal: its effectiveness depends on various factors, such as whether other soft references are used in the code, how the GC is tuned, the JVM version, the weather on Mars... But it can help if you are lucky.
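Since q.remove() blocks, the check usually has to live on its own thread. A minimal sketch of that arrangement follows; MemoryCanary is a hypothetical name, and the load code is assumed to poll isTripped() between batches.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: flips a flag once the GC clears a soft "canary" reference. */
final class MemoryCanary {
    private final AtomicBoolean tripped = new AtomicBoolean(false);
    // Held strongly so the SoftReference object itself stays alive and can be enqueued.
    private final SoftReference<Object> canary;

    private MemoryCanary(SoftReference<Object> canary) {
        this.canary = canary;
    }

    /** Creates the canary and starts a daemon thread that waits for it to be cleared. */
    static MemoryCanary start() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        MemoryCanary result = new MemoryCanary(new SoftReference<>(new Object(), queue));
        Thread watcher = new Thread(() -> {
            try {
                queue.remove();           // blocks until the GC clears and enqueues the canary
                result.tripped.set(true); // memory pressure detected
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "memory-canary");
        watcher.setDaemon(true);
        watcher.start();
        return result;
    }

    /** True once the canary has been cleared; the data load should then back out. */
    boolean isTripped() {
        return tripped.get();
    }
}
```

The loading loop would check isTripped() after each batch and, if it is set, stop and drop its references to the partially loaded data. A canary fires only once, so a fresh one is needed for the next load.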

Andrew Kolpakov answered Oct 29 '22 21:10