I have the following situation: there are a couple of machines forming a cluster. Clients can load data-sets, and we need to select the node on which a dataset will be loaded, and refuse to load it / avoid an OOM error if there is no single machine that could fit the dataset.
What we do currently: we know the entry count in the dataset and estimate the memory to be used as entry count * empirical factor (determined manually). We then check whether this is lower than free memory (obtained via Runtime.freeMemory()) and, if so, load it (otherwise redo the process on other nodes / report that there is no free capacity).
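To make that concrete, here is a minimal sketch of the check described above; entryCount, empiricalBytesPerEntry and loadDataset() are placeholders, not names from our actual code:

// Rough estimate of the memory the dataset will need (the factor is maintained by hand).
long estimatedBytes = entryCount * empiricalBytesPerEntry;
// Free memory within the current heap; may underreport until a GC has run.
long free = Runtime.getRuntime().freeMemory();
if (estimatedBytes < free) {
    loadDataset();   // placeholder for the actual load
} else {
    // try the next node, or report that there is no free capacity
}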
The problems with this approach are:
- the empirical factor needs to be revisited and updated manually
- freeMemory sometimes may underreport because of some non-cleaned-up garbage (which could be avoided by running System.gc before each such call; however, that would slow down the server and also potentially lead to premature promotion)

Are there better solutions to this problem?
You can use Runtime.getRuntime().totalMemory() to get the total memory from the JVM; it represents the current heap size of the JVM, which is a combination of the memory currently occupied by objects and the free memory available for new objects.
The JVM has a default maximum heap of 1/4 of main memory. If you have 4 GB it will default to 1 GB. Note: this is a pretty small system, and you can get embedded devices and phones with this much memory. If you can afford to buy a little more memory, it will make your life easier.
Use this code:

// Get current size of heap in bytes
long heapSize = Runtime.getRuntime().totalMemory();
// Get maximum size of heap in bytes; the heap cannot grow beyond this size.
long heapMaxSize = Runtime.getRuntime().maxMemory();
The empirical factor can be calculated as a build step and placed in a properties file.

While freeMemory() is almost always less than the amount which would be free after a GC, you can check it to see whether the memory is available, and call System.gc() if maxMemory() indicates there might be plenty.
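A minimal sketch of that check, assuming requiredBytes has already been estimated from the entry count and the empirical factor (the name is a placeholder):

Runtime rt = Runtime.getRuntime();
long used = rt.totalMemory() - rt.freeMemory();
// Only bother with an explicit GC if the max heap suggests the dataset could fit at all.
if (rt.freeMemory() < requiredBytes && rt.maxMemory() - used >= requiredBytes) {
    System.gc();   // see the note below about using this in production
    used = rt.totalMemory() - rt.freeMemory();   // re-sample after the collection
}
// Compare against what could still be made available up to the max heap size.
boolean fits = rt.maxMemory() - used >= requiredBytes;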
NOTE: Using System.gc() in production only makes sense in very rare situations, and in general it is often used incorrectly, resulting in a reduction in performance and obscuring the real problem.
I would avoid triggering an OOME unless you are running in a JVM you can restart as required.
My solution:

Set Xmx to 90%-95% of the RAM of the physical machine if no other process is running except your program. For a 32 GB RAM machine, set Xmx to 27 GB - 28 GB.
Use one of the good GC algorithms - CMS or G1GC - and fine-tune the relevant parameters. I prefer G1GC if you need more than 4 GB RAM for your application. Refer to these questions if you choose G1GC:
Agressive garbage collector strategy
Reducing JVM pause time > 1 second using UseConcMarkSweepGC
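Putting the two points above together, a launch line could look roughly like the following; the exact pause target and occupancy threshold depend on your workload, and the jar name is just a placeholder:

java -Xmx28g -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=45 -jar dataset-loader.jar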
Calculate a cap on memory usage yourself instead of checking free memory. Add the used memory and the memory to be allocated, and subtract the result from your own cap (e.g. 90% of Xmx). If the result shows you still have available memory, grant the memory allocation request.
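A minimal sketch of that admission check, assuming the memory to be allocated has already been estimated (requestedBytes is a placeholder for that estimate):

long maxHeap = Runtime.getRuntime().maxMemory();   // roughly the -Xmx value
long cap = (long) (maxHeap * 0.9);                  // your own cap, e.g. 90% of Xmx
long used = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
// Grant the request only if used memory plus the new allocation stays under the cap.
boolean grant = used + requestedBytes <= cap;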
An alternative approach is to isolate each data-load in its own JVM. You just predefine each JVM's max-heap-size and so on, and set the number of JVMs per host in such a way that each JVM can take up its full max-heap-size. This will use a bit more resources — it means you can't make use of every last byte of memory by cramming in more low-memory data-loads — but it massively simplifies the problem (and reduces the risk of getting it wrong), it makes it feasible to tell when/whether you need to add new hosts, and most importantly, it reduces the impact that any one client can have on all other clients.
With this approach, a given JVM is either "busy" or "available".
After any given data-load completes, the relevant JVM can either declare itself available for a new data-load, or it can just close. (Either way, you'll want to have a separate process to monitor the JVMs and make sure that the right number are always running.)
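A rough sketch of what launching such an isolated, fixed-size worker JVM could look like; the jar, main class and dataset argument are all hypothetical names, and -XX:+ExitOnOutOfMemoryError requires a reasonably recent HotSpot JVM:

// Each data-load runs in its own JVM with a predefined max heap.
ProcessBuilder pb = new ProcessBuilder(
        "java", "-Xmx4g", "-XX:+ExitOnOutOfMemoryError",
        "-cp", "loader.jar", "com.example.DataLoadWorker", datasetId);
pb.inheritIO();
Process worker = pb.start();
// waitFor() throws InterruptedException; handle it where this code actually lives.
// A non-zero exit code (including an OOM-triggered exit) only takes down this worker,
// not the process coordinating the cluster.
int exitCode = worker.waitFor();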
An alternative would be to "just try to load the dataset" (and back out if an OOM is thrown); however, once an OOM is thrown, you have potentially corrupted other threads running in the same JVM, and there is no graceful way of recovering from it.
There are no good ways to handle and recover from an OOME in the JVM, but there is a way to react before the OOM happens. Java has java.lang.ref.SoftReference, which is guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. This fact can be used for early prediction of OOM. For example, the data load can be aborted if the prediction triggers.
ReferenceQueue<Object> q = new ReferenceQueue<>();
SoftReference<Object> reference = new SoftReference<>(new Object(), q);
// remove() blocks until the soft reference is cleared under memory pressure and enqueued;
// it declares InterruptedException, so run this in a dedicated watcher thread.
q.remove();
// reference cleared - stop the data load immediately
Sensitivity can be tuned with the -XX:SoftRefLRUPolicyMSPerMB flag (for the Oracle JVM). The solution is not ideal; its effectiveness depends on various factors - whether other soft references are used in the code, how the GC is tuned, the JVM version, the weather on Mars... But it can help if you are lucky.