
Should I reset Java heap space maximum after use?

I'm working with some modeling algorithms in R, one of which runs in Java (bartMachine). I've found that with the size of my data I need to increase the maximum heap space for Java before running the modeling algorithm.

I'm doing this like so:

options(java.parameters = "-Xmx16g")

My question is: do I need to reset the heap space afterwards if no other algorithm is going to be using Java (or at least that much heap space)? Or will the memory allocated to Java be reclaimed as needed with no performance loss?

I've already searched around some on the subject, and I understand how to change/lower the heap space. I also understand that R/Java will do garbage collection to remove old objects from memory to free more space.

What I don't understand is how changing the heap space affects the memory available for other programs, and whether it is necessary or even a good idea in this case to alter the heap size post-use.

Some of the answers/resources I've already looked at:

Is there a way to lower Java heap when not in use?

Java garbage collector - When does it collect?

http://www.bramschoenmakers.nl/en/node/726

https://cran.r-project.org/web/packages/bartMachine/bartMachine.pdf

eleventhend asked Apr 20 '16 15:04


2 Answers

It's implementation-defined and, depending on the implementation, affected by quite a few parameters. The garbage collector can affect it. On a Mac using Oracle's JVM 1.7 it defaults to the parallel collector (-XX:+UseParallelGC), and this collector doesn't release memory back to the OS. I tried it on a Mac and it didn't free up anything, but using -XX:+UseG1GC did. You can see which collector is the default for you using this:

java -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -version
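The same information can also be read from inside a running JVM. This is a minimal sketch that assumes only the standard java.lang.management API; the class name CollectorNames is made up for illustration. With the parallel collector the names typically look like "PS Scavenge" and "PS MarkSweep", and with G1 like "G1 Young Generation" and "G1 Old Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class CollectorNames {

    /** Returns the names of the garbage collectors this JVM is running with. */
    static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // One line per collector; the exact names depend on the JVM and its flags.
        for (String name : activeCollectors()) {
            System.out.println(name);
        }
    }
}
```

Running it once with -XX:+UseParallelGC and once with -XX:+UseG1GC should show whether the flag actually took effect.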

There are a few parameters you can use to tweak how memory is released if you're using a JVM that supports it and the correct garbage collector, e.g.

-XX:MinHeapFreeRatio (default is 40)
-XX:MaxHeapFreeRatio (default is 70)

but they're hit and miss: the JVM decides when it releases memory, and just freeing up a ton of objects might not trigger it.
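To watch whether memory is actually handed back, you can compare the heap ceiling with what the JVM currently holds. This is a rough sketch using the standard Runtime API; the class name HeapSnapshot and the 64 MiB allocation are arbitrary choices for illustration. maxMemory() reflects the -Xmx ceiling, totalMemory() is what the JVM currently holds for the heap, and System.gc() is only a request that the JVM may ignore.

```java
// Hypothetical helper class, not part of any library.
class HeapSnapshot {
    static final long MIB = 1024L * 1024L;

    /** Upper limit the heap may grow to (the -Xmx ceiling). */
    static long maxMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap memory the JVM currently holds. */
    static long committedMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    /** Portion of the committed heap occupied by objects (live or not yet collected). */
    static long usedMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    static void print(String label) {
        System.out.println(label + ": max=" + maxMiB() + " MiB, committed="
                + committedMiB() + " MiB, used=" + usedMiB() + " MiB");
    }

    public static void main(String[] args) {
        print("at start");

        // Allocate 64 blocks of 1 MiB each to make the heap grow.
        byte[][] blocks = new byte[64][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new byte[(int) MIB];
        }
        print("after allocating");

        // Drop the references and ask for a collection.
        blocks = null;
        System.gc();
        print("after System.gc()");
    }
}
```

If "committed" stays high after the last step, the collector in use is holding on to the memory; trying the same run with a different collector or different free-ratio settings shows whether they make a difference on your JVM.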

Harry answered Nov 11 '22 14:11


Having worked with a non-ML program that is Java-heavy lately, I feel your pain.

I cannot tell you whether or not to reset the dynamically allocated memory based on a single undeniable technical fact, but my personal experience tells me that if you are going to continue processing in the native R environment after your Java work, you probably should. It is best to control what you can.

Here is why:

The only times I have ever run out of memory (even working with MASSIVE flat files) are when I have been using the JVM in some way. It is not a one-time thing; it has happened often.

It even happens just reading and writing large Excel files through XLConnect, which is Java-driven; the memory gets jammed up super quickly. It seems to be a failure in the way R and Java play with each other.

And R does not automatically garbage collect the way you would hope. It collects when the OS asks for more memory, but things can get slow long before that happens.

Also, R only sees objects in memory which it creates, not those it interprets, so your Java clutter will linger around unbeknownst to R. If the JVM created it, R will not clean it up if Java does not do so before going dormant. And if memory is selectively recycled, you can have fragmented memory gaps which affect performance a lot.

My personal approach has been to create sets, variables, frames...subset to only what I need, then rm() and gc()...remove and force garbage collection.

Then I go on to the next step and do the heavy lifting. If I run a Java-based package, I will do this purging more frequently to keep the memory clean.

Once the Java process is done, I use detach(yourlibraryname) and gc() to clear everything out.

If you have adjusted the heap, I would write the re-adjustment here, lowering the allocation you give to Java's dynamic memory, because as far as I have been able to ascertain, R has no way of taking it back while the Java Virtual Machine is still engaged but not operating. So you should reset it to give back to R what is R's to use. I think in the long run it will benefit you with faster processing and fewer lock-ups.

The best way to know how it affects your system as you are using it is to use Sys.time or proc.time to see how long your script takes both with and without forced garbage collections, removals, detachments and heap reallocation.

You can get a solid grasp on how to do this here:

IDRE - UCLA proc.time functions

Hope this helps some!

sconfluentus answered Nov 11 '22 15:11