
Is it a good idea to force garbage collection in Clojure?

I have a Clojure program that consumes a large amount of heap while running (I once measured it at around 2.8GiB), and I'm trying to find a way to reduce its memory footprint. My current plan is to force garbage collection every so often, but I'm wondering whether this is a good idea. I've read "How to force garbage collection in Java?" and "Can I Force Garbage Collection in Java?" and understand how to do it: just call (System/gc). What I don't know is whether it's a good idea, or even whether it's needed.

Here's how the program works. I have a large number of documents in a legacy format that I'm trying to convert to HTML. The legacy format consists of several XML files: a metadata file that describes the document, and contains links to any number of content files (usually one, but it can be several — for example, some documents have "main" content and footnotes in separate files). The conversion takes anywhere from a few milliseconds for the smallest documents, to about 58 seconds for the largest document. Basically, I'm writing a glorified XSLT processor, though in a much nicer language than XSLT.

My current (rather naïve) approach, written when I was just starting out in Clojure, builds a list of all the metadata files, then does the following:

(let [parsed-trees (map parse metadata-files)]
  (dorun (map work-func parsed-trees)))

work-func converts the files to HTML and writes the result to disk, returning nil. (I was trying to throw away the parsed XML trees, which are quite large, after each pass through a single document.) I now realize that although map is lazy and dorun throws away the head of the sequence it's iterating over, I was still holding onto the head of the seq through parsed-trees, and that is why this approach failed to free the trees.
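To make the head-retention effect concrete, here is a minimal sketch. It uses a top-level def rather than the let from the snippet above, because a var is the simplest way to keep the head of a lazy seq reachable. fake-parse and the filenames are hypothetical stand-ins for the real parser and metadata files.

```clojure
;; Hypothetical stand-in for the real parser: it counts its calls and
;; returns a small "tree" so the effect is visible without real XML files.
(def parse-calls (atom 0))

(defn fake-parse [filename]
  (swap! parse-calls inc)
  {:file filename :nodes (vec (range 1000))})

;; The var holds the head of the lazy seq, so every element that gets
;; realized stays reachable for as long as the var does.
(def parsed-trees (map fake-parse ["a.xml" "b.xml" "c.xml"]))

(dorun parsed-trees)                        ; first walk realizes every element
(def calls-after-first-walk @parse-calls)

(dorun parsed-trees)                        ; second walk reuses the cached elements
(def calls-after-second-walk @parse-calls)
```

The call count does not grow on the second walk, which shows that the realized trees are cached in the seq rather than discarded.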

My new plan is to move the parsing into work-func, so that it will look like:

(defn work-func [metadata-filename]
  (-> metadata-filename
      e/parse
      xml-to-html
      write-html-file)
  (System/gc))

Then I can call work-func with map, or possibly pmap since I have two dual-core CPUs, and hopefully throw away the large XML trees after each document is processed.
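A rough sketch of what the driver could look like once parsing lives inside the worker. The three -stub functions are hypothetical placeholders for e/parse, xml-to-html and write-html-file so that the example runs on its own; the (System/gc) call is left out here because it does not affect the structure.

```clojure
;; Hypothetical stand-ins for e/parse, xml-to-html and write-html-file.
(defn parse-stub [filename]
  {:file filename :nodes (vec (range 1000))})

(defn xml-to-html-stub [tree]
  (str "<html>" (:file tree) ": " (count (:nodes tree)) " nodes</html>"))

(def written (atom []))   ; records output instead of touching the disk

(defn write-html-file-stub [html]
  (swap! written conj html)
  nil)

(defn work-func-sketch [metadata-filename]
  (-> metadata-filename
      parse-stub
      xml-to-html-stub
      write-html-file-stub))

(defn convert-all! [metadata-files]
  ;; doseq is eager and returns nil; no seq of parsed trees is ever bound.
  (doseq [f metadata-files]
    (work-func-sketch f)))

(defn convert-all-parallel! [metadata-files]
  ;; pmap is lazy, so dorun is needed to force the work.
  (dorun (pmap work-func-sketch metadata-files)))
```

Note that pmap runs on the same thread pool as future, so a standalone program that uses it may linger for about a minute after finishing unless it calls (shutdown-agents) at the end.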

My question, though, is: is it a good idea to be telling Java "please clean up after me" so often? Or should I just skip the (System/gc) call in work-func, and let the Java garbage collector run when it feels the need to? My gut says to keep the call in, because I know (as Java can't) that at that point in work-func, there is going to be a large amount of data on the heap that can be gotten rid of, but I would welcome input from more experienced Java and/or Clojure coders.

rmunn asked Feb 26 '14 10:02


1 Answer

Calling System/gc is not a helpful strategy. Assuming for now that you can't reduce the actual memory footprint of your code, what you should aim for is avoiding major GCs. This will either happen automatically (the JVM resizes the Young Generation until all your temporary data fits), or you can tune it with explicit JVM options to make the Young Generation exceptionally large.
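As an illustration of the explicit-options route, here is a sketch of a Leiningen project.clj. It assumes the program is built with Leiningen, which the question does not say; the project name, Clojure version and sizes are placeholders, not recommendations. -Xmx caps the total heap, -Xmn fixes the Young Generation size, and -verbose:gc logs each collection so you can see whether major GCs still occur.

```clojure
;; project.clj (hypothetical project name and illustrative sizes)
(defproject legacy-converter "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]]
  :jvm-opts ["-Xmx3g"        ; maximum total heap
             "-Xmn2g"        ; Young Generation size (assumed value, tune to your data)
             "-verbose:gc"]) ; print a line per collection
```

The same flags can be passed directly to the java command when not using Leiningen. Whether a fixed Young Generation size helps depends on the collector in use, so it is worth comparing the GC log with and without -Xmn.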

As long as you keep your short-lived objects from spilling into the Old Generation for lack of space, you'll experience very short GC pauses. You also won't have to worry about invoking GC explicitly: a minor collection happens as soon as the Eden space fills up.
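One way to check whether objects are spilling into the Old Generation is to read the JVM's own collector counters from inside the program. The sketch below uses the standard java.lang.management API; gc-stats is a hypothetical helper name.

```clojure
(import '(java.lang.management ManagementFactory GarbageCollectorMXBean))

(defn gc-stats
  "Returns a map from collector name to its cumulative collection
   count and total collection time in milliseconds."
  []
  (into {}
        (for [^GarbageCollectorMXBean gc
              (ManagementFactory/getGarbageCollectorMXBeans)]
          [(.getName gc)
           {:count   (.getCollectionCount gc)
            :time-ms (.getCollectionTime gc)}])))
```

Calling (gc-stats) before and after a batch of documents and comparing the counts shows how many young and old collections the batch triggered. The collector names depend on the JVM and the collector selected, for example "PS Scavenge" and "PS MarkSweep" with the parallel collector, or "G1 Young Generation" and "G1 Old Generation" with G1.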

Marko Topolnik answered Sep 22 '22 02:09