Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursion between different methods of the same multimethod

I'm writing a Clojure library to parse Mac OS X's XML-based property list files. The code works fine unless you give it a large input file, at which point you get java.lang.OutOfMemoryError: Java heap space.

Here's an example input file (small enough to work fine):

<plist version="1.0">
<dict>
    <key>Integer example</key>
    <integer>5</integer>
    <key>Array example</key>
    <array>
        <integer>2</integer>
        <real>3.14159</real>
    </array>
    <key>Dictionary example</key>
    <dict>
        <key>Number</key>
        <integer>8675309</integer>
    </dict>
</dict>
</plist>

clojure.xml/parse turns this into:

{:tag :plist, :attrs {:version "1.0"}, :content [
    {:tag :dict, :attrs nil, :content [
        {:tag :key, :attrs nil, :content ["Integer example"]}
        {:tag :integer, :attrs nil, :content ["5"]}
        {:tag :key, :attrs nil, :content ["Array example"]}
        {:tag :array, :attrs nil, :content [
            {:tag :integer, :attrs nil, :content ["2"]}
            {:tag :real, :attrs nil, :content ["3.14159"]}
        ]}
        {:tag :key, :attrs nil, :content ["Dictionary example"]}
        {:tag :dict, :attrs nil, :content [
            {:tag :key, :attrs nil, :content ["Number"]}
            {:tag :integer, :attrs nil, :content ["8675309"]}
        ]}
    ]}
]}

My code turns this into the Clojure data structure

{"Dictionary example" {"Number" 8675309},
 "Array example" [2 3.14159],
 "Integer example" 5}

The relevant part of my code looks like

; extract the content contained within e.g. <integer>...</integer>
(defn- first-content
  [c]
  (first (c :content)))

; return a parsed version of the given tag
(defmulti content (fn [c] (c :tag)))

(defmethod content :array
  [c]
  (apply vector (for [item (c :content)] (content item))))

(defmethod content :dict
  [c]
  (apply hash-map (for [item (c :content)] (content item))))

(defmethod content :integer
  [c]
  (Long. (first-content c)))

(defmethod content :key
  [c]
  (first-content c))

(defmethod content :real
  [c]
  (Double. (first-content c)))

; take a java.io.File (or similar) and return the parsed version
(defn parse-plist
  [source]
  (content (first-content (clojure.xml/parse source))))

The meat of the code is the content function, a multimethod that dispatches on the :tag (the name of the XML tag). I'm wondering whether there is something different I should be doing in order to make this recursion work better. I tried replacing all three calls to content with trampoline content, but that didn't work. Is there anything fancy I should do to get this mutual recursion to work more efficiently? Or am I taking a fundamentally wrong approach?

Edit: By the way, this code is available on GitHub, in which form it might be easier to play around with.

like image 326
bdesham Avatar asked Feb 01 '11 19:02

bdesham


2 Answers

You have multiple (one per child) recursive calls from a single method so your code isn't (and can't be without a heavy reorg)tail-recursive. trampoline is intended for mutual tail-recursive functions.

How deep, how long is your large XML file? I'm asking because you are getting an OoM not a SO.

Anyway, to solve your recursion problem (which is unlikely to be the one causing the exception) you have to walk down your XML datastructure (eg with xml-zip) while maintaining a stack (vector or list) representing your result tree under construction. It's ironic that the traversal of the XML datastructure is somewhat equivalent to the sax events which were used to build the structure.

like image 190
cgrand Avatar answered Nov 18 '22 09:11

cgrand


Heavy recursion will cause a StackOverflowException, not an OutOfMemoryError. Also the recursion does not seem to be very deep here (only 3 levels as per the XML file in your example).

My guess is, the OutOfMemoryError is being thrown because the data structure your large XML files are being parsed into are too large to fit in the JVM heap. You can try increasing the heap size using -Xms and -Xmx options. However, the correct way to parse huge XML files is to use SAX events rather than building a tree (DOM or Clojure data structure).

like image 41
Abhinav Sarkar Avatar answered Nov 18 '22 10:11

Abhinav Sarkar