Converting an org.w3c.dom.NodeList to a Clojure ISeq

I am trying to get a handle on the new defprotocol, reify, etc.

I have an org.w3c.dom.NodeList returned from an XPath call and I would like to "convert" it to an ISeq.

In Scala, I implemented an implicit conversion method:

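import org.w3c.dom.{Node, NodeList}
import scala.language.implicitConversions

// Scala 2.10 through 2.12: a Traversable only needs foreach to be implemented
implicit def nodeList2Traversable(nodeList: NodeList): Traversable[Node] = {
  new Traversable[Node] {
    def foreach[A](process: Node => A): Unit = {
      // Visit each node by index, using NodeList's only two methods
      for (index <- 0 until nodeList.getLength) {
        process(nodeList.item(index))
      }
    }
  }
}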

NodeList includes methods int getLength() and Node item(int index).

How do I do the equivalent in Clojure? I expect that I will need to use defprotocol. What functions do I need to define to create a seq?
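A protocol is not strictly required to get a seq, but if a single generic function is wanted, defprotocol plus extend-protocol can attach it to the existing NodeList interface without any wrapper class, roughly playing the role of the Scala implicit conversion. This is a minimal sketch; NodeSeqable and node-seq are hypothetical names, not part of any library.

```clojure
(import '(org.w3c.dom NodeList))

;; Hypothetical protocol: anything that can be viewed as a seq of DOM nodes.
(defprotocol NodeSeqable
  (node-seq [this] "Returns a lazy seq of the nodes in this."))

;; extend-protocol attaches node-seq to the existing Java interface, so any
;; NodeList (from getChildNodes, an XPath NODESET result, ...) works directly.
;; extend-protocol type-hints the first argument (nl) as NodeList.
(extend-protocol NodeSeqable
  NodeList
  (node-seq [nl]
    (map #(.item nl (int %)) (range (.getLength nl)))))
```

With this in place, (node-seq some-node-list) returns an ordinary lazy seq that works with map, filter, and the rest of clojure.core.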

If I do a simple, naive conversion to a list using loop and recur, I will end up with a non-lazy structure.
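For comparison with loop/recur: laziness can be written by hand with the lazy-seq macro, which defers each step until the seq is walked that far. This is a minimal sketch; node-list->lazy-seq is a hypothetical name. Unlike (map f (range n)) or a for over range, which realize elements in chunks of 32, this version fetches exactly one item per step.

```clojure
(import '(org.w3c.dom NodeList))

;; Hypothetical helper: walks a NodeList one item at a time.
;; Each lazy-seq body runs only when the seq is consumed that far,
;; so no .item call happens until that element is actually requested.
(defn node-list->lazy-seq
  ([^NodeList nl] (node-list->lazy-seq nl 0))
  ([^NodeList nl i]
   (lazy-seq
     (when (< i (.getLength nl))
       (cons (.item nl (int i))
             (node-list->lazy-seq nl (inc i)))))))
```

Because nothing is realized up front, (take 3 (node-list->lazy-seq nl)) calls .item only three times, even on a very long NodeList.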

Ralph asked May 05 '11 13:05


2 Answers

Most of Clojure's sequence-processing functions return lazy seqs, including the map and range functions:

(defn node-list-seq [^org.w3c.dom.NodeList node-list]
  (map (fn [index] (.item node-list index))
       (range (.getLength node-list))))

Note that the type hint for NodeList above isn't necessary, but it improves performance by letting the compiler avoid reflection.

Now you can use that function like so:

(map #(.getLocalName %) (node-list-seq your-node-list))
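Since the question's NodeList comes from an XPath call, here is a self-contained sketch of that path, assuming the JDK's built-in javax.xml.xpath API and an XML string in place of a file; parse-xml-string and xpath-nodes are hypothetical helpers. One detail worth knowing: getLocalName returns nil for elements produced by a parser that is not namespace-aware, so the factory is configured accordingly.

```clojure
(import '(javax.xml.parsers DocumentBuilderFactory)
        '(javax.xml.xpath XPathFactory XPathConstants)
        '(java.io ByteArrayInputStream)
        '(org.w3c.dom Document Node NodeList))

;; node-list-seq as defined in this answer, unchanged.
(defn node-list-seq [^NodeList node-list]
  (map (fn [index] (.item node-list index))
       (range (.getLength node-list))))

;; Hypothetical helper: parses an XML string into a DOM Document.
(defn parse-xml-string ^Document [^String s]
  (let [dbf (doto (DocumentBuilderFactory/newInstance)
              ;; without this, .getLocalName returns nil on parsed elements
              (.setNamespaceAware true))]
    (.parse (.newDocumentBuilder dbf)
            (ByteArrayInputStream. (.getBytes s "UTF-8")))))

;; Hypothetical helper: evaluates an XPath expression to a NodeList.
(defn xpath-nodes ^NodeList [^Document doc ^String expr]
  (.evaluate (.newXPath (XPathFactory/newInstance))
             expr doc XPathConstants/NODESET))

(def ^Document doc (parse-xml-string "<r><a/>some text<b/></r>"))

;; /r/* selects only element children, so the text node is skipped.
(map #(.getLocalName ^Node %) (node-list-seq (xpath-nodes doc "/r/*")))
```

By contrast, (node-list-seq (.getChildNodes (.getDocumentElement doc))) also includes the text node between the two elements.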
Chouser answered Sep 29 '22 03:09


Use a for comprehension; it yields a lazy sequence.

Here's the code for you. I've taken the time to make it runnable on the command line; you only need to replace the name of the parsed XML file.

Caveat 1: avoid def-ing your variables. Use local variables instead.

Caveat 2: this is the Java API for XML, so its objects are mutable. Since you have a lazy sequence, if the DOM tree changes while you're iterating, you might run into unpleasant race conditions.
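A small sketch of that caveat, assuming a lazy-child-list with the same shape as this answer's (length read eagerly, items fetched lazily) and a hypothetical parse-xml-string helper that reads from a string instead of a file. The NodeList returned by getChildNodes is live, so removing a child after the seq is created but before it is walked shifts the indexes, and the last captured index now yields nil.

```clojure
(import '(javax.xml.parsers DocumentBuilderFactory)
        '(java.io ByteArrayInputStream)
        '(org.w3c.dom Document Node))

;; Same shape as this answer's lazy-child-list, with a type hint added.
(defn lazy-child-list [^Node element]
  (let [nodelist (.getChildNodes element)
        len (.getLength nodelist)]   ; length is captured eagerly
    (for [i (range len)]
      (.item nodelist i))))          ; items are fetched lazily

;; Hypothetical helper: parses an XML string into a DOM Document.
(defn parse-xml-string ^Document [^String s]
  (.parse (.newDocumentBuilder (DocumentBuilderFactory/newInstance))
          (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn names-after-mutation
  "Builds the lazy seq, mutates the DOM, then realizes the seq."
  []
  (let [root (.getDocumentElement (parse-xml-string "<r><a/><b/></r>"))
        children (lazy-child-list root)]        ; nothing realized yet
    (.removeChild root (.getFirstChild root))   ; removes <a/>
    ;; The live NodeList now holds only <b/>, but len is still 2.
    (mapv #(when % (.getNodeName ^Node %)) children)))

(names-after-mutation)
```

Realizing the seq (for example with doall) before touching the DOM avoids this, at the cost of giving up laziness.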

Caveat 3: even though this is a lazy structure, the whole DOM tree is already in memory anyway. (I'm not entirely sure about this last point: I think the API tries to defer loading the tree into memory until it's needed, but there are no guarantees.) So if you run into trouble with big XML documents, try to avoid the DOM approach.

(require ['clojure.java.io :as 'io])
(import [javax.xml.parsers DocumentBuilderFactory])
(import [org.xml.sax InputSource])

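(def dbf (DocumentBuilderFactory/newInstance))
(doto dbf
  (.setValidating false)
  (.setNamespaceAware true)
  ;; Per the DocumentBuilderFactory Javadoc this requires validating mode,
  ;; so with setValidating false it has no effect
  (.setIgnoringElementContentWhitespace true))
(def builder (.newDocumentBuilder dbf))
;; Replace this path with the XML file you want to parse
(def doc (.parse builder (InputSource. (io/reader "C:/workspace/myproject/pom.xml"))))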

(defn lazy-child-list [element]
  (let [nodelist (.getChildNodes element)
        len (.getLength nodelist)]
    (for [i (range len)]
      (.item nodelist i))))

;; To print the children of an element
(-> doc
    (.getDocumentElement)
    (lazy-child-list)
    (println))

;; Prints clojure.lang.LazySeq
(-> doc
    (.getDocumentElement)
    (lazy-child-list)
    (class)
    (println))
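The same function composes with clojure.core/tree-seq to get a lazy, depth-first walk over every descendant rather than just the direct children. This is a sketch, assuming a type-hinted copy of lazy-child-list and a hypothetical parse-xml-string helper in place of the file-based setup; lazy-descendants and element-names are also hypothetical names.

```clojure
(import '(javax.xml.parsers DocumentBuilderFactory)
        '(java.io ByteArrayInputStream)
        '(org.w3c.dom Document Node))

(defn lazy-child-list [^Node element]
  (let [nodelist (.getChildNodes element)
        len (.getLength nodelist)]
    (for [i (range len)]
      (.item nodelist i))))

;; Hypothetical helper: parses an XML string into a DOM Document.
(defn parse-xml-string ^Document [^String s]
  (.parse (.newDocumentBuilder (DocumentBuilderFactory/newInstance))
          (ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; tree-seq is lazy and visits nodes in pre-order (parent before children).
(defn lazy-descendants [^Node root]
  (tree-seq (fn [^Node n] (.hasChildNodes n)) lazy-child-list root))

;; Keeps only element nodes, dropping text nodes, and returns their names.
(defn element-names [^Node root]
  (->> (lazy-descendants root)
       (filter (fn [^Node n] (= Node/ELEMENT_NODE (.getNodeType n))))
       (map (fn [^Node n] (.getNodeName n)))))

(element-names (.getDocumentElement (parse-xml-string "<r><a><c/></a>text<b/></r>")))
```

For this input the walk yields the element names in document order: r, a, c, b.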
Leonel answered Sep 29 '22 02:09