How can I lazily parse a big XHTML file in Clojure?

Question

I have a valid XHTML file (100 megabytes of data) with one large table. The first tr holds the column names (for a database); all other tr's are data. It is the only table in the whole document, and it sits in the structure html->body->div->table.

How can I parse it lazily in Clojure?

I know about data.xml, but because I am a Clojure beginner it is very difficult for me to get it working, especially because the REPL is very slow when working with such a big file.

Mikita Belahlazau · Accepted Answer

The data.xml docs say that parse creates a lazy tree of the document. I checked locally and it seems to be true:

; Load libs
(require '[clojure.data.xml :as xml])
(require '[clojure.java.io :as io])

; standard.xml is a 100 MB XML file from http://www.xml-benchmark.org/downloads.html
(def xml-tree (xml/parse (io/reader "standard.xml")))
(:tag xml-tree) ; => :site

; Taking the first child does not force the rest of the file
(def child (first (:content xml-tree)))
(:tag child) ; => :regions

; Realizing all children forces the whole file to be parsed;
; the REPL hangs for ~30 seconds on my computer
(dorun (:content xml-tree))
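
For the table in the question, the same lazy tree can be walked one row at a time. The following is only a sketch under some assumptions: the tr elements sit directly under table (no tbody), the path is html->body->div->table as described, and the helper names (child-elements, first-child, cell-text, table-rows, process-table) are made up for illustration and are not part of data.xml. Tags are compared by local name, so the code should not care whether the data.xml version in use returns namespace-qualified keywords for XHTML.

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn child-elements
  "Lazy seq of the child elements of el, skipping text nodes."
  [el]
  (filter :tag (:content el)))

(defn first-child
  "First child element of el whose local tag name equals tag-name."
  [el tag-name]
  (first (filter #(= tag-name (name (:tag %))) (child-elements el))))

(defn cell-text
  "Concatenated, trimmed text of a td/th element."
  [cell]
  (str/trim (apply str (filter string? (tree-seq :content :content cell)))))

(defn table-rows
  "Lazy seq of vectors of cell text, one vector per tr.
   Assumes html->body->div->table with tr directly under table."
  [root]
  (->> (-> root
           (first-child "body")
           (first-child "div")
           (first-child "table")
           child-elements)
       (filter #(= "tr" (name (:tag %))))
       (map #(mapv cell-text (child-elements %)))))

(defn process-table
  "Calls f with a map of column name -> value for every data row.
   The reader is closed once all rows have been consumed."
  [path f]
  (with-open [rdr (io/reader path)]
    (let [rows   (table-rows (xml/parse rdr))
          header (first rows)]
      ;; doseq consumes rows one at a time; nothing here keeps the root
      ;; element, so rows that were already handled can be collected
      (doseq [row (rest rows)]
        (f (zipmap header row))))))
```

A call such as (process-table "table.xhtml" println), where table.xhtml is a placeholder file name, would print one map per data row. Note that (def xml-tree ...) in the REPL session above keeps a reference to the root, so every node that has been realized stays in memory; consuming the rows inside a function, as sketched here, avoids holding on to the head of the sequence.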

Tags: parsing, html-parsing, xhtml, clojure

Jiri Knesl
