I have a valid XHTML file (100 megabytes of data) with one large table. The first tr holds the column names (for a database); all other tr's are data. It is the only table in the whole document, and it sits in the structure html->body->div->table.
How can I parse it lazily in Clojure?
I know about data.xml, but because I am a Clojure beginner it is very difficult for me to get it working, especially because the REPL is very slow when working with such a big file.
The data.xml docs say that parse creates a lazy tree of the document. I checked locally and it seems to be true:
; Load libs
(require '[clojure.data.xml :as xml])
(require '[clojure.java.io :as io])
; standard.xml is a 100 MB XML file from http://www.xml-benchmark.org/downloads.html
(def xml-tree (xml/parse (io/reader "standard.xml")))
(:tag xml-tree) ; => :site
(def child (first (:content xml-tree)))
(:tag child) ; => :regions
; The next call hangs the REPL for ~30 seconds on my computer because it forces parsing of the whole file
(dorun (:content xml-tree))
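Applied to the table from the question, this could look roughly like the sketch below. It is only a sketch under a few assumptions: the helper names (child-elements, cells, text, table-rows) are made up for illustration and are not part of data.xml, cells are assumed to contain plain text only, and tags are compared by their local name because newer data.xml versions return namespace-qualified keywords for XHTML elements.

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])

;; All helper names below are hypothetical, not part of data.xml.

(defn child-elements
  "Lazy seq of the children of el whose local tag name equals tag-name.
  Whitespace strings between elements have no :tag and are skipped."
  [el tag-name]
  (filter #(some-> (:tag %) name (= tag-name)) (:content el)))

(defn cells
  "Lazy seq of the td/th children of a tr element."
  [row]
  (filter #(contains? #{"td" "th"} (some-> (:tag %) name)) (:content row)))

(defn text
  "Concatenates the direct string children of el (assumes no nested markup)."
  [el]
  (apply str (filter string? (:content el))))

(defn table-rows
  "Lazy seq of maps (column name -> cell text) for the table found at
  html -> body -> div -> table. The reader must stay open while the
  seq is being consumed."
  [rdr]
  (let [rows    (-> (xml/parse rdr)
                    (child-elements "body") first
                    (child-elements "div") first
                    (child-elements "table") first
                    (child-elements "tr"))
        columns (mapv text (cells (first rows)))]
    (map #(zipmap columns (map text (cells %))) (rest rows))))

(comment
  ;; "table.xhtml" is a placeholder file name.
  ;; Counts the data rows without keeping them in memory.
  (with-open [rdr (io/reader "table.xhtml")]
    (reduce (fn [n _row] (inc n)) 0 (table-rows rdr))))
```

The important part is not to def the tree or the row sequence: as long as nothing holds on to the root element or the head of the sequence, rows that have already been processed should become eligible for garbage collection. Consume everything inside with-open, because the lazy sequence reads from the reader on demand.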