
How to parse html file using clojure?

Tags:

html

clojure

I'm new to Clojure and need some examples. How do I parse an HTML file using Clojure?

slawter asked Mar 18 '13 10:03


3 Answers

Enlive is a great tool for this. In short:

(ns foo.bar
  (:require [net.cgrand.enlive-html :as html]))

(defn fetch-page [url]
  (html/html-resource (java.net.URL. url)))

Here is a nice tutorial on using it both as a scraper/parser and as a template engine:

Here is a short example of scraping a page.

Another option is clj-tagsoup. Enlive also uses TagSoup, but its parser is pluggable, so you can add support for other parsers.
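Once a page is loaded, the usual next step is to select nodes out of it. The following is a minimal sketch, assuming Enlive is on the classpath; the namespace `foo.scrape`, the `sample-html` string, and the helpers `page-title` and `links` are made up for illustration, and the markup is read from a `StringReader` so the example needs no network access.

```clojure
(ns foo.scrape
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical sample markup standing in for a fetched page.
(def sample-html
  (str "<html><head><title>Demo</title></head>"
       "<body><a href=\"/one\">First</a><a href=\"/two\">Second</a></body></html>"))

;; html-resource also accepts a java.io.Reader, so a string can be parsed
;; the same way as the URL in fetch-page above.
(def nodes
  (html/html-resource (java.io.StringReader. sample-html)))

(defn page-title
  "Returns the text of the first <title> element, or nil if there is none."
  [nodes]
  (some-> (html/select nodes [:title]) first html/text))

(defn links
  "Returns a seq of maps with the :href and :text of every <a> element."
  [nodes]
  (for [a (html/select nodes [:a])]
    {:href (get-in a [:attrs :href])
     :text (html/text a)}))
```

Selectors are vectors of keywords that mirror CSS selectors, so `[:body :a]` would restrict the match to links inside the body. Each selected node is a plain map with `:tag`, `:attrs`, and `:content`, which is why `get-in` works on it directly.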

ebaxt answered Nov 14 '22 21:11


Clojure's built-in XML parsing library, clojure.xml, is another option. From the docstring of clojure.xml/parse:

Parses and loads the source s, which can be a File, InputStream or String naming a URI. Returns a tree of the xml/element struct-map, which has the keys :tag, :attrs, and :content, and accessor fns tag, attrs, and content. Other parsers can be supplied by passing startparse, a fn taking a source and a ContentHandler and returning a parser.
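As a rough illustration of that description, here is a small sketch that uses only the standard library. It assumes the input is well-formed (XHTML-style) markup, because the default parser rejects malformed HTML; `sample-xhtml`, `parse-string`, and `elements-by-tag` are hypothetical names introduced for this example.

```clojure
(require '[clojure.xml :as xml])

;; Hypothetical well-formed sample document.
(def sample-xhtml
  "<html><head><title>Demo</title></head><body><p>Hello</p><p>World</p></body></html>")

(defn parse-string
  "Parses a markup string by wrapping it in an InputStream.
  A bare String argument would be treated as a URI, not as content."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def tree (parse-string sample-xhtml))

(defn elements-by-tag
  "Walks the whole tree with xml-seq and keeps elements whose :tag matches."
  [tree tag]
  (filter #(= tag (:tag %)) (xml-seq tree)))

;; Text content of every <p> element, in document order.
(def paragraphs
  (mapcat :content (elements-by-tag tree :p)))
```

For real-world pages that are not well formed, a tolerant parser such as the TagSoup-based libraries mentioned in the other answers is likely the safer choice.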

Alternatively, use Enlive, which is a Clojure library, or the Java-based HtmlCleaner.

Abimaran Kugathasan answered Nov 14 '22 20:11


HTML parsers (clj = Clojure, cljs = ClojureScript):

  • clj-tagsoup (clj)
  • Crouton (clj)
  • Hickory (clj, cljs)
  • Tupelo (clj, cljs)
  • Webmine (clj)

Source: https://www.clojure-toolbox.com

Aleksei Sotnikov answered Nov 14 '22 22:11