Parsing HTML with OCaml

Question

I'm looking for a library to parse HTML files in OCaml, basically the equivalent of Jsoup/Beautiful Soup. The main requirement is being able to query the DOM with CSS selectors. Something of the form:

page.fetch("http://www.url.com")
page.find("#tag")

antron · Accepted Answer

I had a need for something like this recently, so after seeing this question and reading the recommendations in the comments, I wrote a library "Lambda Soup" over the weekend for fun.

You will want to use a library like ocurl or Cohttp to retrieve the actual HTML. After you have it, you can do

html |> parse $ "#tag"

to do what is asked in the question. For other possibilities and the full signature, see the documentation. You may want to look at the documentation postprocessor or tests for a fairly thorough demonstration of usage and capabilities, including CSS support and extensions.
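As a slightly fuller sketch of the one-liner above, the following assumes the `lambdasoup` package is installed and that `html` already holds the page as a string. The inline document, the `a.link` selector, and the names `tag_text`, `missing`, and `hrefs` are made up for illustration; in practice the string would come from ocurl or Cohttp.

```ocaml
(* Build with the lambdasoup package available, for example:
   ocamlfind ocamlopt -package lambdasoup -linkpkg example.ml *)
open Soup

(* Hypothetical inline document standing in for HTML fetched over HTTP. *)
let html =
  {|<html><body>
      <p id="tag">Hello</p>
      <a class="link" href="/a">A</a>
      <a class="link" href="/b">B</a>
    </body></html>|}

(* Parse the string into a document node. *)
let soup = parse html

(* ($) selects the first element matching a CSS selector and raises an
   exception if there is none; R.leaf_text then extracts its text. *)
let tag_text = soup $ "#tag" |> R.leaf_text

(* ($?) is the option-returning variant, useful when a match is not
   guaranteed. *)
let missing = soup $? "#absent"

(* ($$) selects every match; here we collect the href of each link. *)
let hrefs =
  soup $$ "a.link" |> to_list |> List.filter_map (attribute "href")

let () =
  print_endline tag_text;
  List.iter print_endline hrefs
```

Using `$?` instead of `$` is the safer choice when scraping pages whose structure you do not control, since a missing element then shows up as `None` rather than as an exception.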

~~Per comments, Lambda Soup uses Ocamlnet's HTML parser.~~ Lambda Soup uses Markup.ml. Otherwise, it has no dependencies, except OUnit if you wish to run the tests. I'm happy for any feedback, including about modifying the interface (it is at an early stage) or discussions of adding an HTTP downloader to the library (which seems iffy because it greatly alters the scope of the library as it now is, but I am happy to hear arguments).

The license is BSD.

Tags:

html

ocaml

gidim

1 Answer

antron
