I would like to write an OCaml function which takes a URL and returns a string made up of the contents of the HTML file at that location. Any ideas?
Thanks a lot!
Best, Surikator.
I've done both of those things (fetching the page and parsing the HTML) using ocurl and nethtml.
Use ocurl to read the contents of the URL (there are tons of options you can set; this is the minimum):
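exception IO_ERROR of string

let string_of_uri uri =
  try
    let connection = Curl.init () and write_buff = Buffer.create 1763 in
    (* libcurl hands the body over in chunks; append each one and
       return its length to tell curl it was consumed *)
    Curl.set_writefunction connection
      (fun x -> Buffer.add_string write_buff x; String.length x);
    Curl.set_url connection uri;
    Curl.perform connection;
    (* free this handle; Curl.global_cleanup belongs once, at program exit *)
    Curl.cleanup connection;
    Buffer.contents write_buff
  with _ -> raise (IO_ERROR uri)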
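The write callback is the part that actually builds the string: libcurl delivers the body in chunks of arbitrary size, and a callback that returns anything other than the chunk's length makes the transfer abort with a write error. As a hedged, network-free sketch of that contract, the hypothetical `feed_chunks` below plays libcurl's role by hand, and `collect` reuses the same Buffer-appending callback as `string_of_uri`:

```ocaml
(* Hypothetical stand-in for libcurl: deliver the body chunk by chunk to a
   write callback, the way Curl.perform drives set_writefunction's callback. *)
let feed_chunks (chunks : string list) (write : string -> int) : unit =
  List.iter
    (fun chunk ->
       (* libcurl treats a return value other than the chunk length as an error *)
       if write chunk <> String.length chunk then failwith "write callback aborted")
    chunks

(* Same callback shape as in string_of_uri: append, then report bytes consumed. *)
let collect (chunks : string list) : string =
  let write_buff = Buffer.create 1763 in
  feed_chunks chunks (fun x -> Buffer.add_string write_buff x; String.length x);
  Buffer.contents write_buff

let () =
  (* chunk boundaries need not line up with tags; the buffer stitches them back *)
  print_endline (collect [ "<html><bo"; "dy>Hello</body>"; "</html>" ])
```

Because chunk boundaries can fall anywhere, accumulating into a Buffer and converting once at the end is simpler and cheaper than concatenating strings per chunk.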
Then, to parse the result, use Nethtml (you might need to pass a suitable DTD through the ?dtd argument of Nethtml.parse):
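let parse_html_string uri =
  (* Nethtml.parse reads from an object channel, so wrap the downloaded string *)
  let ch = new Netchannels.input_string (string_of_uri uri) in
  (* ~return_pis:false drops processing instructions such as <?xml ...?> *)
  let docs = Nethtml.parse ~return_pis:false ch in
  ch#close_in ();
  docs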
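`parse_html_string` returns a `Nethtml.document list`, a tree made of `Element (name, attributes, children)` and `Data text` nodes. As a hedged illustration of consuming that tree, here is a sketch that collects every link target; the name `hrefs_of_docs` is hypothetical, and it assumes Ocamlnet's netstring package (which provides Nethtml) is available:

```ocaml
(* Collect the href attribute of every <a> element, in document order. *)
let rec hrefs_of_docs (docs : Nethtml.document list) : string list =
  List.concat_map hrefs_of_doc docs

and hrefs_of_doc (doc : Nethtml.document) : string list =
  match doc with
  | Nethtml.Data _ -> []
  | Nethtml.Element (name, attrs, children) ->
      let here =
        if String.lowercase_ascii name <> "a" then []
        else
          (* compare attribute names case-insensitively to be safe *)
          match
            List.find_map
              (fun (k, v) -> if String.lowercase_ascii k = "href" then Some v else None)
              attrs
          with
          | Some href -> [ href ]
          | None -> []
      in
      (* links nested inside this element come after the element's own href *)
      here @ hrefs_of_docs children
```

With the functions above in scope, `hrefs_of_docs (parse_html_string "http://example.com/")` would list the page's links; the recursion mirrors the shape of the document type, so other extractions (text nodes, specific tags) follow the same pattern.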
Cheers!