 

Options for web scraping - C++ version only

I'm looking for a good C++ library for web scraping.
It has to be C/C++ and nothing else, so please do not point me to "Options for HTML scraping" or other SO questions/answers where C++ is not even mentioned.

Piotr Dobrogost asked May 07 '09 13:05



2 Answers

  • libcurl to download the HTML file
  • libtidy to convert it to valid XML
  • libxml to parse/navigate the XML
Kyle Simek answered Sep 17 '22 18:09



Use the myhtml C/C++ parser (here): dead simple and very fast, with no dependencies except C99, and with CSS selectors built in (example here).

Halcyon answered Sep 17 '22 18:09
