 

Html string reader

I need to load HTML and parse it. I think it should be something simple: I pass in a string of HTML and it reads the string into a DOM-like object, so I can search and parse the content. That would make scraping and similar tasks easier.

Do you know of anything like that?

Thanks

Oakcool asked Dec 07 '22 03:12



2 Answers

HTML Agility Pack

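It has an API similar to XmlDocument. For example (adapted from the examples page):

 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // SelectNodes returns null when nothing matches, so guard before iterating.
 var links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
     foreach (HtmlNode link in links)
     {
         HtmlAttribute att = link.Attributes["href"];
         att.Value = FixLink(att); // FixLink is your own helper, not part of the library
     }
 }
 doc.Save("file.htm");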

(You should also be able to use LoadHtml to load a string of HTML, rather than loading from a path.)
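
Since the question is about parsing a string rather than a file, here is a minimal sketch of the LoadHtml route. It assumes the HtmlAgilityPack package is referenced; the sample markup and the Program class are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical input; in practice this would be the markup you scraped.
        string html = "<html><body><a href=\"/one\">One</a><a href=\"/two\">Two</a></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file path

        // SelectNodes returns null (not an empty list) when no node matches.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null) return;

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue falls back to the supplied default if the attribute is missing.
            Console.WriteLine(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
        }
    }
}
```

From there the document can be queried with XPath in the same way as a file-loaded one.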

Marc Gravell answered Dec 28 '22 02:12



If you're running in the browser, you should be able to use the HTML DOM Bridge: load the HTML into it and walk the DOM tree from there.

JustinAngel answered Dec 28 '22 01:12
