I need to load HTML and parse it, I think that it should be something simple, I pass a string with a "HTML" it reads the string in a Dom like object, so I can search and parse the content of the HTML, facilitating scraping and things like that.
Do you guys know about any thing like that.
Thanks
HTML Agility Pack
Similar API to XmlDocument
, for example (from the examples page):
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href"])
{
HtmlAttribute att = link["href"];
att.Value = FixLink(att);
}
doc.Save("file.htm");
(you should also be able to use LoadHtml
to load a string of html, rather than from a path)
If you're running in-browser, you should be able to use the Html DOM Bridge, load the HTML into it, and walk the DOM Tree for that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With