Can Html Agility Pack be used to parse an HTML string fragment?
Such as:
var fragment = "<b>Some code </b>";
and then extract all the <b> tags? All the examples I have seen so far load what look like complete HTML documents.
Html Agility Pack (HAP) is a free and open-source HTML parser written in C# that reads and writes the DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
In the browser, DOMParser lets you parse an HTML document easily. Without it, you usually have to resort to tricking the browser into parsing it for you, for instance by adding a new element to the current document. domParser = new DOMParser(); doc = domParser.
Parsing means analyzing and converting a program into an internal format that a runtime environment can actually run, for example the JavaScript engine inside browsers. The browser parses HTML into a DOM tree. HTML parsing involves tokenization and tree construction.
If it's html then yes.
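using HtmlAgilityPack;

string str = "<b>Some code</b>";
// the wrapper is optional: LoadHtml also accepts a bare fragment
string html = string.Format("<html><head></head><body>{0}</body></html>", str);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// look at XPath tutorials for how to select elements
// select the 1st <b> element anywhere in the document
// ("b[1]" alone only matches a direct child of the document node, so it finds nothing here)
HtmlNode bNode = doc.DocumentNode.SelectSingleNode("//b");
// SelectSingleNode returns null when nothing matches
string boldText = bNode.InnerText;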
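Since the question asks for all <b> tags rather than just the first one, here is a minimal sketch of that case. It assumes the HtmlAgilityPack NuGet package is referenced; the class name FragmentParser and the method ExtractBoldText are made up for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack itself.
public static class FragmentParser
{
    // Returns the inner text of every <b> element found in the fragment.
    public static List<string> ExtractBoldText(string fragment)
    {
        var doc = new HtmlDocument();

        // The fragment is loaded as-is, without an <html>/<body> wrapper.
        doc.LoadHtml(fragment);

        // Descendants("b") yields an empty sequence when nothing matches,
        // whereas SelectNodes("//b") returns null in that case.
        return doc.DocumentNode
                  .Descendants("b")
                  .Select(node => node.InnerText)
                  .ToList();
    }
}
```

Calling FragmentParser.ExtractBoldText("<b>Some code </b> and <b>more</b>") should give back two strings, one per <b> element. If you prefer XPath, doc.DocumentNode.SelectNodes("//b") does the same job, but remember to check the result for null before iterating.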
I don't think this is really the best use of HtmlAgilityPack.
Normally I see people trying to parse large amounts of HTML with regular expressions, and I point them towards HtmlAgilityPack. In this case, though, I think it would be better to use a regex.
Roy Osherove has a blog post describing how you can strip out all the html from a snippet:
Even if you got the XPath right in Mika Kolari's sample, it would only work for a snippet with a <b> tag in it and would break if the markup changed.
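For a rough idea of what the regex route looks like, here is a minimal sketch that strips tags from a short snippet. This is not necessarily the code from the blog post mentioned above; the class name HtmlStripper, the method StripTags, and the pattern itself are assumptions chosen for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the regex approach; names are made up.
public static class HtmlStripper
{
    // Matches '<', then any run of characters other than '>', then '>'.
    private static readonly Regex TagPattern =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    // Removes everything that looks like a tag and keeps the text between tags.
    public static string StripTags(string html)
    {
        return TagPattern.Replace(html, string.Empty);
    }
}
```

HtmlStripper.StripTags("<b>Some code </b>") leaves just the text inside the tag. A pattern this simple is only reasonable for small, well-behaved snippets: a '>' inside an attribute value, a comment, or a script block will confuse it, which is where a real parser earns its keep.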