Can I use Html Agility Pack to parse an HTML fragment?

Can Html Agility Pack be used to parse an HTML string fragment?

Such as:

var fragment = "<b>Some code </b>";

And then extract all the <b> tags? All the examples I have seen so far load complete HTML documents.

chobo2 asked Mar 29 '10 05:03



2 Answers

If it's HTML, then yes.

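using HtmlAgilityPack;

string str = "<b>Some code</b>";
// optional: LoadHtml also accepts a bare fragment, so the wrapper is not required
string html = string.Format("<html><head></head><body>{0}</body></html>", str);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

// see an XPath tutorial for how to select elements
// "//b" matches <b> at any depth; a bare "b[1]" only looks at direct children
// of the document node and finds nothing once the fragment is wrapped in <body>
HtmlNode bNode = doc.DocumentNode.SelectSingleNode("//b");
// SelectSingleNode returns null when nothing matches
string boldText = bNode != null ? bNode.InnerText : null;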
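
Since the question asks for all the <b> tags rather than just the first one, here is a rough sketch of how that could look. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (FragmentParser, GetBoldTexts) are made up for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class FragmentParser
{
    // Hypothetical helper: returns the inner text of every <b> element in the fragment.
    public static List<string> GetBoldTexts(string fragment)
    {
        var doc = new HtmlDocument();
        // The fragment is loaded as-is, without an <html>/<body> wrapper.
        doc.LoadHtml(fragment);

        // "//b" selects <b> elements at any depth below the document node.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//b");

        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(n => n.InnerText).ToList();
    }
}
```

Calling FragmentParser.GetBoldTexts("<b>Some code </b>") should give a list with the single entry "Some code ", and a fragment without any <b> tags should give an empty list instead of a NullReferenceException.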
Mike Koder answered Oct 20 '22 08:10


I don't think this is really the best use of HtmlAgilityPack.

Normally I see people trying to parse large amounts of HTML with regular expressions, and I point them towards HtmlAgilityPack. In this case, though, I think a regex would be the better fit for such a small snippet.

Roy Osherove has a blog post describing how you can strip out all the HTML from a snippet:

  • http://weblogs.asp.net/rosherove/archive/2003/05/13/6963.aspx
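
For a rough idea of what the regex route might look like, here is a minimal sketch using only System.Text.RegularExpressions. The patterns and the names (FragmentRegex, StripTags, GetBoldTexts) are my own assumptions for illustration and are not necessarily what the linked post uses. It is only meant for small, well-formed snippets; it does not handle things like ">" inside attribute values or nested <b> tags.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FragmentRegex
{
    // Matches anything that looks like a tag: "<", one or more non-">" characters, ">".
    private static readonly Regex AnyTag = new Regex(@"<[^>]+>");

    // Matches <b ...>text</b>; \b after "b" keeps it from matching <br> or <body>.
    private static readonly Regex BoldTag = new Regex(
        @"<b\b[^>]*>(.*?)</b\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: removes every tag and keeps the text between them.
    public static string StripTags(string fragment)
    {
        return AnyTag.Replace(fragment, string.Empty);
    }

    // Hypothetical helper: returns the raw content of each <b> element.
    public static List<string> GetBoldTexts(string fragment)
    {
        var results = new List<string>();
        foreach (Match m in BoldTag.Matches(fragment))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

With the fragment from the question, FragmentRegex.StripTags("<b>Some code </b>") should return "Some code ".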

Even if you got the XPath right in Mika Kolari's sample, it would only work for a snippet that contains a <b> tag, and it would break if the markup changed.

rtpHarry answered Oct 20 '22 08:10