Can Html Agility Pack be used to parse HTML fragments?

Question

I need to get LINK and META elements from ASP.NET pages, user controls and master pages, grab their contents and then write back updated values to these files in a utility I'm working on.

I could try using regular expressions to grab just these elements but there are several issues with that approach:

I expect many of the input files to contain broken HTML (missing or out-of-sequence elements, etc.).
SCRIPT elements may contain comments and/or VBScript/JavaScript that looks like valid elements.
I need to be able to special-case IE conditional comments, and META and LINK elements inside IE conditional comments.
Not to mention that HTML is not a regular language.

I did some research for HTML parsers in .NET and many SO posts and blogs recommend the HTML Agility Pack. I've never used it before and I don't know if it can parse broken HTML and HTML fragments. (For example, imagine a user control that only contains a HEAD element with some content in it - no HTML or BODY.) I know I could read the documentation but it'd save me quite a bit of time if someone could advise. (Most SO posts involve parsing full HTML pages.)

D'Arcy Rittich · Accepted Answer

Absolutely, that is what it excels at.

In fact, many web pages you'll find in the wild could be described as HTML fragments, due to missing <html> tags, or improperly closed tags.

The HtmlAgilityPack simulates what the browser has to do: try to make sense of what is sometimes a jumble of mismatched tags. It is an imperfect science, but HtmlAgilityPack does it very well.
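
To make this concrete, here is a minimal sketch of loading a HEAD-only fragment, updating its META and LINK elements, and serializing it back. The fragment text, the class and method names, and the replacement values are all made up for illustration; only `HtmlDocument`, `LoadHtml`, `SelectNodes`, `SetAttributeValue` and `OuterHtml` come from the library. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class FragmentDemo
{
    // Hypothetical user-control content: a HEAD element only, with no
    // HTML or BODY element and no closing </head> tag.
    public const string Fragment =
        "<head>\n" +
        "<meta name=\"description\" content=\"old\">\n" +
        "<link rel=\"stylesheet\" href=\"old.css\">\n";

    // Parses the fragment, rewrites META/LINK attributes, returns the markup.
    public static string Rewrite(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // accepts fragments; no full document required

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//meta | //link");
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                if (node.Name == "meta")
                    node.SetAttributeValue("content", "new"); // illustrative value
                else
                    node.SetAttributeValue("href", "new.css"); // illustrative value
            }
        }

        // Serialize the modified tree back to a string.
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite(Fragment));
    }
}
```

One caveat for the use case in the question: META and LINK elements inside IE conditional comments are most likely held as text inside comment nodes rather than as elements, so a query like this would probably not find them. Handling them would mean locating the comment nodes and parsing their contents separately.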

Tags:

html

c#

.net

parsing

html-agility-pack

xxbbcc
