I need to get LINK
and META
elements from ASP.NET pages, user controls and master pages, grab their contents and then write back updated values to these files in a utility I'm working on.
I could try using regular expressions to grab just these elements but there are several issues with that approach:
SCRIPT
elements that contain comments and/or VBScript/JavaScript that looks like valid elements, etc.META
and LINK
elements inside IE conditional commentsI did some research for HTML parsers in .NET and many SO posts and blogs recommend the HTML Agility Pack. I've never used it before and I don't know if it can parse broken HTML and HTML fragments. (For example, imagine a user control that only contains a HEAD
element with some content in it - no HTML
or BODY
.) I know I could read the documentation but it'd save me quite a bit of time if someone could advise. (Most SO posts involve parsing full HTML pages.)
Absolutely, that is what it excels at.
In fact, many web pages you'll find in the wild could be described as HTML fragments, due to missing <html>
tags, or improperly closed tags.
The HtmlAgilityPack simulates what the browser has to do - try to make sense from what is sometimes a jumble of mismatched tags. An imperfect science, but HtmlAgilgityPack does it very well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With