I'm trying to load a piece of (possibly) malformed HTML into an XMLDocument object, but it fails with XMLExceptions... since there are extra opening/closing tags, and malformed XML tags such as <img >
instead of <img />
How do I get the XML to parse with all the errors in the data? Is there any XML validator that I can apply before parsing, to correct these errors? Or would handling the exception parse whatever can be parsed?
The HTML Agility Pack will parse html, rather than xhtml, and is quite forgiving. The object model will be familiar if you've used XmlDocument
.
You might want to check out the answer to this question.
Basically somewhere between a .NET port of beautifulsoup and the HTML agility pack there is a way.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With