Parse malformed XML

I'm trying to load a piece of (possibly) malformed HTML into an XmlDocument object, but it fails with an XmlException because the markup contains extra opening/closing tags and tags that are not well-formed XML, such as <img > instead of <img />.

How can I get the document to parse despite the errors in the data? Is there an XML validator I can apply before parsing to correct these errors? Or would catching the exception still give me whatever could be parsed?

Robin Rodricks asked Jun 15 '09 14:06


2 Answers

The HTML Agility Pack parses real-world HTML rather than requiring well-formed XHTML, and it is quite forgiving of errors. Its object model will feel familiar if you've used XmlDocument.
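
A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is referenced. The class name `TolerantHtml` and both method names are hypothetical, chosen only for illustration. The XML text produced by `ToXmlText` may then be loadable with `XmlDocument.LoadXml`, but that is worth checking against your own data.

```csharp
using System.Collections.Generic;
using System.IO;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base library

public static class TolerantHtml
{
    // Collects the src attribute of every <img>, even if tags are unclosed or unbalanced.
    public static List<string> ImageSources(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parse: malformed markup does not throw

        var result = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img");
        if (nodes == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }
        foreach (HtmlNode img in nodes)
        {
            result.Add(img.GetAttributeValue("src", ""));
        }
        return result;
    }

    // Re-serializes the parsed tree as XML text (e.g. <img ...> is written as <img ... />).
    public static string ToXmlText(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionOutputAsXml = true; // ask the library to emit XML-style output
        doc.LoadHtml(html);
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

For example, `TolerantHtml.ImageSources("<div><img src=\"a.png\" ></div></div>")` should find the image despite the unclosed <img > and the stray closing tag. The parser also records the problems it encountered in `doc.ParseErrors`, if you want to log them.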

Marc Gravell answered Sep 25 '22 11:09


You might want to check out the answer to this question.

Basically, between a .NET port of Beautiful Soup and the HTML Agility Pack, there should be a way to do this.

annakata answered Sep 22 '22 11:09