I want to extract a couple of links from an html page downloaded from the internet, I think that using linq to XML would be a good solution for my case.
My problem is that I can't create an XmlDocument from the HTML, using Load(string url) didn't work so I downloaded the html to a string using:
public static string readHTML(string url)
{
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream());
string html = sr.ReadToEnd();
sr.Close();
return html;
}
When I try to load that string using LoadXml(string xml) I get the exception
'--' is an unexpected token. The expected token is '>'
What way should I take to read the html file to a parsable XML
Click on the URL button, Enter URL and Submit. Parsing HTML into XML supports loading the HTML File to transform to XML. Click on the Upload button and select File. HTML to Plain XML Converter Online works well on Windows, MAC, Linux, Google Chrome, Firefox, Edge, and Safari.
You can try parsing an HTML file using a XML parser, but it's likely to fail. The reason is that HTML documents can have the following HTML features that XML parsers don't understand. XML parsers will fail to parse any HTML document that uses any of those features.
HTML and XML are related to each other, where HTML displays data and describes the structure of a webpage, whereas XML stores and transfers data. HTML is a simple predefined language, while XML is a standard language that defines other languages.
An XML file is an extensible markup language file, and it is used to structure data for storage and transport. In an XML file, there are both tags and text. The tags provide the structure to the data. The text in the file that you wish to store is surrounded by these tags, which adhere to specific syntax guidelines.
HTML simply isn’t the same as XML (unless the HTML actually happens to be conforming XHTML or HTML5 in XML mode). The best way is to use a HTML parser to read the HTML. Afterwards you may transform it to Linq to XML – or process it directly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With