I would like to parse an HTML table and display its contents in a bound listbox using LINQ to XML.
I am using the HTML Agility Pack with this code:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.SourceURL");
HtmlNode rateNode = doc.DocumentNode.SelectSingleNode("//div[@id='FlightInfo_FlightInfoUpdatePanel']");
string rate = rateNode.InnerText;
this.richTextBox1.Text = rate;
The HTML looks like this:
<div id="FlightInfo_FlightInfoUpdatePanel">
<table cellspacing="0" cellpadding="0"><tbody>
<tr class="">
<td class="airline"><img src="/images/airline logos/NZ.gif" title="AIR NEW ZEALAND LIMITED. " alt="AIR NEW ZEALAND LIMITED. " /></td>
<td class="flight">NZ8</td>
<td class="codeshare"> </td>
<td class="origin">San Francisco</td>
<td class="date">01 Sep</td>
<td class="time">17:15</td>
<td class="est">18:00</td>
<td class="status">DEPARTED</td>
</tr>
But it is returning this:
NZ8 San Francisco01 Sep17:1518:00DEPARTEDAC6103NZ8San Francisco01 Sep17:1518:00DEPARTEDCO6754NZ8San Francisco01 Sep17:1518:00DEPARTEDLH7157NZ8San Francisco01 Sep17:1518:00DEPARTEDUA6754NZ8San Francisco01 Sep17:1518:00DEPARTEDUS5308NZ8San Francisco01 Sep17:1518:00DEPARTEDVS7408NZ8San Francisco01 Sep17:1518:00DEPARTEDEK407 Melbourne/Dubai01 Sep17:5017:50DEPARTEDEK413 Sydney/Dubai01 Sep18:0018:00DEPARTEDQF44 Sydney01
What I would like is to parse this into XML and then use LINQ to XML to feed the result to a bound listbox's ItemsSource.
I am thinking I need to use a variation of the below for each class, but would like some help.
HtmlNodeCollection cols = rows[i].SelectNodes(".//td[@class='flight']");
Instead of writing your own parsing engine, the HTML Agility Pack has everything you need to find specific DOM elements, traverse through child and parent nodes, and retrieve text and properties (e.g., HREF links) within specified elements. The first step is to install the HTML Agility Pack after you create your C# .NET project.
Html Agility Pack by default will also not include <form> and <option> tags when parsing html. Remember these differences and you will have greater success with XPath compatibility between the browser and Html Agility Pack.
The HTML parser lets you parse HTML and returns an HtmlDocument. It can load an HTML document from a file, from a specified string, from an Internet resource, or from a WebBrowser.
The Agility Pack is standard for parsing HTML content in C#, because it has several methods and properties that conveniently work with the DOM.
You are using InnerText, which strips out the HTML. Use InnerHtml instead:
string rate = rateNode.InnerHtml;
You can create an XML document from this string (assuming it is valid XML).
You can also query the rateNode in the same way you retrieved it, by selecting its child nodes:
var firstRow = rateNode.SelectSingleNode("./table/tbody/tr[1]"); // XPath positions start at 1, not 0
string origin = firstRow.SelectSingleNode("./td[@class = 'origin']").InnerText; // SelectSingleNode returns an HtmlNode
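Putting these pieces together, here is a minimal sketch of one way to turn each table row into an XML element and then query it with LINQ to XML. It assumes the markup matches the sample in the question. The inline HTML string stands in for web.Load(...), and the element names (Flights, Flight, Number and so on) and the listBox1 control are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack;

class FlightTableDemo
{
    static void Main()
    {
        // Inline sample standing in for web.Load(...); trimmed from the question's HTML.
        const string html = @"<div id='FlightInfo_FlightInfoUpdatePanel'>
<table><tbody>
<tr class=''>
<td class='flight'>NZ8</td>
<td class='origin'>San Francisco</td>
<td class='date'>01 Sep</td>
<td class='time'>17:15</td>
<td class='est'>18:00</td>
<td class='status'>DEPARTED</td>
</tr>
</tbody></table>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes(
            "//div[@id='FlightInfo_FlightInfoUpdatePanel']//tr");
        if (rows == null)
        {
            Console.WriteLine("No rows found.");
            return;
        }

        // Reads the text of the <td> with the given class, or "" if the cell is missing.
        Func<HtmlNode, string, string> cell = (row, cls) =>
        {
            var td = row.SelectSingleNode("./td[@class='" + cls + "']");
            return td == null ? "" : HtmlEntity.DeEntitize(td.InnerText).Trim();
        };

        // Build one <Flight> element per table row (element names are hypothetical).
        var flights = new XElement("Flights",
            rows.Select(r => new XElement("Flight",
                new XElement("Number", cell(r, "flight")),
                new XElement("Origin", cell(r, "origin")),
                new XElement("Date", cell(r, "date")),
                new XElement("Time", cell(r, "time")),
                new XElement("Estimated", cell(r, "est")),
                new XElement("Status", cell(r, "status")))));

        // LINQ to XML query that projects each element into a bindable object.
        var items = (from f in flights.Elements("Flight")
                     select new
                     {
                         Number = (string)f.Element("Number"),
                         Origin = (string)f.Element("Origin"),
                         Status = (string)f.Element("Status")
                     }).ToList();

        foreach (var item in items)
        {
            Console.WriteLine(item.Number + " | " + item.Origin + " | " + item.Status);
        }

        // In a UI project, something like: listBox1.ItemsSource = items; (hypothetical control name)
    }
}
```

Building the XElement tree from the individual cells, rather than parsing InnerHtml as XML, avoids depending on the page's markup being well-formed XML. If you do not need the intermediate XML at all, the same Select projection can produce the bindable objects directly from the rows.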