HTML Agility Parsing

Tags:

c#

xml

linq

html-agility-pack

I would like to parse an HTML table and display its contents in a bound ListBox using LINQ to XML.

I am using the HTML Agility Pack with this code:

    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.SourceURL");
    HtmlNode rateNode = doc.DocumentNode.SelectSingleNode("//div[@id='FlightInfo_FlightInfoUpdatePanel']");
    string rate = rateNode.InnerText;
    this.richTextBox1.Text = rate;

The HTML looks like this:

<div id="FlightInfo_FlightInfoUpdatePanel">

   <table cellspacing="0" cellpadding="0"><tbody>
     <tr class="">
     <td class="airline"><img src="/images/airline logos/NZ.gif" title="AIR NEW ZEALAND LIMITED. " alt="AIR NEW ZEALAND LIMITED. " /></td>
     <td class="flight">NZ8</td>
     <td class="codeshare">&nbsp;</td>
     <td class="origin">San Francisco</td>
     <td class="date">01 Sep</td>
     <td class="time">17:15</td>
     <td class="est">18:00</td>
     <td class="status">DEPARTED</td>
     </tr>

But it is returning this:

NZ8&nbsp;San Francisco01 Sep17:1518:00DEPARTEDAC6103NZ8San Francisco01 Sep17:1518:00DEPARTEDCO6754NZ8San Francisco01 Sep17:1518:00DEPARTEDLH7157NZ8San Francisco01 Sep17:1518:00DEPARTEDUA6754NZ8San Francisco01 Sep17:1518:00DEPARTEDUS5308NZ8San Francisco01 Sep17:1518:00DEPARTEDVS7408NZ8San Francisco01 Sep17:1518:00DEPARTEDEK407&nbsp;Melbourne/Dubai01 Sep17:5017:50DEPARTEDEK413&nbsp;Sydney/Dubai01 Sep18:0018:00DEPARTEDQF44&nbsp;Sydney01

What I would like is to parse this into XML and then use LINQ to XML to populate the ItemsSource of a bound ListBox.

I am thinking I need to use a variation of the below for each class, but would like some help.

HtmlNodeCollection cols = rows[i].SelectNodes(".//td[@class='flight']");

asked Sep 01 '11 07:09

Rhys

1 Answer

You are using InnerText, which strips out the HTML and returns only the concatenated text of every cell.

Use InnerHtml:

string rate = rateNode.InnerHtml;

You can create an XML document from this string, provided the markup is well-formed XML.
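As a rough sketch of that step, the string can be handed to LINQ to XML directly. This assumes the markup is well-formed; the sample below is a hand-closed, abbreviated copy of the row from the question (the img cell is left out), and the names FlightXmlSketch, SampleHtml and ParseRows are made up for illustration. The `&nbsp;` entity is rewritten first because XML does not define it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlightXmlSketch
{
    // Stand-in for rateNode.InnerHtml. Assumption: real pages may not be well-formed XML.
    public const string SampleHtml =
        "<table><tbody><tr class=\"\">" +
        "<td class=\"flight\">NZ8</td>" +
        "<td class=\"codeshare\">&nbsp;</td>" +
        "<td class=\"origin\">San Francisco</td>" +
        "<td class=\"date\">01 Sep</td>" +
        "<td class=\"time\">17:15</td>" +
        "<td class=\"est\">18:00</td>" +
        "<td class=\"status\">DEPARTED</td>" +
        "</tr></tbody></table>";

    public static List<string> ParseRows(string html)
    {
        // XML predefines only five entities, so &nbsp; is replaced with a numeric reference.
        XElement table = XElement.Parse(html.Replace("&nbsp;", "&#160;"));

        return table.Descendants("tr")
            // Key each cell by its class attribute; this throws if a class is missing or repeated.
            .Select(tr => tr.Elements("td")
                .ToDictionary(td => (string)td.Attribute("class"), td => td.Value.Trim()))
            .Select(c => c["flight"] + " from " + c["origin"] + " at " + c["time"] + " (" + c["status"] + ")")
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in ParseRows(SampleHtml))
            Console.WriteLine(line);
    }
}
```

If the real page is not well-formed, XElement.Parse throws an XmlException, in which case querying the HtmlNode tree is the safer route.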

You can also query the rateNode in the same way you retrieved it - selecting its child nodes:

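// XPath positions are 1-based, so the first row is tr[1], not tr[0].
var firstRow = rateNode.SelectSingleNode("./table/tbody/tr[1]");
// SelectSingleNode returns an HtmlNode; read InnerText to get the cell's text.
string origin = firstRow.SelectSingleNode("./td[@class = 'origin']").InnerText;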
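Extending that idea to every row, one possible sketch walks the rows with the HTML Agility Pack, copies each cell into an XElement, and then runs a LINQ to XML query to produce the list items. The XML shape (`flights`/`row`), the class FlightTableSketch, and its members are hypothetical; the column names come from the td class attributes in the question, and the sample markup is a hand-closed, abbreviated copy of it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using HtmlAgilityPack;

public static class FlightTableSketch
{
    // Column names copied from the td class attributes in the question's HTML.
    private static readonly string[] Columns =
        { "flight", "codeshare", "origin", "date", "time", "est", "status" };

    // Hand-closed, abbreviated copy of the question's markup (assumption).
    public const string SampleHtml =
        "<div id=\"FlightInfo_FlightInfoUpdatePanel\"><table><tbody><tr class=\"\">" +
        "<td class=\"flight\">NZ8</td><td class=\"codeshare\">&nbsp;</td>" +
        "<td class=\"origin\">San Francisco</td><td class=\"date\">01 Sep</td>" +
        "<td class=\"time\">17:15</td><td class=\"est\">18:00</td>" +
        "<td class=\"status\">DEPARTED</td>" +
        "</tr></tbody></table></div>";

    // Copies each table row under the panel into a <row> element.
    public static XElement ToXml(HtmlNode panel)
    {
        var root = new XElement("flights");
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection rows = panel.SelectNodes(".//tr[td]");
        if (rows == null) return root;

        foreach (HtmlNode row in rows)
        {
            var element = new XElement("row");
            foreach (string column in Columns)
            {
                HtmlNode cell = row.SelectSingleNode("./td[@class='" + column + "']");
                // DeEntitize decodes &nbsp; to U+00A0, which Trim then removes.
                string text = cell == null ? "" : HtmlEntity.DeEntitize(cell.InnerText).Trim();
                element.Add(new XElement(column, text));
            }
            root.Add(element);
        }
        return root;
    }

    // LINQ to XML query producing one display string per row.
    public static List<string> ToDisplayItems(XElement flights)
    {
        return (from row in flights.Elements("row")
                select string.Format("{0} {1} {2} {3} {4}",
                    (string)row.Element("flight"),
                    (string)row.Element("origin"),
                    (string)row.Element("date"),
                    (string)row.Element("time"),
                    (string)row.Element("status"))).ToList();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        HtmlNode panel = doc.DocumentNode.SelectSingleNode("//div[@id='FlightInfo_FlightInfoUpdatePanel']");
        foreach (string item in ToDisplayItems(ToXml(panel)))
            Console.WriteLine(item);
    }
}
```

The resulting list could then be assigned to the ListBox, for example to ItemsSource in WPF or DataSource in Windows Forms. In a Windows Forms project the type may need to be written as HtmlAgilityPack.HtmlDocument, as in the question, to avoid clashing with System.Windows.Forms.HtmlDocument.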

answered Oct 13 '22 00:10

Oded
