Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting data from HTML table into a datatable

Ok so I need to query a live website to get data from a table, put this HTML table into a DataTable and then use this data. I have so far managed to use Html Agility Pack and XPath to get to each row in the table I need but I know there must be a way to parse it into a DataTable. (C#) The code I am currently using is:

string htmlCode = "";
using (WebClient client = new WebClient())
{
htmlCode = client.DownloadString("http://www.website.com");
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(htmlCode);

//My attempt at LINQ to solve the issue (not sure where to go from here)
var myTable = doc.DocumentNode
.Descendants("table")
.Where(t =>t.Attributes["summary"].Value == "Table One")
.FirstOrDefault();

//Finds all the odd rows (which are the ones I actually need but would prefer a
//DataTable containing all the rows!
foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@class='odd']/td"))
{
string test = cell.InnerText;
//Have not gone further than this yet!
}

The HTML table on the website I am querying looks like this:

<table summary="Table One">
<tbody>
<tr class="odd">
<td>Some Text</td>
<td>Some Value</td>
</tr>
<tr class="even">
<td>Some Text1</td>
<td>Some Value1</td>
</tr>
<tr class="odd">
<td>Some Text2</td>
<td>Some Value2</td>
</tr>
<tr class="even">
<td>Some Text3</td>
<td>Some Value3</td>
</tr>
<tr class="odd">
<td>Some Text4</td>
<td>Some Value4</td>
</tr>
</tbody>
</table>

I'm not sure whether it is better/easier to use LINQ + HAP or XPath + HAP to get the desired result, I tried both with limited success as you can probably see. This is the first time I have ever made a program to query a website or even interact with a website in any way so I am very unsure at the moment! Thanks for any help in advance :)

like image 348
Jack Eker Avatar asked May 09 '12 09:05

Jack Eker


People also ask

What is HTML DataTable?

DataTables is a powerful jQuery plugin which can be used to create HTML tables with functionality like searching, sorting and pagination. It can use almost any data source like DOM, Ajax, and server-side processing. It is mobile friendly library which can adapt to the viewport size.

Can we use DataTable with Div?

No you will not be able to do this...


2 Answers

There's no such method out of the box from the HTML Agility Pack, but it shouldn't be too hard to create one. There's samples out there that do XML to Datatable from Linq-to-XML. These can be re-worked into what you need.

If needed I can help out creating the whole method, but not today :).

See also:

  • HTML Agility pack - parsing tables
  • parsing html with HTMLAGILITYPACK and loading into datatable C#
like image 41
jessehouwing Avatar answered Sep 29 '22 21:09

jessehouwing


Using some of Jack Eker's code above and some code from Mark Gravell (see post here) , I managed to come with a solution. This code snippet is used to obtain the public holidays for the year of 2012 in South Africa as of writing this article

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Web;
using System.Net;
using HtmlAgilityPack;



namespace WindowsFormsApplication
{
    public partial class Form1 : Form
    {
        private DataTable dt;
        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {

            string htmlCode = "";
            using (WebClient client = new WebClient())
            {
                client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
                htmlCode = client.DownloadString("http://www.info.gov.za/aboutsa/holidays.htm");
            }
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

            doc.LoadHtml(htmlCode);

            dt = new DataTable();
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Value", typeof(string));

            int count = 0;


            foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
            {

                foreach (HtmlNode row in table.SelectNodes("tr"))
                {

                    if (table.Id == "table2")
                    {
                        DataRow dr = dt.NewRow();

                        foreach (var cell in row.SelectNodes("td"))
                        {
                            if ((count % 2 == 0))
                            {
                                dr["Name"] = cell.InnerText.Replace("&nbsp;", " ");
                            }
                            else
                            {

                                dr["Value"] = cell.InnerText.Replace("&nbsp;", " ");

                                dt.Rows.Add(dr);
                            }
                            count++;

                        }


                    }

                }


                dataGridView1.DataSource = dt;

            }
        }

    }
}
like image 136
abidmix Avatar answered Sep 29 '22 22:09

abidmix