
HtmlAgilityPack and selecting Nodes and Subnodes

Hope somebody can help me.

Let's say I have an HTML document that contains multiple divs like this example:

<div class="search_hit">
    <span prop="name">Richard Winchester</span>
    <span prop="company">Kodak</span>
    <span prop="street">Arlington Road 1</span>
</div>
<div class="search_hit">
    <span prop="name">Ted Mosby</span>
    <span prop="company">HP</span>
    <span prop="street">Arlington Road 2</span>
</div>

I'm using HtmlAgilityPack to load the HTML document. What I need to know is: how can I get the spans for each search_hit div?

My first thought was something like this:

foreach (HtmlAgilityPack.HtmlNode node in
    doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
     foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("//span[@prop]"))
     {
     }
}

Each div should be an object with the included spans as properties:

public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}

The following list should then be filled:

public List<Record> Results = new List<Record>();

But the XPath I'm using does not search within the sub node as it should. It seems to search the whole document again and again.

I already got it working by just getting the spans of the whole page, but then I have no relation between the spans and the divs: I no longer know which span belongs to which div.

Does somebody know a solution? I have played around so much that I'm totally confused now. :)

Any help is appreciated!

The Jack asked Feb 21 '13 13:02


3 Answers

An XPath expression that starts with // searches from the document root, no matter which node you call SelectNodes on.

Use .// to search all descendants of the current node:

 foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes(".//span[@prop]"))

Or drop the prefix entirely to search just for direct children:

 foreach (HtmlAgilityPack.HtmlNode node2 in node.SelectNodes("span[@prop]"))
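
To see the difference between the three forms, here is a minimal sketch that runs each XPath against the first search_hit div of the sample markup from the question and prints how many spans it finds. The class name XPathScopeDemo and the helper CountFromFirstDiv are made up for illustration; only HtmlDocument, LoadHtml, SelectSingleNode and SelectNodes come from HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Sample markup copied from the question.
    public const string SampleHtml = @"
<div class=""search_hit"">
    <span prop=""name"">Richard Winchester</span>
    <span prop=""company"">Kodak</span>
    <span prop=""street"">Arlington Road 1</span>
</div>
<div class=""search_hit"">
    <span prop=""name"">Ted Mosby</span>
    <span prop=""company"">HP</span>
    <span prop=""street"">Arlington Road 2</span>
</div>";

    // Hypothetical helper: evaluates xpath with the first search_hit div
    // as the context node and returns the number of matches.
    public static int CountFromFirstDiv(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        HtmlNode firstDiv = doc.DocumentNode.SelectSingleNode("//div[@class='search_hit']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection matches = firstDiv.SelectNodes(xpath);
        return matches == null ? 0 : matches.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountFromFirstDiv("//span[@prop]"));  // 6: every span in the document
        Console.WriteLine(CountFromFirstDiv(".//span[@prop]")); // 3: descendants of the first div
        Console.WriteLine(CountFromFirstDiv("span[@prop]"));    // 3: direct children of the first div
    }
}
```

With this markup the last two forms agree because the spans are direct children of the div; they would differ if a span were wrapped in another element.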

BeniBela answered Nov 02 '22 00:11


The following works for me. The important bit, as BeniBela noted, is to add a dot in the second call to SelectNodes.

List<Record> lstRecords=new List<Record>();
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']"))
{
  Record record=new Record();
  foreach (HtmlNode node2 in node.SelectNodes(".//span[@prop]"))
  {
    string attributeValue = node2.GetAttributeValue("prop", "");
    if (attributeValue == "name")
    {
      record.Name = node2.InnerText;
    }
    else if (attributeValue == "company")
    {
      record.company = node2.InnerText;
    }
    else if (attributeValue == "street")
    {
      record.street = node2.InnerText;
    }
  }
  lstRecords.Add(record);
}

shriek answered Nov 02 '22 02:11


First of all, take a look at this: Html Agility Pack - Problem selecting subnode

Here is a full working solution for your question:

IList<Record> results = new List<Record>();
foreach (var node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']")) {
    var record = new Record();
    record.Name = node.SelectSingleNode(".//span[@prop='name']").InnerText;
    record.company = node.SelectSingleNode(".//span[@prop='company']").InnerText;
    record.street = node.SelectSingleNode(".//span[@prop='street']").InnerText;
    results.Add(record);
}

If you read the question I pointed you to, you will see that ./span[@prop='name'] gives the same result here, since those span nodes are direct children of the div node.
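
One caveat with the loop above: SelectSingleNode returns null when a span is missing, and SelectNodes returns null when no div matches, so real-world pages can throw a NullReferenceException. A more defensive variant might look like the sketch below. SearchHitParser, Parse and TextOf are hypothetical names chosen for this example, and trimming and entity-decoding the text is an assumption about what you want, not something the question asks for.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public class Record
{
    public string Name { get; set; }
    public string company { get; set; }
    public string street { get; set; }
}

public static class SearchHitParser
{
    // Builds one Record per search_hit div; missing spans become null properties.
    public static List<Record> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<Record>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection hits = doc.DocumentNode.SelectNodes("//div[@class='search_hit']");
        if (hits == null)
        {
            return results;
        }

        foreach (HtmlNode hit in hits)
        {
            results.Add(new Record
            {
                Name = TextOf(hit, "name"),
                company = TextOf(hit, "company"),
                street = TextOf(hit, "street")
            });
        }
        return results;
    }

    // Looks up a span by its prop attribute relative to the given div (note the leading dot).
    private static string TextOf(HtmlNode scope, string prop)
    {
        HtmlNode span = scope.SelectSingleNode(".//span[@prop='" + prop + "']");
        return span == null ? null : HtmlEntity.DeEntitize(span.InnerText).Trim();
    }
}
```

Calling SearchHitParser.Parse(html) on the markup from the question should give two records, one per div.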


If the span nodes do not have those prop attributes, and you want to assign them depending on the order they appear, you can do:

foreach (var node in doc.DocumentNode.SelectNodes("//div[@class='search_hit']")) {
    var spanNodes = node.SelectNodes("./span");
    var record = new Record();
    record.Name = spanNodes[0].InnerText;
    record.company = spanNodes[1].InnerText;
    record.street = spanNodes[2].InnerText;
    results.Add(record);
}

Oscar Mederos answered Nov 02 '22 02:11