
C# HTML Agility Pack (not/wrong) iterating over node collection

I'm using HTML Agility Pack to fetch URLs from a webpage. The URL is:

http://goo.gl/DqfQl

If I use the code below, I get the links I want:

String html = getHtml("http://goo.gl/DqfQl");

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

doc.LoadHtml(html);

HtmlNodeCollection address_rows = doc.DocumentNode.SelectNodes("//div[@class='name']/a"); 

foreach (HtmlNode row in address_rows)
{
    MessageBox.Show(row.GetAttributeValue("href",LINK_NOT_FOUND));
}

But when I change the HtmlNodeCollection to fetch the containing div with class="row" and then try to fetch the URL from each row, I always get the first URL.

HtmlNodeCollection address_rows = doc.DocumentNode.SelectNodes("//div[@class='row']"); 

foreach (HtmlNode element in address_rows) {
    MessageBox.Show(element.SelectSingleNode("//div[@class='name']/a").GetAttributeValue("href",LINK_NOT_FOUND));
}   

I played around with this code, and for a while I thought it worked. But now I can't select all the URLs I want using the second code snippet. Can you help?

Robert Niestroj asked Dec 05 '22 15:12


1 Answer

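You have to prefix the XPath with a dot ("."). Otherwise the expression is evaluated from the root of the document rather than relative to the current node, so SelectSingleNode returns the first match in the whole document on every iteration.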

Just change the XPath in your second snippet to ".//div[@class='name']/a" and it should work.
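
To make the difference concrete, here is a minimal sketch that runs both forms of the expression side by side. The HTML string is a made-up stand-in for the real page (I am assuming it has roughly this row/name structure), and the LinkNotFound constant is a hypothetical replacement for your LINK_NOT_FOUND. It writes to the console instead of using MessageBox so it can run outside a Windows Forms app.

```csharp
using System;
using HtmlAgilityPack;

class RelativeXPathDemo
{
    // Hypothetical fallback value, standing in for LINK_NOT_FOUND in the question.
    const string LinkNotFound = "(link not found)";

    static void Main()
    {
        // Made-up markup; the real page is assumed to have a similar structure.
        string html = @"
<div class='row'><div class='name'><a href='/first'>First</a></div></div>
<div class='row'><div class='name'><a href='/second'>Second</a></div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//div[@class='row']");
        if (rows == null)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            Console.WriteLine("No rows found.");
            return;
        }

        foreach (HtmlNode row in rows)
        {
            // Leading "//" starts at the document root, ignoring which row we are in,
            // so this yields the same (first) link for every row.
            HtmlNode fromRoot = row.SelectSingleNode("//div[@class='name']/a");

            // Leading ".//" starts at the current row, so each row yields its own link.
            HtmlNode fromRow = row.SelectSingleNode(".//div[@class='name']/a");

            // SelectSingleNode returns null when there is no match, so guard before use.
            string rootHref = fromRoot?.GetAttributeValue("href", LinkNotFound) ?? LinkNotFound;
            string rowHref = fromRow?.GetAttributeValue("href", LinkNotFound) ?? LinkNotFound;

            Console.WriteLine($"from root: {rootHref} | from row: {rowHref}");
        }
    }
}
```

The null checks are not required for the fix itself, but a row without a matching link would otherwise throw a NullReferenceException in the loop.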

shriek answered Dec 10 '22 10:12