I'm using HTML Agility Pack to extract URLs from a web page. The page is:
http://goo.gl/DqfQl
With the code below I get the links I want:
String html = getHtml("http://goo.gl/DqfQl");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
HtmlNodeCollection address_rows = doc.DocumentNode.SelectNodes("//div[@class='name']/a");
foreach (HtmlNode row in address_rows)
{
    MessageBox.Show(row.GetAttributeValue("href", LINK_NOT_FOUND));
}
But when I change the query to select the containing div with class='row' and then look up the URL inside each one, I always get the first URL:
HtmlNodeCollection address_rows = doc.DocumentNode.SelectNodes("//div[@class='row']");
foreach (HtmlNode element in address_rows) {
    MessageBox.Show(element.SelectSingleNode("//div[@class='name']/a").GetAttributeValue("href", LINK_NOT_FOUND));
}
I experimented with this code and for a while I thought it worked, but now I can't select all the URLs I want with the second snippet. Can you help?
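You have to prefix the XPath with a dot ("."). Without it, "//" searches from the root of the document rather than from the current node, so SelectSingleNode returns the first matching link in the whole document for every row.
Just change the XPath in your second snippet to ".//div[@class='name']/a" and it should work.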
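To see the difference without fetching the live page, here is a small sketch. It swaps HtmlAgilityPack for the framework's own System.Xml.XmlDocument so it runs with no NuGet package; XmlNode.SelectNodes and SelectSingleNode follow the same XPath rule, where "//" starts at the document root and ".//" starts at the context node. The markup and the LinksPerRow helper are hypothetical stand-ins for the real page and loop:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XPathContextDemo
{
    // Hypothetical markup mimicking the structure described in the question:
    // several div.row containers, each wrapping a div.name with one link.
    public const string Markup =
        "<html><body>" +
        "<div class='row'><div class='name'><a href='/first'>First</a></div></div>" +
        "<div class='row'><div class='name'><a href='/second'>Second</a></div></div>" +
        "<div class='row'><div class='name'><a href='/third'>Third</a></div></div>" +
        "</body></html>";

    // Selects every div.row, runs innerXPath against each row as the context
    // node, and collects the href of the link that inner query returns.
    public static List<string> LinksPerRow(string innerXPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);

        var links = new List<string>();
        foreach (XmlNode row in doc.SelectNodes("//div[@class='row']"))
        {
            XmlNode link = row.SelectSingleNode(innerXPath);
            links.Add(link?.Attributes?["href"]?.Value ?? "LINK_NOT_FOUND");
        }
        return links;
    }

    static void Main()
    {
        // "//..." restarts at the document root, so every row sees the first link.
        Console.WriteLine(string.Join(", ", LinksPerRow("//div[@class='name']/a")));
        // ".//..." starts at the current row, so each row yields its own link.
        Console.WriteLine(string.Join(", ", LinksPerRow(".//div[@class='name']/a")));
    }
}
```

The first line prints /first three times; the second prints /first, /second, /third. In the HtmlAgilityPack loop from the question, the equivalent fix is only the XPath string passed to SelectSingleNode.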