
HtmlAgilityPack Get all links inside a DIV

I want to be able to get 2 links from inside a div.

Currently I can select one, but when there is more than one it doesn't seem to work.

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);

HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='myclass']");

if (node != null)
{
    foreach (HtmlNode type in node.SelectNodes("//a@href"))
    {
        recipe.type += type.InnerText;
    }
}
else
    recipe.type = "Error fetching type.";

Trying to get it from this piece of HTML:

<div class="myclass">
<h3>Not Relevant Header</h3>
    <a href="#">This text</a>, 
    <a href="#">and this text</a>
</div>

Any help is appreciated. Thanks in advance.

Deejdd asked Dec 15 '12 21:12


3 Answers

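// Requires: using System.Linq; using HtmlAgilityPack;
var div = doc.DocumentNode.SelectSingleNode("//div[@class='myclass']");
if (div != null)
{
    // Collect the text of every <a> element nested anywhere inside the div.
    var links = div.Descendants("a")
                   .Select(a => a.InnerText)
                   .ToList();
}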

L.B answered Nov 06 '22 11:11


Use this XPath:

//div[@class = 'myclass']//a

It grabs all descendant a elements in div with class = 'myclass'.

Also, //a@href is not valid XPath: an attribute step needs its own slash, as in //a/@href.
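As a minimal sketch of how this XPath could be applied to the HTML from the question: the class name LinkExtractor and the helper GetLinkTexts are hypothetical names chosen for illustration, and the markup is inlined with LoadHtml instead of being fetched with HtmlWeb. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class LinkExtractor
{
    // Hypothetical helper: returns the text of every <a> inside a div
    // whose class attribute is exactly "myclass".
    public static List<string> GetLinkTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so guard before enumerating.
        var anchors = doc.DocumentNode.SelectNodes("//div[@class='myclass']//a");
        if (anchors == null)
            return new List<string>();

        return anchors.Select(a => a.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        const string html = @"<div class=""myclass"">
<h3>Not Relevant Header</h3>
    <a href=""#"">This text</a>,
    <a href=""#"">and this text</a>
</div>";

        // Print each link text on its own line.
        foreach (var text in GetLinkTexts(html))
            Console.WriteLine(text);
    }
}
```

If you start from a node you have already selected, as in the question, use a relative path such as node.SelectNodes(".//a"). An expression that begins with // is evaluated from the document root, even when it is called on a child node.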

Kirill Polishchuk answered Nov 06 '22 13:11


Use:

//div[contains(concat(' ', @class, ' '), ' myclass ')]//a

This selects any a element that is a descendant of any div whose class attribute contains a classname of "myclass".

The classname may be the only one in the attribute, or the attribute may contain other classnames as well. In that case the classname may come first, last, or between other classnames; the XPath expression above selects the wanted nodes in all of these cases.
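The cases described above can be checked with a small sketch. ClassTokenDemo and CountLinks are hypothetical names used only for this illustration, the test markup is made up, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ClassTokenDemo
{
    // Padding @class with spaces lets ' myclass ' match only a whole classname.
    public const string TokenXPath =
        "//div[contains(concat(' ', @class, ' '), ' myclass ')]//a";

    // Hypothetical helper: counts the <a> elements selected by TokenXPath.
    public static int CountLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes(TokenXPath);
        return anchors == null ? 0 : anchors.Count;
    }

    public static void Main()
    {
        string[] classValues =
        {
            "myclass",          // single classname
            "myclass other",    // first of several
            "other myclass",    // last of several
            "a myclass b",      // surrounded by others
            "myclassy",         // different classname that merely starts with "myclass"
        };

        foreach (var value in classValues)
        {
            var html = "<div class='" + value + "'><a href='#'>link</a></div>";
            Console.WriteLine(value + ": " + CountLinks(html));
        }
    }
}
```

The first four class values should each report one link, while "myclassy" should report none, because the padded comparison only matches complete, space-separated classnames.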

Dimitre Novatchev answered Nov 06 '22 13:11