
GetElementsByTagName in HtmlAgilityPack

How do I select an element, e.g. a textbox, if I don't know its id?

If I know its id then I can simply write:

HtmlAgilityPack.HtmlNode node = doc.GetElementbyId(id);

But I don't know the textbox's ID, and I can't find a GetElementsByTagName method in HtmlAgilityPack like the one available in the WebBrowser control. With the WebBrowser control I could simply have written:

HtmlElementCollection elements = browser[i].Document.GetElementsByTagName("form");
foreach (HtmlElement currentElement in elements)
{

}

EDIT

Here is the HTML form I am talking about:

<form id="searchform" method="get" action="/test.php">
<input name="sometext" type="text">
</form>

Please note that I don't know the ID of the form, and there can be several forms on the same page. The only thing I know is the name "sometext", and I want to get the element using just this name. So I guess I will have to go through all the forms one by one and look for the name "sometext", but how do I do that?

Ali asked Apr 21 '12 15:04


2 Answers

If you're looking for the tag by its tagName (such as form for <form name="someForm">), then you can use:

var forms = document.DocumentNode.Descendants("form");

If you're looking for the tag by its name attribute (such as someForm for <form name="someForm">), then you can use:

var forms = document.DocumentNode.Descendants().Where(node => node.GetAttributeValue("name", "") == "someForm");

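You could wrap both lookups in simple extension methods:

using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlNodeExtensions
{
    // Matches on the name attribute, e.g. "someForm" for <form name="someForm">.
    public static IEnumerable<HtmlNode> GetElementsByName(this HtmlNode parent, string name)
    {
        return parent.Descendants().Where(node => node.GetAttributeValue("name", "") == name);
    }

    // Matches on the tag name, e.g. "form" for <form name="someForm">.
    public static IEnumerable<HtmlNode> GetElementsByTagName(this HtmlNode parent, string name)
    {
        return parent.Descendants(name);
    }
}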

Note: You can also use SelectNodes and XPath to query your document:

var nodes = doc.DocumentNode.SelectNodes("//form//input");

Would give you all inputs on the page that are in a form tag.

var nodes = doc.DocumentNode.SelectNodes("(//form)[1]//input");

Would give you all the inputs of the first form on the page.
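
For the case in the question, where only the name attribute is known, an XPath predicate on @name may be the most direct route. Below is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class and method names (XPathByNameSketch, FindInputsByName) are made up for illustration and are not part of the library:

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class XPathByNameSketch
{
    // Returns every <input> whose name attribute equals the given value,
    // or an empty array when nothing matches.
    // Assumption: 'name' contains no single quote, because it is spliced
    // directly into the XPath string literal.
    public static HtmlNode[] FindInputsByName(string html, string name)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var nodes = doc.DocumentNode.SelectNodes("//input[@name='" + name + "']");

        // SelectNodes returns null, not an empty collection, when there is no match.
        return nodes == null ? new HtmlNode[0] : nodes.ToArray();
    }
}
```

Calling XPathByNameSketch.FindInputsByName(html, "sometext") on the page from the question should return the single text input without needing to know the form's ID.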

jessehouwing answered Nov 07 '22 10:11


I think you are looking for something like this:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml("....");

var inputs = doc.DocumentNode.Descendants("input")
    .Where(n => n.Attributes["name"]!=null && n.Attributes["name"].Value == "sometext")
    .ToArray();
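
The same lookup can be packaged as a reusable helper. This is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced. FindByNameSketch and FindInputs are hypothetical names, and GetAttributeValue with a default value stands in for the explicit null check above:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class FindByNameSketch
{
    // Returns all <input> elements whose name attribute equals 'name'.
    // GetAttributeValue falls back to "" when the attribute is missing,
    // so inputs without a name never match a non-empty 'name'.
    public static List<HtmlNode> FindInputs(string html, string name)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants("input")
            .Where(n => n.GetAttributeValue("name", "") == name)
            .ToList();
    }
}
```

With the form from the question as input, FindInputs(html, "sometext") should yield one node, and its other attributes can then be read with calls such as node.GetAttributeValue("type", "").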

L.B answered Nov 07 '22 10:11