
Get a value of an attribute by XPath and HtmlAgilityPack

I have an HTML document that I parse with XPath. I want to read the value attribute of the input element, but my code doesn't work.

My HTML:

<tbody>
  <tr>
    <td>
      <input type="text" name="item" value="10743" readonly="readonly" size="10"/>
    </td>
  </tr>
</tbody>

My code:

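using HtmlAgilityPack;

HtmlWeb hw = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load(url); // url: the page being parsed (hypothetical name; not shown in the question)
HtmlNodeCollection node = doc.DocumentNode.SelectNodes("//input/@value");
string s = node[0].InnerText; // expected "10743"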

So I want to get the value "10743" (and I don't mind also getting other tags along with the answer).

asked Dec 29 '11 by Chani Poz



2 Answers

You can get it from the node's Attributes collection:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("file.html");
var node = doc.DocumentNode.SelectNodes("//input")[0];
var val = node.Attributes["value"].Value; //10743
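If the page might contain no input element, the indexer version above throws, because SelectNodes returns null when nothing matches. A slightly more defensive sketch, assuming the HtmlAgilityPack NuGet package and loading the question's markup inline with LoadHtml (the class and method names are illustrative, not part of the library):

```csharp
using System;
using HtmlAgilityPack;

public static class InputValueReader
{
    // Returns the value attribute of the first <input> element that has one,
    // or null when the document contains no such element.
    public static string ReadFirstInputValue(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches, whereas
        // SelectNodes("//input")[0] would throw in that case.
        HtmlNode input = doc.DocumentNode.SelectSingleNode("//input[@value]");
        if (input == null)
            return null;

        // GetAttributeValue falls back to the supplied default if the attribute is absent.
        return input.GetAttributeValue("value", string.Empty);
    }

    public static void Main()
    {
        // The markup from the question, wrapped in a table element.
        const string html =
            "<table><tbody><tr><td>" +
            "<input type=\"text\" name=\"item\" value=\"10743\" readonly=\"readonly\" size=\"10\"/>" +
            "</td></tr></tbody></table>";

        Console.WriteLine(ReadFirstInputValue(html)); // 10743
    }
}
```

Filtering with [@value] in the XPath means only elements that actually carry the attribute are considered, so the default passed to GetAttributeValue is just a safety net.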
answered Oct 07 '22 by Kakashi


Update 2: Here is a code example of how to get attribute values using Html Agility Pack:

http://htmlagilitypack.codeplex.com/wikipage?title=Examples

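 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // Select every <a> element that has an href attribute.
 foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
 {
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink: a user-supplied helper defined elsewhere, not part of the library
 }
 doc.Save("file.htm");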

You will need to adapt this code to your needs: for example, you will not modify the attributes, but will just read att.Value.
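Adapted to the question, a read-only version of the loop might look like the following sketch. It assumes the HtmlAgilityPack NuGet package in its default configuration (where SelectNodes returns null rather than an empty collection when nothing matches); the class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class InputValues
{
    // Collects the value attribute of every <input> element that has one.
    public static List<string> Collect(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();
        HtmlNodeCollection inputs = doc.DocumentNode.SelectNodes("//input[@value]");

        // With default settings, SelectNodes returns null when nothing matches.
        if (inputs == null)
            return values;

        foreach (HtmlNode input in inputs)
        {
            HtmlAttribute att = input.Attributes["value"];
            values.Add(att.Value); // read only; the document is not modified or saved
        }
        return values;
    }

    public static void Main()
    {
        string html = "<td><input type=\"text\" name=\"item\" value=\"10743\"/></td>" +
                      "<td><input type=\"text\" name=\"qty\" value=\"2\"/></td>";

        foreach (string v in Collect(html))
            Console.WriteLine(v);
    }
}
```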


Update: You may also look at this question:

Selecting attribute values with html Agility Pack


Your problem is most likely a default namespace problem. Search for "XPath default namespace c#" and you will find many good solutions (hint: use the overload of SelectNodes() that takes an XmlNamespaceManager argument).

The following code shows what you get for an attribute in a document that is in no namespace:

using System;
using System.IO;
using System.Xml;

public class Sample
{

    public static void Main()
    {

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<input value='novel' ISBN='1-861001-57-5'>" +
                    "<title>Pride And Prejudice</title>" +
                    "</input>");

        XmlNode root = doc.DocumentElement;

        XmlNode value = doc.SelectNodes("//input/@value")[0];

        Console.WriteLine("Inner text: " + value.InnerText);
        Console.WriteLine("InnerXml: " + value.InnerXml);
        Console.WriteLine("OuterXml: " + value.OuterXml);
        Console.WriteLine("Value: " + value.Value);

    }
}

The result from running this app is:

Inner text: novel
InnerXml: novel
OuterXml: value="novel"
Value: novel

Now, for a document that is in a default namespace:

using System;
using System.IO;
using System.Xml;

public class Sample
{

    public static void Main()
    {

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<input xmlns='some:Namespace' value='novel' ISBN='1-861001-57-5'>" +
                    "<title>Pride And Prejudice</title>" +
                    "</input>");

        XmlNode root = doc.DocumentElement;

        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("x", "some:Namespace");

        XmlNode value = doc.SelectNodes("//x:input/@value", nsmgr)[0];

        Console.WriteLine("Inner text: " + value.InnerText);
        Console.WriteLine("InnerXml: " + value.InnerXml);
        Console.WriteLine("OuterXml: " + value.OuterXml);
        Console.WriteLine("Value: " + value.Value);

    }
}

Running this app also produces the wanted results:

Inner text: novel
InnerXml: novel
OuterXml: value="novel"
Value: novel
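If registering a prefix is inconvenient, a commonly used alternative is to test the element's local name, which ignores its namespace. A minimal sketch using only System.Xml and the same sample document (the class and method names are illustrative); note that local-name() also matches input elements from any other namespace, so it is less precise than the XmlNamespaceManager approach:

```csharp
using System;
using System.Xml;

public class LocalNameSample
{
    // Returns the value attribute of the first element whose local name is "input",
    // whatever namespace (if any) that element is in; null if there is none.
    public static string FirstInputValue(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // Unprefixed attributes are in no namespace, so @value matches directly.
        XmlNode value = doc.SelectSingleNode("//*[local-name()='input']/@value");
        return value?.Value;
    }

    public static void Main()
    {
        string xml = "<input xmlns='some:Namespace' value='novel' ISBN='1-861001-57-5'>" +
                     "<title>Pride And Prejudice</title>" +
                     "</input>";

        Console.WriteLine(FirstInputValue(xml)); // novel
    }
}
```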
answered Oct 07 '22 by Dimitre Novatchev