Using C#, I would like to know how to get the textbox value (e.g. John) from this sample HTML:
<TD class=texte width="50%"> <DIV align=right>Name :<B> </B></DIV></TD> <TD width="50%"><INPUT class=box value=John maxLength=16 size=16 name=user_name> </TD> <TR vAlign=center>
If you just want to parse HTML in the browser and your HTML is intended for the body of your document, you could do the following:

    var div = document.createElement("DIV");
    div.innerHTML = markup;
    result = div.childNodes;

This gives you a collection of child nodes and should work not just in IE8 but even in IE6-7.
For users who are unfamiliar with the "HTML Agility Pack": this is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT. In simple words, it is a .NET code library that allows you to parse "out of the web" files (be it HTML, PHP or ASPX).
Html Agility Pack (HAP) is a free and open-source HTML parser written in C# that builds a read/write DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
HTML is a markup language with a simple structure. It would be quite easy to build a parser for HTML with a parser generator. Actually, you may not even need to do that if you choose a popular parser generator like ANTLR, because ready-made grammars are already available.
There are a number of ways to select elements using the agility pack.
Let's assume we have defined our HtmlDocument as follows:

    string html =
        @"<TD class=texte width=""50%""> <DIV align=right>Name :<B> </B></DIV></TD> " +
        @"<TD width=""50%""> <INPUT class=box value=John maxLength=16 size=16 name=user_name> </TD> " +
        @"<TR vAlign=center>";

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);
1. Simple LINQ
We could use the Descendants() method, passing the name of an element we are in search of:

    var inputs = htmlDoc.DocumentNode.Descendants("input");

    foreach (var input in inputs)
    {
        Console.WriteLine(input.Attributes["value"].Value); // John
    }
2. More advanced LINQ
We could narrow that down by using fancier LINQ:
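    // GetAttributeValue returns the supplied default when the attribute is
    // missing, so inputs without a class attribute do not throw.
    var inputs = from input in htmlDoc.DocumentNode.Descendants("input")
                 where input.GetAttributeValue("class", "") == "box"
                 select input;

    foreach (var input in inputs)
    {
        Console.WriteLine(input.Attributes["value"].Value); // John
    }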
3. XPath
Or we could use XPath.
    string name = htmlDoc.DocumentNode
        .SelectSingleNode("//td/input")
        .Attributes["value"].Value;

    Console.WriteLine(name); // John
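The snippets above assume that the input element and its value attribute are always present; if either is missing, they throw a NullReferenceException. As a rough sketch only, the lookup could be wrapped in a small helper that selects the input by its name attribute and returns null when nothing is found. The class name FormValues and the method GetInputValue are hypothetical names made up for this illustration and are not part of the Html Agility Pack; the helper also assumes that inputName contains no single quote, since it is pasted into the XPath string.

```csharp
using System;
using HtmlAgilityPack;

public static class FormValues
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the value
    // attribute of the first <input> whose name attribute equals inputName,
    // or null when no such element or attribute exists.
    public static string GetInputValue(string html, string inputName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: inputName contains no single quote, because it is
        // concatenated directly into the XPath expression.
        HtmlNode input = doc.DocumentNode
            .SelectSingleNode("//input[@name='" + inputName + "']");

        // SelectSingleNode returns null when nothing matches.
        if (input == null)
        {
            return null;
        }

        // The string indexer returns null for a missing attribute.
        HtmlAttribute value = input.Attributes["value"];
        return value == null ? null : value.Value;
    }
}
```

With the sample markup from the question, calling FormValues.GetInputValue(html, "user_name") should give John, while a name that does not occur in the markup gives null instead of an exception.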
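Another option is to use an XPathNavigator and select the attribute node directly:

    // Requires: using System.Xml.XPath;
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    XPathNavigator docNav = doc.CreateNavigator();
    XPathNavigator node = docNav.SelectSingleNode("//td/input/@value");

    if (node != null)
    {
        Console.WriteLine("result: " + node.Value);
    }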
I wrote this pretty quickly, so you'll want to do some testing with more data.
NOTE: The element names in the XPath strings apparently have to be in lower-case, even though the tags in the source HTML are written in upper-case.
EDIT: Apparently the beta now supports Linq to Objects directly, so there's probably no need for the converter.