
How to grab elements by class or id in HTML Source in C#?

I am trying to grab elements from HTML source based on their class or id name, in a C# Windows Forms application. I am putting the source into a string using WebClient and loading it into the HTMLAgilityPack through HtmlDocument.

However, all the HTMLAgilityPack examples I can find parse the document and find items by tag name. I need to find an element with a specific id, say a link in the HTML, and retrieve the value inside its tags. Is this possible, and what would be the most efficient way to do it? Everything I have tried for parsing out the ids throws exceptions. Thanks!

asked Oct 19 '11 at 15:10 by Drew



1 Answer

You should be able to do this with XPath:

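using HtmlAgilityPack;

// Load from a file, or call doc.LoadHtml(html) for a string such as one returned by WebClient.DownloadString
HtmlDocument doc = new HtmlDocument();
doc.Load(@"file.htm");

// Find the first element of any tag whose id attribute equals "my_control_id"
HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id=\"my_control_id\"]");
string value = (node == null) ? "Error, id not found" : node.InnerHtml;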

A quick explanation of the XPath:

  • // means search at any depth in the document, not just among direct children. Use SelectNodes instead of SelectSingleNode if it may match several elements
  • * means match an element with any tag name
  • [] defines a "predicate", a condition each candidate node must satisfy
  • [@id="my_control_id"] means keep only nodes that have an attribute named "id" with the value "my_control_id" (the backslashes in the C# string above are just escaped quotes)

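The answer above covers ids; since the question also asks about classes, here is a minimal sketch of both lookups. It assumes HtmlAgilityPack is referenced (for example via NuGet) and that the page has already been downloaded into a string, as in the question. The id case uses HtmlAgilityPack's `GetElementbyId`; the class case uses an XPath predicate, because a `class` attribute often holds several space-separated names and a plain `[@class='link']` test would miss `class="nav link"`. The names `HtmlLookup`, `InnerHtmlById`, `NodesByClass` and the sample markup are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlLookup
{
    // Returns the inner HTML of the element whose id attribute equals `id`, or null if there is none.
    public static string InnerHtmlById(HtmlDocument doc, string id)
    {
        // GetElementbyId (lowercase "b") is HtmlAgilityPack's built-in id lookup.
        HtmlNode node = doc.GetElementbyId(id);
        return node?.InnerHtml;
    }

    // Returns every element whose class attribute contains `className` as a whole,
    // whitespace-separated token, so "link" matches class="nav link" but not class="link-alt".
    // Assumes `className` contains no single quote, since it is pasted into an XPath string literal.
    public static List<HtmlNode> NodesByClass(HtmlDocument doc, string className)
    {
        string xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')]";
        // By default SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? new List<HtmlNode>() : nodes.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample markup; in the question's setup this string would come from WebClient.
        string html = "<div><a id=\"my_control_id\" class=\"nav link\">Home</a><a class=\"link-alt\">Other</a></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        Console.WriteLine(HtmlLookup.InnerHtmlById(doc, "my_control_id")); // Home
        Console.WriteLine(HtmlLookup.NodesByClass(doc, "link").Count);     // 1
    }
}
```

Checking for null before touching the result (as the original snippet does with `node == null`) avoids the NullReferenceException that typically appears when an id or class is not present in the downloaded page.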
Further reference

answered Sep 23 '22 at 06:09 by Thymine
