Get the value of an HTML element

Tags: c#, regex

I have the HTML code of a webpage in a text file. I'd like my program to return the text inside a particular tag. For example, I want to get "Julius" out of

<span class="hidden first">Julius</span>

Do I need a regular expression for this? If not, which string function can do it?

disasterkid asked Nov 05 '12 14:11


1 Answer

You should use an HTML parser such as HtmlAgilityPack. Regex is not a good choice for parsing HTML: HTML is not a regular language (elements can nest arbitrarily), and real-world markup is often not strict about quoting, attribute order, or closing tags.
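To illustrate, here is a hypothetical naive pattern (not part of the original answer) run with .NET's `System.Text.RegularExpressions`. It handles the exact snippet from the question, but it truncates the text when a nested `<span>` appears and finds nothing when the same element is written with single quotes or an extra attribute. The sketch assumes C# 9 or later (top-level statements); `Extract` is an illustrative name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical naive pattern: lazily capture everything between the literal
// opening tag and the first "</span>" that follows it.
var naive = new Regex("<span class=\"hidden first\">(.*?)</span>");

// Returns the captured text, or null when the literal opening tag is not found.
string? Extract(string html)
{
    Match m = naive.Match(html);
    return m.Success ? m.Groups[1].Value : null;
}

// Exact markup from the question: prints "Julius".
Console.WriteLine(Extract("<span class=\"hidden first\">Julius</span>") ?? "(no match)");

// Nested span: the lazy match stops at the inner </span>, printing "Julius <span>Caesar".
Console.WriteLine(Extract("<span class=\"hidden first\">Julius <span>Caesar</span> Augustus</span>") ?? "(no match)");

// Same element, single quotes plus an extra attribute: prints "(no match)".
Console.WriteLine(Extract("<span id='x' class='hidden first'>Julius</span>") ?? "(no match)");
```

Each of these failures can be patched with a more elaborate pattern, but every patch handles only the variants you anticipated, which is the core argument for using a parser.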

You can use the code below to retrieve it with HtmlAgilityPack:

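using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream); // Load also accepts a file path, e.g. the text file holding the page

// This XPath selects every <span> whose class attribute is exactly "hidden first".
// SelectNodes returns null (not an empty collection) when nothing matches, so guard against it.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//span[@class='hidden first']");
List<string> itemList = nodes == null
    ? new List<string>()
    : nodes.Select(p => p.InnerText).ToList();

// itemList now contains the text of every span whose class is "hidden first"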
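For the second part of the question: if you cannot add a dependency and the markup is guaranteed to be exactly this shape, plain string functions (`IndexOf` and `Substring`) can extract the text. This is a minimal sketch under that assumption; `TextBetween` is a hypothetical helper name, and it shares the weaknesses of a hand-written regex (it expects the opening tag to be written exactly as given and does not handle a nested closing tag). It assumes C# 9 or later for top-level statements.

```csharp
using System;

// Hypothetical helper: returns the text between the first occurrence of openTag
// and the next occurrence of closeTag after it, or null if either is missing.
string? TextBetween(string html, string openTag, string closeTag)
{
    int start = html.IndexOf(openTag, StringComparison.Ordinal);
    if (start < 0) return null;
    start += openTag.Length;

    int end = html.IndexOf(closeTag, start, StringComparison.Ordinal);
    return end < 0 ? null : html.Substring(start, end - start);
}

string page = "<div><span class=\"hidden first\">Julius</span></div>";

// Prints "Julius".
Console.WriteLine(TextBetween(page, "<span class=\"hidden first\">", "</span>") ?? "(no match)");
```

Ordinal comparison is used because tag text should be matched byte-for-byte, not according to culture-specific rules.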
Anirudha answered Oct 01 '22 04:10