
Regex Find Text Between Tags C#

Tags: c#, regex

I want to strip the HTML tags and return only the text between them. Here is what I'm currently using:

string regularExpressionPattern1 = @"<td(.*?)<\/td>";
Regex regex = new Regex(regularExpressionPattern1, RegexOptions.Singleline);
MatchCollection collection = regex.Matches(value.ToString());

I currently get <td>13</td>, and I just want 13.

Thanks,

Trey Balut asked Nov 19 '13 18:11


3 Answers

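You need to read the value of the capture group, not of the whole match. Note that with the pattern in the question, group 1 also captures the > of the opening tag, so change <td( to <td>( in the pattern first. Then try this: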

Match m = collection[0];
var stripped = m.Groups[1].Value;
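
As a fuller sketch of the group-based approach: the snippet below assumes the opening tag is matched with <td[^>]*> rather than the pattern from the question, so that any attributes and the closing > stay out of the group. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TdGroups
{
    // Assumed pattern (not from the question): <td[^>]*> consumes the whole
    // opening tag, so group 1 holds only the text between the tags.
    private static readonly Regex CellRegex =
        new Regex(@"<td[^>]*>(.*?)</td>", RegexOptions.Singleline);

    // Returns the inner text of every <td>...</td> cell found in the input.
    public static List<string> Extract(string html)
    {
        var cells = new List<string>();
        foreach (Match m in CellRegex.Matches(html))
        {
            // Groups[0] is the whole match including the tags;
            // Groups[1] is the first parenthesized group.
            cells.Add(m.Groups[1].Value);
        }
        return cells;
    }
}
```

Calling TdGroups.Extract(value.ToString()) on the input from the question should then yield a list containing 13 rather than <td>13</td>.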

Yevgeniy.Chernobrivets answered Nov 20 '22 11:11


You can use look-behind ?<= and look-ahead ?= like this:

(?<=<td>)(.*?)(?=<\/td>)

Because look-arounds are zero-width, the tags are checked but not included in the match, so this should give you just the text between the tags. More info on Regex and look-ahead/look-behind can be found Here.

Also, a good Regex tester can be found Here. I use it to test all my Regex strings when I'm writing them.
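
For reference, the look-around pattern above could be used from C# roughly as follows. This is a minimal sketch: the class and method names are hypothetical, and it assumes the cells use a bare <td> with no attributes, since that is what the look-behind checks for.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TdLookaround
{
    // The look-behind and look-ahead assert the surrounding tags without
    // consuming them, so the match itself is already tag-free.
    private static readonly Regex CellRegex =
        new Regex(@"(?<=<td>).*?(?=</td>)", RegexOptions.Singleline);

    // Returns the text between each <td> and the next </td>.
    public static List<string> Extract(string html)
    {
        var cells = new List<string>();
        foreach (Match m in CellRegex.Matches(html))
        {
            // No group lookup is needed here; m.Value is the inner text.
            cells.Add(m.Value);
        }
        return cells;
    }
}
```

The trade-off against the group-based approach is that m.Value can be used directly, at the cost of a pattern that is harder to extend to tags with attributes.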

Mike Webb answered Nov 20 '22 09:11


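So, using the HTML Agility Pack, this would be really easy:

 // requires: using HtmlAgilityPack;
 var doc = new HtmlDocument();
 doc.LoadHtml(value.ToString()); // LoadHtml is an instance method and returns void
 var nodes = doc.DocumentNode.SelectNodes("//td//text()");

This puts the text nodes in the nodes variable. Note that SelectNodes returns null when nothing matches.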

jessehouwing answered Nov 20 '22 11:11