I want to strip the HTML tags and return only the text between them. Here is what I'm currently using:
string regularExpressionPattern1 = @"<td(.*?)<\/td>";
Regex regex = new Regex(regularExpressionPattern1, RegexOptions.Singleline);
MatchCollection collection = regex.Matches(value.ToString());
I currently get <td>13</td>, and I just want 13.
Thanks,
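You need the value of the capture group, not of the whole match. Note that with your pattern the group starts right after <td, so it also captures the > of the opening tag (you would get >13). Move the > out of the group by using @"<td[^>]*>(.*?)</td>" as the pattern, then read Groups[1]:
Match m = collection[0];
// Groups[0] is the whole match; Groups[1] is the text captured by (.*?)
var stripped = m.Groups[1].Value;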
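Here is a minimal, self-contained sketch of the capture-group approach applied to every cell, not just collection[0]. It assumes cells shaped like the <td>13</td> sample in the question, optionally with attributes. The class and method names (TdTextExtractor, GetCellTexts) are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TdTextExtractor
{
    // The '>' that closes <td ...> sits outside the capture group, so Groups[1]
    // holds only the cell text. [^>]* lets the opening tag carry attributes.
    // Singleline lets (.*?) span line breaks inside a cell.
    private static readonly Regex CellPattern =
        new Regex(@"<td[^>]*>(.*?)</td>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the text of every <td> cell, in document order.
    public static List<string> GetCellTexts(string html)
    {
        var results = new List<string>();
        foreach (Match m in CellPattern.Matches(html))
        {
            // Groups[0] would be "<td>13</td>"; Groups[1] is just "13".
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        var cells = GetCellTexts("<tr><td>13</td><td class=\"x\">42</td></tr>");
        Console.WriteLine(string.Join(", ", cells)); // prints "13, 42"
    }
}
```

Iterating over the MatchCollection also avoids an ArgumentOutOfRangeException from collection[0] when the input contains no cells.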
You can use a look-behind (?<=...) and a look-ahead (?=...), like this:
(?<=<td>)(.*?)(?=<\/td>)
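That should give you only the text between the tags: look-arounds are zero-width, so the tags must be present but are not part of the match, and m.Value is just the cell text. More information on regular expressions and look-ahead/look-behind can be found here.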
Also, a good regex tester can be found here. I use it to test all my regular expressions while writing them.
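The look-around pattern can be tried with a short sketch like the one below. It assumes plain <td> tags with no attributes, because the look-behind matches the literal text <td>. The class name LookaroundExample and the helper GetCellTexts are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundExample
{
    // Hypothetical helper: returns the text of every bare <td>...</td> cell.
    public static string[] GetCellTexts(string html)
    {
        // (?<=<td>) and (?=</td>) are zero-width assertions: they must match
        // around the text but are not included in Match.Value.
        var regex = new Regex(@"(?<=<td>)(.*?)(?=</td>)", RegexOptions.Singleline);
        return regex.Matches(html).Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", GetCellTexts("<td>13</td><td>7</td>"))); // prints "13, 7"
    }
}
```

Compared with the capture-group version, no group index is needed, but a tag such as <td class="x"> will not satisfy the look-behind.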
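Using the HTML Agility Pack, this is straightforward:
var doc = new HtmlDocument();   // LoadHtml returns void, so create the document first
doc.LoadHtml(value.ToString());
var nodes = doc.DocumentNode.SelectNodes("//td//text()");
This puts the text nodes found inside every <td> into the nodes variable (SelectNodes returns null if nothing matches). Each node's InnerText then gives you the value, e.g. 13.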