I'm writing a program that is supposed to strip HTML tags out of a string. I've been trying to replace every substring that starts with "<" and ends with ">", but so far this has not worked. Here's what I've tried:
StrippedContent = Regex.Replace(StrippedContent, "\<.*\>", "")
That just returns what seems like a random part of the original string. I've also tried
For Each StringMatch As Match In Regex.Matches(StrippedContent, "\<.*\>")
StrippedContent = StrippedContent.Replace(StringMatch.Value, "")
Next
Which did the same thing (returns what seems like a random part of the original string). Is there a better way to do this? By better I mean a way that works.
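Both attempts use the same pattern, and the likely culprit is that ".*" is greedy: it runs from the first "<" to the last ">" on a line, so everything between the first and last tag on that line is removed along with the tags. A minimal sketch of the difference, using a made-up input string rather than anything from the question:

```vb
Imports System
Imports System.Text.RegularExpressions

Module GreedyDemo
    Sub Main()
        ' Hypothetical input, chosen only to show the effect.
        Dim input As String = "a <b>bold</b> word"

        ' Greedy: .* matches "<b>bold</b>" as a single tag, so "bold" is lost.
        Console.WriteLine(Regex.Replace(input, "<.*>", ""))

        ' Lazy: .*? stops at the first ">" after each "<", so "bold" survives.
        Console.WriteLine(Regex.Replace(input, "<.*?>", ""))

        ' Negated class: same result here, and unlike "." it also matches line breaks.
        Console.WriteLine(Regex.Replace(input, "<[^>]*>", ""))
    End Sub
End Module
```

The lazy and negated-class versions are enough for simple markup, but they still stop too early when a ">" appears inside an attribute value.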
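This expression will match whole tags, even when a quoted attribute value contains a ">" character, so they can be replaced with nothing: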
Regex: <(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>
Replace with: nothing
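To make the expression easier to read, here is the same pattern spread over several lines with a comment per alternative. This is only a sketch: it relies on RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats "#" as the start of a comment. The module name and the short test string are made up for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AnnotatedPattern
    Sub Main()
        ' Same expression as above; each " in the pattern is written as "" in VB.
        Dim matchpattern As String = String.Join(Environment.NewLine, {
            "<                  # opening angle bracket",
            "(?:                # then any number of:",
            "   [^>=]           #   a character that is neither > nor =",
            " | ='[^']*'        #   a single-quoted attribute value",
            " | =""[^""]*""     #   a double-quoted attribute value",
            " | =[^'""][^\s>]*  #   an unquoted attribute value",
            ")*",
            ">                  # closing angle bracket"})

        Dim tagRegex As New Regex(matchpattern, RegexOptions.IgnorePatternWhitespace)

        ' Hypothetical input: the ">" inside the quoted value does not end the tag.
        Console.WriteLine(tagRegex.Replace("<a href='x>y'>link</a>", ""))
    End Sub
End Module
```

Because a quoted value is consumed as one unit, a ">" inside it cannot close the tag early.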
Sample Text
Note the difficult edge case in the mouse over function
these are <a onmouseover=' href="NotYourHref" ; if (6/a>3) { funRotator(href) } ; ' href=abc.aspx?filter=3&prefix=&num=11&suffix=>the droids</a> you are looking for.
Code
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your source string"
        Dim replacementstring As String = ""
        ' Inside a VB string literal, each " in the pattern is written as "".
        Dim matchpattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"
        Console.WriteLine(Regex.Replace(sourcestring, matchpattern, replacementstring, RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline Or RegexOptions.Singleline))
    End Sub
End Module
String after replacement
these are the droids you are looking for.
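To reproduce this result, the pattern can be wrapped in a small helper and run against the sample text. In the sketch below, StripTags, TagPattern and the module name are hypothetical; only the pattern and the sample string come from this answer. A simple lazy pattern is run alongside it for comparison.

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripTagsDemo
    ' Hypothetical names; the pattern itself is the one shown above.
    Private Const TagPattern As String = "<(?:[^>=]|='[^']*'|=""[^""]*""|=[^'""][^\s>]*)*>"

    Function StripTags(html As String) As String
        Return Regex.Replace(html, TagPattern, "")
    End Function

    Sub Main()
        Dim sample As String = "these are <a onmouseover=' href=""NotYourHref"" ; if (6/a>3) { funRotator(href) } ; ' href=abc.aspx?filter=3&prefix=&num=11&suffix=>the droids</a> you are looking for."

        ' Attribute-aware pattern: the whole opening tag is removed.
        Console.WriteLine(StripTags(sample))

        ' Simple lazy pattern for comparison: it stops at the ">" in "6/a>3",
        ' so part of the onmouseover script is left in the output.
        Console.WriteLine(Regex.Replace(sample, "<.*?>", ""))
    End Sub
End Module
```

This is still pattern matching rather than HTML parsing, so constructs such as comments or script bodies containing "<" may need separate handling.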