I'm trying to use a regular expression I found on this website, and it doesn't seem to work. Any ideas?
Input string:
sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";
Regex:
sFetch = Regex.Replace(sFetch, "<script.*?>.*?</script>", "", RegexOptions.IgnoreCase);
A related approach in JavaScript uses a regex to remove all the HTML tags from a string. Take the string in a variable; the RegExp removes anything between a less-than symbol and a greater-than symbol, so only the text remains.
Attempting to remove HTML markup using a regular expression is problematic. You don’t know what’s in there as script or attribute values. One way is to insert it as the innerHTML of a div, remove any script elements and return the innerHTML, e.g.
Here the string contains part of a document, and only the text needs to be extracted from it using JavaScript.
Add RegexOptions.Singleline
RegexOptions.IgnoreCase | RegexOptions.Singleline
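As a minimal sketch of that fix, the snippet below runs the question's pattern with and without RegexOptions.Singleline. The class name ScriptStripper and the Strip helper are made up for illustration; only Regex.Replace, the pattern, and the input string come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ScriptStripper
{
    // Same pattern as in the question; only the options differ between calls.
    private const string Pattern = "<script.*?>.*?</script>";

    // Hypothetical helper: removes every match of Pattern using the given options.
    public static string Strip(string html, RegexOptions options) =>
        Regex.Replace(html, Pattern, "", options);

    public static void Main()
    {
        string sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

        // Without Singleline, '.' does not match '\n', so nothing is removed.
        Console.WriteLine(Strip(sFetch, RegexOptions.IgnoreCase) == sFetch); // prints True

        // With Singleline, '.' also matches '\n', so the whole script block goes away.
        Console.WriteLine(Strip(sFetch, RegexOptions.IgnoreCase | RegexOptions.Singleline)); // prints 123456
    }
}
```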
And even then, it will never work on input like the following:
<script
>
alert(1)
</script
/**/
>
So find an HTML parser, such as HTML Agility Pack.
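To make the point concrete, here is a small sketch that feeds the markup above to the question's pattern with Singleline already enabled. EdgeCaseDemo and Strip are hypothetical names used only for this illustration. The pattern needs the literal text </script>, which never appears in that input, so nothing is removed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EdgeCaseDemo
{
    // Hypothetical helper: the question's pattern plus the Singleline fix.
    public static string Strip(string html) =>
        Regex.Replace(html, "<script.*?>.*?</script>", "",
                      RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static void Main()
    {
        // The markup shown above: the closing tag is spread over several lines.
        string tricky = "<script\n>\nalert(1)\n</script\n/**/\n>";

        // The regex finds no match, so the string comes back unchanged.
        Console.WriteLine(Strip(tricky) == tricky); // prints True
    }
}
```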
The reason the regex fails is that your input contains newlines, and the metacharacter . does not match a newline by default.
To solve this you can use the RegexOptions.Singleline option, as S.Mark says, or you can change the regex to:
@"<script[\d\D]*?>[\d\D]*?</script>"
which uses [\d\D] instead of the dot. (In C# it needs to be a verbatim string, or the backslashes doubled, because \d is not a valid escape in a regular string literal.)
\d is any digit and \D is any non-digit, so [\d\D] is a digit or a non-digit, which is effectively any character.
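A quick sketch of the [\d\D] variant applied to the input from the question, assuming the pattern is written as a C# verbatim string so the backslashes reach the regex engine unchanged. AnyCharClassDemo and Strip are illustrative names, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnyCharClassDemo
{
    // Hypothetical helper: [\d\D] matches any character, including '\n',
    // so RegexOptions.Singleline is not needed here.
    public static string Strip(string html) =>
        Regex.Replace(html, @"<script[\d\D]*?>[\d\D]*?</script>", "", RegexOptions.IgnoreCase);

    public static void Main()
    {
        string sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

        Console.WriteLine(Strip(sFetch)); // prints 123456
    }
}
```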