Using Regex to remove script tags

Tags: c#, regex

I'm trying to use a regular expression I found on this website, but it doesn't seem to work. Any ideas?

Input string:

sFetch = "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

Regex:

sFetch = Regex.Replace(sFetch, "<script.*?>.*?</script>", "", RegexOptions.IgnoreCase);
asked Mar 24 '10 07:03 by amitre


2 Answers

Add RegexOptions.Singleline to the options:

RegexOptions.IgnoreCase | RegexOptions.Singleline
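
To see the difference the flag makes, here is a minimal sketch that runs the pattern from the question against the input from the question, once without and once with Singleline. The class and method names (SinglelineDemo, Strip) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Input string copied from the question.
    public const string Input =
        "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

    // Pattern copied from the question.
    public const string Pattern = "<script.*?>.*?</script>";

    // Hypothetical helper: removes every match of Pattern using the given options.
    public static string Strip(string html, RegexOptions options)
    {
        return Regex.Replace(html, Pattern, "", options);
    }

    public static void Main()
    {
        // Without Singleline, '.' stops at '\n', so the pattern never reaches </script>.
        string unchanged = Strip(Input, RegexOptions.IgnoreCase);
        Console.WriteLine(unchanged == Input); // True

        // With Singleline, '.' also matches '\n' and the whole element is removed.
        string stripped = Strip(Input, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        Console.WriteLine(stripped); // 123456
    }
}
```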

Even then, it will never work on input like the following:

<script
>
alert(1)
</script
/**/
>
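
A quick way to check this is to feed that markup to the same pattern with Singleline enabled. This is only a sketch; the class name MalformedTagDemo and the Strip helper are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MalformedTagDemo
{
    // The markup above, written as a single C# string.
    public const string Input = "<script\n>\nalert(1)\n</script\n/**/\n>";

    // Hypothetical helper: the question's pattern plus the Singleline option.
    public static string Strip(string html)
    {
        return Regex.Replace(
            html,
            "<script.*?>.*?</script>",
            "",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        // The pattern needs the literal text "</script>", but here a newline
        // follows "</script", so nothing matches and nothing is removed.
        Console.WriteLine(Strip(Input) == Input); // True
    }
}
```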

So find an HTML parser, such as HTML Agility Pack, and use that instead.
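
A rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is installed (it is not part of .NET). The class and method names (ParserDemo, RemoveScripts) are made up for illustration, and how the parser treats badly malformed markup is not something this sketch verifies.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class ParserDemo
{
    // Hypothetical helper: parses the markup and drops every <script> element.
    public static string RemoveScripts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Copy the matches to a list first so removing nodes
        // does not disturb the enumeration of the tree.
        foreach (var script in doc.DocumentNode.Descendants("script").ToList())
        {
            script.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Input string copied from the question.
        string input =
            "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

        Console.WriteLine(RemoveScripts(input));
    }
}
```

The point of going through a parser is that the decision about where a script element starts and ends is made by code written for HTML, rather than by a pattern that only knows one spelling of the tags.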

answered Oct 04 '22 13:10 by YOU


The regex fails because your input contains newlines, and by default the metacharacter . does not match a newline.

To solve this, you can use the RegexOptions.Singleline option as S.Mark says, or you can change the regex to:

@"<script[\d\D]*?>[\d\D]*?</script>"

which uses [\d\D] instead of . (the @ makes it a verbatim string, so C# does not reject \d and \D as unrecognized escape sequences).

\d matches any digit and \D matches any non-digit, so [\d\D] matches a digit or a non-digit, which is effectively any character, newlines included.
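
As a small check, the [\d\D] version can be run on the input from the question without RegexOptions.Singleline. This is a sketch only; the class and method names (AnyCharDemo, Strip) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnyCharDemo
{
    // Input string copied from the question.
    public const string Input =
        "123<script type=\"text/javascript\">\n\t\tfunction utmx_section(){}function utmx(){}\n\t\t(function()})();\n\t</script>456";

    // Hypothetical helper: [\d\D] matches any character, including '\n',
    // so Singleline is not needed. The @ prefix keeps the backslashes literal.
    public static string Strip(string html)
    {
        return Regex.Replace(
            html,
            @"<script[\d\D]*?>[\d\D]*?</script>",
            "",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(Strip(Input)); // 123456
    }
}
```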

answered Oct 04 '22 13:10 by codaddict