Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does this set of regular expressions FULLY protect against cross site scripting?

What's an example of something dangerous that would not be caught by the code below?

EDIT: After some of the comments I added another line, commented below. See Vinko's comment in David Grant's answer. So far only Vinko has answered the question, which asks for specific examples that would slip through this function. Vinko provided one, but I've edited the code to close that hole. If another of you can think of another specific example, you'll have my vote!

public static string strip_dangerous_tags(string text_with_tags)
{
    string s = Regex.Replace(text_with_tags, @"<script", "<scrSAFEipt", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"</script", "</scrSAFEipt", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"<object", "</objSAFEct", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"</object", "</obSAFEct", RegexOptions.IgnoreCase);
    // ADDED AFTER THIS QUESTION WAS POSTED
    s = Regex.Replace(s, @"javascript", "javaSAFEscript", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onabort", "onSAFEabort", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onblur", "onSAFEblur", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onchange", "onSAFEchange", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onclick", "onSAFEclick", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"ondblclick", "onSAFEdblclick", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onerror", "onSAFEerror", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onfocus", "onSAFEfocus", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onkeydown", "onSAFEkeydown", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onkeypress", "onSAFEkeypress", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onkeyup", "onSAFEkeyup", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onload", "onSAFEload", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onmousedown", "onSAFEmousedown", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onmousemove", "onSAFEmousemove", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onmouseout", "onSAFEmouseout", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onmouseup", "onSAFEmouseup", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onmouseup", "onSAFEmouseup", RegexOptions.IgnoreCase);

    s = Regex.Replace(s, @"onreset", "onSAFEresetK", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onresize", "onSAFEresize", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onselect", "onSAFEselect", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onsubmit", "onSAFEsubmit", RegexOptions.IgnoreCase);
    s = Regex.Replace(s, @"onunload", "onSAFEunload", RegexOptions.IgnoreCase);

    return s;
}
like image 481
Corey Trager Avatar asked Oct 12 '08 16:10

Corey Trager


People also ask

What is a way to not prevent cross-site scripting?

To protect most from XSS vulnerabilities, follow three practices: Escape user input. Escaping means to convert the key characters in the data that a web page receives to prevent the data from being interpreted in any malicious way. It doesn't allow the special characters to be rendered. Validate user input.

What is the most effective defense against cross-site scripting attacks?

A web application firewall (WAF) can be a powerful tool for protecting against XSS attacks. WAFs can filter bots and other malicious activity that may indicate an attack. Attacks can then be blocked before any script is executed.


4 Answers

It's never enough – whitelist, don't blacklist

For example javascript: pseudo-URL can be obfuscated with HTML entities, you've forgotten about <embed> and there are dangerous CSS properties like behavior and expression in IE.

There are countless ways to evade filters and such approach is bound to fail. Even if you find and block all exploits possible today, new unsafe elements and attributes may be added in the future.

There are only two good ways to secure HTML:

  • convert it to text by replacing every < with &lt;.
    If you want to allow users enter formatted text, you can use your own markup (e.g. markdown like SO does).

  • parse HTML into DOM, check every element and attribute and remove everything that is not whitelisted.
    You will also need to check contents of allowed attributes like href (make sure that URLs use safe protocol, block all unknown protocols).
    Once you've cleaned up the DOM, generate new, valid HTML from it. Never work on HTML as if it was text, because invalid markup, comments, entities, etc. can easily fool your filter.

Also make sure your page declares its encoding, because there are exploits that take advantage of browsers auto-detecting wrong encoding.

like image 166
Kornel Avatar answered Nov 15 '22 09:11

Kornel


You're much better off turning all < into &lt; and all > into &gt;, then converting acceptable tags back. In other words, whitelist, don't blacklist.

like image 25
ceejayoz Avatar answered Nov 15 '22 09:11

ceejayoz


As David shows, there's no easy way to protect with just some regexes you can always forget something, like javascript: in your case. You better escape the HTML entities on output. There is a lot of discussion about the best way to do this, depending on what you actually need to allow, but what's certain is that your function is not enough.

Jeff has talked a bit about this here.

like image 37
Vinko Vrsalovic Avatar answered Nov 15 '22 10:11

Vinko Vrsalovic


<a href="javascript:document.writeln('on' + 'unload' + ' and more malicious stuff here...');">example</a>

Any time you can write a string to the document, a big door swings open.

There are myriad places to inject malicious things into HTML/JavaScript. For this reason, Facebook didn't initially allow JavaScript in their applications platform. Their solution was to later implement a markup/script compiler that allows them to seriously filter out the bad stuff.

As said already, whitelist a few tags and attributes and strip out everything else. Don't blacklist a few known malicious attributes and allow everything else.

like image 36
David Grant Avatar answered Nov 15 '22 11:11

David Grant