Best way to handle security and avoid XSS with user entered URLs

Tags: security, url, xss, html-sanitizing

We have a high security application and we want to allow users to enter URLs that other users will see.

This introduces a high risk of XSS hacks - a user could potentially enter javascript that another user ends up executing. Since we hold sensitive data it's essential that this never happens.

What are the best practices in dealing with this? Is any security whitelist or escape pattern alone good enough?

Any advice on dealing with redirections (a "this link goes outside our site" message on a warning page before following the link, for instance)?

Is there an argument for not supporting user entered links at all?

Clarification:

Basically our users want to input:

stackoverflow.com

And have it output to another user:

<a href="http://stackoverflow.com">stackoverflow.com</a>

What I really worry about is them using this in an XSS hack. I.e. they input:

alert('hacked!');

So other users get this link:

<a href="javascript:alert('hacked!');">stackoverflow.com</a>

My example is just to explain the risk - I'm well aware that javascript and URLs are different things, but by letting them input the latter they may be able to execute the former.

You'd be amazed how many sites you can break with this trick - HTML is even worse. If they know to deal with links do they also know to sanitise <iframe>, <img> and clever CSS references?

I'm working in a high security environment - a single XSS hack could result in very high losses for us. I'm happy that I could produce a Regex (or use one of the excellent suggestions so far) that could exclude everything that I could think of, but would that be enough?

asked Oct 15 '08 18:10

Keith

2 Answers

If you think URLs can't contain code, think again!

https://owasp.org/www-community/xss-filter-evasion-cheatsheet

Read that, and weep.

Here's how we do it on Stack Overflow:

using System.Text.RegularExpressions;

/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}
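Note that a character filter of this kind leaves the scheme untouched: a string such as javascript:alert(1) consists only of allowed characters and passes through unchanged. One possible complement, which is not part of the original answer, is to parse the value and accept only http and https before writing it into the page. The sketch below assumes .NET's System.Uri and System.Net.WebUtility; the class name LinkRenderer and the method TryRenderLink are hypothetical names chosen for illustration.

```csharp
using System;
using System.Net;

public static class LinkRenderer
{
    // Hypothetical helper (an assumption, not the Stack Overflow implementation).
    // Produces an <a> element only for absolute http/https URLs.
    public static bool TryRenderLink(string url, out string html)
    {
        html = string.Empty;

        // Reject anything that does not parse as an absolute URI.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
            return false;

        // Allowlist the scheme; javascript:, data:, vbscript: etc. fall through here.
        bool isHttp = string.Equals(uri.Scheme, Uri.UriSchemeHttp, StringComparison.OrdinalIgnoreCase)
                   || string.Equals(uri.Scheme, Uri.UriSchemeHttps, StringComparison.OrdinalIgnoreCase);
        if (!isHttp)
            return false;

        // HTML-encode the value before placing it inside a double-quoted attribute
        // and before using it as the visible link text.
        string encoded = WebUtility.HtmlEncode(uri.AbsoluteUri);
        html = "<a href=\"" + encoded + "\">" + encoded + "</a>";
        return true;
    }
}
```

With this in place, a caller would show the link only when TryRenderLink returns true and fall back to plain encoded text otherwise.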

answered Oct 16 '22 20:10

Jeff Atwood

The process of rendering a link "safe" should go through three or four steps:

1. Unescape and re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
2. Clean the link up. Regexes are a good start: make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If the links are only references to other information, you can also force the protocol at the end of this step: if the portion before the first colon is not 'http' or 'https', prepend 'http://' to the string. This lets you create usable links from incomplete input, as a user would type it into a browser, and gives you a last shot at tripping up whatever mischief someone has tried to sneak in.
3. Check that the result is a well-formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
4. Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.
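The first three steps could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a vetted sanitizer: the class UrlNormalizer and the method NormalizeUserUrl are hypothetical names, the decoding is used only to inspect the input (the original string is what gets parsed), the limit of five decoding passes is arbitrary, and the blacklist or malware check from the last step is left out.

```csharp
using System;
using System.Net;

public static class UrlNormalizer
{
    // Hypothetical helper: returns a normalized http/https URL, or null if the input is rejected.
    public static string NormalizeUserUrl(string input)
    {
        if (string.IsNullOrWhiteSpace(input)) return null;
        string url = input.Trim();

        // Unescape: undo percent- and HTML-entity encoding (bounded number of passes)
        // so that hidden characters such as %22 become visible to the checks below.
        string inspected = url;
        for (int i = 0; i < 5; i++)
        {
            string decoded = WebUtility.HtmlDecode(Uri.UnescapeDataString(inspected));
            if (decoded == inspected) break;
            inspected = decoded;
        }

        // Clean up: throw the link away if it could close the attribute or open a tag.
        if (inspected.IndexOfAny(new[] { '"', '<', '>' }) >= 0) return null;
        foreach (char c in inspected)
        {
            if (char.IsControl(c)) return null;
        }

        // Force the protocol: if the part before the first colon is not http or https,
        // prepend "http://".
        int colon = url.IndexOf(':');
        string scheme = colon >= 0 ? url.Substring(0, colon) : string.Empty;
        bool isHttp = scheme.Equals("http", StringComparison.OrdinalIgnoreCase)
                   || scheme.Equals("https", StringComparison.OrdinalIgnoreCase);
        if (!isHttp) url = "http://" + url;

        // Well-formedness: must parse as an absolute http/https URI with a host.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri)) return null;
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) return null;
        if (string.IsNullOrEmpty(uri.Host)) return null;

        return uri.AbsoluteUri;
    }
}
```

In this sketch an input like javascript:alert('hacked!'); gets "http://" prepended and then fails the parse, since what follows the host's colon is not a valid port. Whatever this returns still needs HTML-encoding when it is written into the page.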

If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.

answered Oct 16 '22 21:10

Bell
