
How do I avoid double URL encoding when rendering URLs in my website?

Users provide both properly escaped URLs and raw URLs to my website in a text input; for example I consider these two URLs equivalent:

https://www.cool.com/cool%20beans
https://www.cool.com/cool beans

Now I want to render these as <a> tags later, when viewing this data. I am stuck between encoding the given text and getting these links:

<a href="https://www.cool.com/cool%2520beans">   <!-- This one is broken! -->
<a href="https://www.cool.com/cool%20beans">

Or not encoding it and getting this:

<a href="https://www.cool.com/cool%20beans">
<a href="https://www.cool.com/cool beans">       <!-- This one is broken! -->

What's the best way out from a user experience standpoint with modern browsers? I'm torn between doing a decoding pass over their input, or the second option I listed above where we don't encode the href attribute.

Cory Kendall asked Apr 18 '13 22:04



1 Answer

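If you want to avoid double encoding the links, you can decode both links first and then encode them once. Decoding a URL such as "https://www.cool.com/cool beans" returns the same value, whereas decoding "https://www.cool.com/cool%20beans" returns the version with the space, so both inputs end up identical and can then be encoded properly. Note that urlencode() applied to the whole URL would also escape the ":" and "/" characters, and urldecode() turns "+" into a space, so it is safer to apply rawurldecode() and then rawurlencode() to each path segment rather than to the full string.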
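The decode-then-encode idea above might look like the following minimal sketch. The function name normalize_url_path() is hypothetical, and the sketch assumes PHP 7.4+, an absolute URL, and that only the path needs normalizing (user info is dropped, query and fragment are passed through unchanged). It also assumes any "%XX" sequence in the input is an intentional escape, which is the same ambiguity the question describes.

```php
<?php
// Hypothetical helper: decode each path segment once, then encode it once,
// so "cool beans" and "cool%20beans" produce the same output.
function normalize_url_path(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // Assumption: leave input we cannot parse untouched.
    }

    // Split on "/" so the separators themselves are never encoded.
    $segments = array_map(
        fn (string $segment): string => rawurlencode(rawurldecode($segment)),
        explode('/', $parts['path'] ?? '')
    );

    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= implode('/', $segments);
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query']; // Passed through as given.
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}
```

When writing the result into an href attribute, it would still need to go through htmlspecialchars() like any other attribute value.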

Alternatively, you could scan for encoded characters with the strpos() function, e.g.

if (strpos($url, "%20") !== false) {
    // Encoded character found (strict comparison, since a match at position 0 is falsy)
}

Ideally, you would scan for an array of common encoded characters rather than just "%20".
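As a variation on scanning for a list of escapes, a single regular expression can match any percent-escape ("%" followed by two hex digits). This is a swapped-in technique, not what the answer originally proposed, and the helper name has_percent_escape() is hypothetical. It is only a heuristic: a raw URL that happens to contain something like "%20" literally would be misclassified.

```php
<?php
// Hypothetical helper: true if $url contains at least one %XX escape.
function has_percent_escape(string $url): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/%[0-9A-Fa-f]{2}/', $url) === 1;
}
```

A caller could then treat input as already encoded when this returns true, and encode it only when it returns false.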

Chris Brown answered Oct 18 '22 16:10