
UTF-8 characters in URLs

Tags: html, url, utf-8

I just stumbled upon the following article:

http://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/

The article talks about using UTF-8 characters in URLs.

I would like to know whether it is safe to do so.

I have basically the same setup (browser and OS) as the article's author, so I can't really test how it behaves elsewhere.

So... is it safe to use UTF-8 characters in URLs?

And the bonus question: if it's safe, how come not many websites use it?

PeeHaa asked Jul 08 '11 13:07




1 Answer

Unicode characters in the URL (I'm not talking about the domain name) are safe to use. There is no security risk if you use them on your own site. (There are some risks to end users who visit a fraudulent site that uses Unicode on the page, as Oded said.)

The only real problem is how older browsers (and OSs) display them. Browsers that don't support them will show the ugly percent-encoded characters in the URL. You should probably also percent-encode the URLs inside your HTML, in case an older browser doesn't encode them for you and the user can't follow the link (which is bad). Modern browsers show the decoded URL in the address bar but send the encoded version in the request, so the user always sees the pretty Unicode characters.
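To illustrate the percent-encoding mentioned above, here is a minimal Python sketch using the standard library's `urllib.parse`. The helper names `encode_path` and `decode_path` are made up for this example, and the sample path is only modeled on the article's URL; how you generate links in your own site will likely differ.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path (hypothetical helper for illustration).

    quote() encodes non-ASCII characters as UTF-8 by default and writes
    each resulting byte as %XX. "/" is kept as a path separator.
    """
    return quote(path, safe="/")


def decode_path(path: str) -> str:
    """Reverse the encoding, roughly what a browser does for display."""
    return unquote(path)


pretty = "/code/utf-8-✓/"
encoded = encode_path(pretty)

print(encoded)               # /code/utf-8-%E2%9C%93/
print(decode_path(encoded))  # /code/utf-8-✓/
```

The encoded form is what you would put in an `href` to be safe with older browsers; the decoded form is what a modern browser shows in the address bar.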

Gerben answered Oct 22 '22 03:10