
What do % signs mean in a url?

Tags:

url

When I copy and paste the URL of this Wikipedia article, it looks like this:

http://en.wikipedia.org/wiki/Gruy%C3%A8re_%28cheese%29

However, if you paste this back into the browser's address bar, the percent signs disappear and what appear to be Unicode characters (and maybe special URL characters) take their place.

Are these abbreviations for Unicode and special URL characters?

I'm used to seeing \u00ff, etc. in JavaScript.

asked Dec 21 '22 01:12 by employee-0


2 Answers

The reference you're looking for is RFC 3987: Internationalized Resource Identifiers, specifically the section on mapping IRIs to URIs.

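RFC 3986: Uniform Resource Identifiers specifies that a URI is written using a limited subset of US-ASCII, and that any other data, including reserved characters used literally, must be percent-encoded one byte at a time as % followed by two hex digits. US-ASCII does not include characters such as è, so they cannot appear in a URI directly.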

RFC 3987 specifies that non-ASCII characters should first be encoded as UTF-8 so they can be percent-encoded as per RFC 3986. If you'll permit me to illustrate in Python:

>>> 'è'.encode('utf-8')
b'\xc3\xa8'

Here I've asked Python to encode the Unicode è to a string of bytes using UTF-8. The bytes returned are 0xc3 and 0xa8. Percent-encoded, this looks like %C3%A8.

The parentheses that also appear in your URL do fit in US-ASCII, so they are percent-escaped with their US-ASCII code points (%28 and %29), which are also valid UTF-8.
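To tie this back to the URL in the question, here is a minimal sketch using Python 3's standard urllib.parse module. The variable names (title, encoded) are just for illustration, and safe="" is my own choice so that every reserved character gets escaped; a browser or Wikipedia may choose which characters to escape differently.

```python
from urllib.parse import quote, unquote

# The article title as a human would type it (illustrative variable name).
title = "Gruyère_(cheese)"

# quote() encodes the text as UTF-8 first, then percent-encodes each byte
# that is not an unreserved character. safe="" means "escape everything
# that is not a letter, digit, or one of _ . - ~".
encoded = quote(title, safe="")
print(encoded)

# unquote() reverses the process: percent-escapes become bytes,
# and the bytes are decoded as UTF-8.
print(unquote(encoded))
```

This round trip is roughly what the address bar does when the percent signs "disappear": it shows the decoded form but still sends the encoded one.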

So, no, there is no simple 16×16 table—such a table could never represent the richness of Unicode. But there is a method to the apparent madness.

answered Dec 22 '22 13:12 by zigg


A % in a URI is followed by two hexadecimal digits (0-9, A-F), and the three characters together are the escaped form of the byte with that hex value. Doing this means you can write a URI containing characters that would otherwise have a special meaning, either in the URI itself or in whatever context the URI is embedded in.

Common examples are %20 for a space and %5B and %5D for [ and ], respectively.
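If you want to check what a given escape stands for, a quick sketch in Python 3 (assuming only the standard library; the names escapes and decoded are made up for this example) is to read the two hex digits as a number and compare with what urllib.parse.unquote returns:

```python
from urllib.parse import unquote

# A few single-byte escapes to inspect (illustrative list).
escapes = ["%20", "%5B", "%5C", "%5D"]

decoded = {}
for escaped in escapes:
    # The two characters after % are one byte written in hexadecimal.
    byte_value = int(escaped[1:], 16)
    decoded[escaped] = unquote(escaped)
    # For ASCII bytes, chr() of the value is the same character.
    print(escaped, byte_value, repr(decoded[escaped]), repr(chr(byte_value)))
```

This only works byte-for-byte with chr() for ASCII values; a non-ASCII character such as è spans several escapes that have to be decoded together as UTF-8.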

answered Dec 22 '22 15:12 by Paul S.