
Why does Wikipedia use a modified percent encoding in their URL fragments?

I noticed that Wikipedia uses percent-encoding for the path section of a URL, but replaces each % character with . in the #fragment.

For example, on the Russian 'Russia' page, the URL for section 2 (История) is

http://ru.wikipedia.org/wiki/%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F#.D0.98.D1.81.D1.82.D0.BE.D1.80.D0.B8.D1.8F

instead of

http://ru.wikipedia.org/wiki/%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F#%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F
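To make the two forms concrete, here is a minimal sketch of the transformation as I understand it from the URLs above. The helper name `dotEncodeAnchor` is hypothetical, and this is not MediaWiki's actual code, which presumably handles more cases (spaces, literal dots in titles, and so on).

```javascript
// Hypothetical helper, not MediaWiki's implementation:
// percent-encode the title as UTF-8, then swap every '%' for '.'.
function dotEncodeAnchor(title) {
  return encodeURIComponent(title).replace(/%/g, '.');
}

// Standard percent-encoding of the section title
console.log(encodeURIComponent('История'));
// %D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F

// The dot form seen in the Wikipedia fragment
console.log(dotEncodeAnchor('История'));
// .D0.98.D1.81.D1.82.D0.BE.D1.80.D0.B8.D1.8F
```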

Neither is a valid pre-HTML5 token for an id/name, as the token must start with a letter ([A-Za-z]). HTML5 currently states that an id may contain any characters except spaces, as long as there is at least one (so you don't need to encode at all), but Wikipedia does not use HTML5.
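The two rules can be checked with a rough sketch like the one below. The regular expressions are my own reading of the HTML 4 ID/NAME token rule (a letter, followed by letters, digits, hyphens, underscores, colons, and periods) and of the HTML5 rule (at least one character, no space characters); the function names are hypothetical.

```javascript
// HTML 4 ID/NAME token: starts with a letter, then letters, digits, '-', '_', ':', '.'
function isValidHtml4Id(id) {
  return /^[A-Za-z][A-Za-z0-9\-_:.]*$/.test(id);
}

// HTML5 id: at least one character, none of them a space character
function isValidHtml5Id(id) {
  return /^[^ \t\n\f\r]+$/.test(id);
}

const dotForm = '.D0.98.D1.81.D1.82.D0.BE.D1.80.D0.B8.D1.8F';
const percentForm = '%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F';

console.log(isValidHtml4Id(dotForm), isValidHtml4Id(percentForm)); // false false
console.log(isValidHtml5Id(dotForm), isValidHtml5Id(percentForm)); // true true
```

Note that the dot form only fails the HTML 4 rule because of its first character, whereas the percent form also fails because % is not an allowed character anywhere in the token.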

So, why has Wikipedia used this scheme?

Deebster asked Jun 22 '12 11:06


1 Answer

One possible answer is cross-browser inconsistency: browsers differ in how they handle Unicode, especially in URL fragments.

For example, with the link

<a id="foo" href="%D1%83%D0%BE%D0%BC%D0%B1%D0%BB%D1%8B">Уомблы</a>

Browser      | Hover   | Location bar | href*   | path*
----------------------------------------------------------
Chrome 19    | Unicode | Unicode      | Percent | Percent
Firefox 13   | Unicode | Unicode      | Percent | Percent
IE 9         | Percent | Percent      | Percent | Percent

but with a fragment:

<a id="foo" href="#%D1%83%D0%BE%D0%BC%D0%B1%D0%BB%D1%8B">Уомблы</a>

Browser      | Hover   | Location bar | href*   | hash*
----------------------------------------------------------
Chrome 19    | Percent | Percent      | Percent | Percent
Firefox 13   | Unicode | Unicode      | Percent | Unicode
IE 9         | Percent | Percent      | Percent | Percent

The columns marked * were read from JavaScript:

* href = javascript:document.getElementById('foo').href

* path = javascript:location.pathname after following the link

* hash = javascript:location.hash after following the link

So Firefox decodes the fragment's percent-encoding to Unicode when you read location.hash, which means the returned value no longer matches the id/name attribute's value. Note that this is only an issue in JavaScript; following links works fine.
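If you do need to compare the hash in a script, one workaround is to normalise it first. This is only a sketch: `normalizeHash` is a hypothetical helper, and the two sample strings stand in for what Chrome/IE and Firefox return according to the table above.

```javascript
// Hypothetical helper: strip the leading '#' and decode any percent-encoding,
// so the percent form and the Unicode form of a fragment compare equal.
function normalizeHash(hash) {
  const raw = hash.startsWith('#') ? hash.slice(1) : hash;
  try {
    return decodeURIComponent(raw);
  } catch (e) {
    // Malformed percent sequence: fall back to the raw text
    return raw;
  }
}

const percentHash = '#%D1%83%D0%BE%D0%BC%D0%B1%D0%BB%D1%8B'; // as in Chrome 19 / IE 9
const unicodeHash = '#уомблы';                               // as in Firefox 13

console.log(normalizeHash(percentHash) === normalizeHash(unicodeHash)); // true

// A dot-encoded fragment contains no '%', so decoding leaves it untouched
console.log(normalizeHash('#.D0.98.D1.81')); // .D0.98.D1.81
```

The second call hints at why the dot scheme sidesteps the problem: there is nothing for the browser to decode, so every browser should report the same string.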

Deebster answered Dec 18 '22 22:12