I noticed that Wikipedia uses percent-encoding for the path section of a URL, but replaces the % character with . in the #fragment.
For example, on the Russian 'Russia' page, the URL for section 2 (История) is
http://ru.wikipedia.org/wiki/%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F#.D0.98.D1.81.D1.82.D0.BE.D1.80.D0.B8.D1.8F
instead of
http://ru.wikipedia.org/wiki/%D0%A0%D0%BE%D1%81%D1%81%D0%B8%D1%8F#%D0%98%D1%81%D1%82%D0%BE%D1%80%D0%B8%D1%8F
Neither is a valid id/name token before HTML5, because such a token must start with [A-Za-z]. HTML5 currently states that an id can be any non-empty string of characters that contains no spaces (so you don't need to encode at all), but Wikipedia is not HTML5.
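The two fragment forms above can be reproduced with a few lines of JavaScript. This is only a sketch of the visible transformation (percent-encode the UTF-8 bytes, then swap % for .); the name dotEncodeFragment is hypothetical and the code is not taken from Wikipedia's own software, which may treat some characters differently.

```javascript
// Hypothetical helper, NOT Wikipedia's actual implementation:
// percent-encode the text as UTF-8, then replace every "%" with ".".
function dotEncodeFragment(text) {
  return encodeURIComponent(text).replace(/%/g, '.');
}

// Percent-encoded form, as used in the path section of the URL
console.log(encodeURIComponent('История'));
// Dot-encoded form, as seen in the #fragment
console.log(dotEncodeFragment('История'));
```

The dot-encoded result begins with a ".", so it still does not start with a letter.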
So, why has Wikipedia used this scheme?
One possible answer is cross-browser problems. Browsers are inconsistent in how they handle Unicode, especially in URL fragments.
For example, with the link
<a id="foo" href="%D1%83%D0%BE%D0%BC%D0%B1%D0%BB%D1%8B">Уомблы</a>
Browser    | Hover   | Location bar | href*   | path*
----------------------------------------------------------
Chrome 19  | Unicode | Unicode      | Percent | Percent
Firefox 13 | Unicode | Unicode      | Percent | Percent
IE 9       | Percent | Percent      | Percent | Percent
but with a fragment:
<a id="foo" href="#%D1%83%D0%BE%D0%BC%D0%B1%D0%BB%D1%8B">Уомблы</a>
Browser    | Hover   | Location bar | href*   | hash*
----------------------------------------------------------
Chrome 19  | Percent | Percent      | Percent | Percent
Firefox 13 | Unicode | Unicode      | Percent | Unicode
IE 9       | Percent | Percent      | Percent | Percent
* href = javascript:document.getElementById('foo').href
* path = javascript:location.pathname (after following the link)
* hash = javascript:location.hash (after following the link)
So Firefox decodes the fragment's percent-encoding to Unicode when you ask for the hash, so the value no longer matches the id/name attribute's value. Note that this is only an issue in JavaScript; following links works fine.
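One way to sidestep the mismatch in script is to bring both values to the same decoded form before comparing them. The following is a minimal sketch under that assumption; normalizeFragment and sameFragment are hypothetical names, not part of any browser API.

```javascript
// Hypothetical helper: strip a leading "#" and decode any percent-encoding,
// so that "#%D1%83..." and "#у..." end up as the same Unicode string.
function normalizeFragment(hash) {
  const raw = hash.startsWith('#') ? hash.slice(1) : hash;
  try {
    return decodeURIComponent(raw);
  } catch (e) {
    // Malformed percent sequence (for example a lone "%"): keep the raw value.
    return raw;
  }
}

// Hypothetical helper: compare two fragments regardless of which
// form (percent-encoded or Unicode) each one arrived in.
function sameFragment(a, b) {
  return normalizeFragment(a) === normalizeFragment(b);
}

// In a browser this could be called as sameFragment(location.hash, '#' + someId).
console.log(sameFragment('#%D1%83%D0%BE%D0%BC%D0%B1%D0%BB%D1%8B', '#уомблы'));
```

Decoding both sides means a fragment that contains a literal, intentionally unencoded % sequence would be altered, so this only suits ids that are known to be plain text or percent-encoded text.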