How do Wikipedia (or MediaWiki in general) encode page titles in URIs? It's not normal URI encoding, since spaces are replaced with underscores and double quotes are not encoded and things like that.
http://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_%28technical_restrictions%29 - here you've got some kind of description of what their engine enforces on article names.
They should have something like this in their LocalSettings.php: $wgArticlePath = '/wiki/$1';
and proper server URI rewrites configuration - they seem to be using Apache (HTTP header), so it's probably mod_rewrite. http://www.mediawiki.org/wiki/Manual:Short_URL
You can also refer to the index.php file for an article on Wikipedia like this: http://en.wikipedia.org/w/index.php?title=Foo%20bar
and get redirected by the engine to http://en.wikipedia.org/wiki/Foo_bar
. Behind the scenes mod_rewrite translates it into /index.php?title=Foo_bar. For the MediaWiki engine it's the same as if you visited http://en.wikipedia.org/w/index.php?title=Foo_bar
- this page doesn't redirect you.
The process is quite complex and isn't exactly pretty. You need to look at the Title
class found in includes/Title.php
. You should start with the newFromText
method, but the bulk of the logic is in the secureAndSplit
method.
Note that (as ever with MediaWiki) the code is not decoupled in the slightest. If you want to replicate it, you'll need to extract the logic rather than simply re-using the class.
The logic looks something like this:
Apple#Name
../directory/page
)~~~
) (for some reason)Furthermore, I believe I'm right in saying that quotation marks don't need to be encoded by the original user -- browsers can handle them transparently.
I hope that helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With