I've seen a lot of discussion on URL routing, and LOTS of great suggestions... but there's one real-world case I haven't seen discussed:
Say you're building a medical site, which has Articles with a Category and an optional Subcategory (one-to-many). (I could've used any example, but the medical field has lots of long words.)
In a structure like this, you're going to have some LOOONG URLs if you go with: /{Category}/{Subcategory}/{Article Title}
In addition, titles contain numerous characters that are illegal or troublesome in URLs, like # ! ? ' é " etc.
Note: I always see nice pretty examples like /products/beverages/Short-Product-Name/. How about handling some ugly examples? ^_^
Original answer, quoting the RFC 1738 specification: "Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL."
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format. URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces.
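To make the percent-encoding concrete, here is a minimal sketch using Python's standard urllib.parse module. The sample title is made up for illustration, and passing safe="" is my own choice so that "/" is encoded too.

```python
from urllib.parse import quote, unquote

# Hypothetical article title containing the kinds of characters the question lists.
title = "Déjà vu? What's #1!"

# safe="" means even "/" is escaped; non-ASCII text is encoded as UTF-8 first,
# so "é" becomes the two bytes %C3%A9.
encoded = quote(title, safe="")
print(encoded)  # D%C3%A9j%C3%A0%20vu%3F%20What%27s%20%231%21

# Decoding restores the original text exactly.
print(unquote(encoded) == title)  # True
```

The result is a valid URL segment, but hardly a pretty one, which is why stripping or replacing characters is usually preferred for the human-readable part.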
If you're going to strip spaces, I myself prefer _ to - for readability (put an underline on the link and the _'s virtually go_away).
You may want to try converting extended characters, e.g. ü, to their closest ASCII equivalents where possible, e.g.:
ü -> u
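One way to sketch that conversion in Python is Unicode decomposition followed by dropping whatever is not ASCII. The slugify name and its sep parameter are hypothetical, picked just for this example. Note the limitation: letters with no decomposition (ß, ø and friends) are simply dropped, so a real site may want a proper transliteration table.

```python
import re
import unicodedata

def slugify(title, sep="_"):
    """Hypothetical helper: fold a title to a close-ASCII, URL-friendly slug."""
    # NFKD splits "ü" into "u" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", title)
    # Dropping non-ASCII bytes removes the combining marks and keeps the base letters.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics (spaces, #, ?, ' ...) into one separator.
    return re.sub(r"[^A-Za-z0-9]+", sep, ascii_only).strip(sep)

print(slugify("Ménière's Disease #1?"))  # Meniere_s_Disease_1
```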
However, in my experience the biggest actual SEO problem is not whether the URL contains all the lovely text; it's that when people change the text in the link, all your SEO work turns to crap because you now have dead links in the indexes.
For this, I would suggest doing what Stack Overflow does: have a numeric part which references a constant entity, and totally ignore the rest of the text (and/or update it when it's wrong).
Also, a grossly hierarchical structure just makes for bad usability by humans. Humans hate long URLs. Copy-pasting them sucks and they're just more prone to breaking. It helps if you can subdivide it into lower tiers, e.g.:
/article/1/Some_Article_Title_Here
/article/1/Section/5/Section_Title_Here
/section/19023/Section_Title_here ( == above link )
That way the only time you need to do voodoo magic is when the numbered article has actually been deleted, at which point you use the text part as a search string to try to find the real article or something like it.
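A rough sketch of that lookup logic in Python, not tied to any framework. Everything named here (ARTICLES, ROUTE, to_slug, resolve) is hypothetical, the in-memory dict stands in for a real database, and the "search" is a naive word match just to show the idea.

```python
import re

# Hypothetical article store: numeric ID -> current title.
ARTICLES = {1: "Some Article Title Here"}

# Matches /article/<id> with an optional trailing text part.
ROUTE = re.compile(r"^/article/(\d+)(?:/([^/]*))?/?$")

def to_slug(title):
    # Replace every run of non-alphanumerics with a single underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")

def resolve(path):
    """Return (status, canonical_path_or_None) for a requested path."""
    match = ROUTE.match(path)
    if not match:
        return (404, None)
    article_id, slug = int(match.group(1)), match.group(2) or ""
    title = ARTICLES.get(article_id)
    if title is None:
        # ID is gone: fall back to using the text part as a search string.
        words = {w.lower() for w in re.split(r"[_-]+", slug) if w}
        for other_id, other_title in ARTICLES.items():
            if words and words <= set(to_slug(other_title).lower().split("_")):
                return (301, f"/article/{other_id}/{to_slug(other_title)}")
        return (404, None)
    # The ID is authoritative; the text is ignored and corrected by redirect.
    canonical = f"/article/{article_id}/{to_slug(title)}"
    return (200, canonical) if path == canonical else (301, canonical)

print(resolve("/article/1/Old_Title"))        # (301, '/article/1/Some_Article_Title_Here')
print(resolve("/article/99/Article_Title"))   # (301, '/article/1/Some_Article_Title_Here')
```

Redirecting a stale text part to the current one is what keeps renamed articles from turning into dead links in the indexes.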