URL Routing: Handling Spaces and Illegal Characters When Creating Friendly URLs

I've seen a lot of discussion on URL Routing, and LOTS of great suggestions... but in the real world, two things I haven't seen discussed are:

  1. Creating Friendly URLs with Spaces and illegal characters
  2. Querying the DB

Say you're building a medical site, which has Articles, each with a Category and an optional Subcategory (one-to-many). (I could've used any example, but the medical field has lots of long words.)


Example Categories/Sub/Article Structure:

  1. Your General Health (Category)
    • Natural Health (Subcategory)
      1. Your body's immune system and why it needs help. (Article)
      2. Are plants and herbs really the solution?
      3. Should I eat fortified foods?
    • Homeopathic Medicine
      1. What's homeopathic medicine?
    • Healthy Eating
      1. Should you drink 10 cups of coffee per day?
      2. Are Organic Vegetables worth it?
      3. Is Burger King® evil?
      4. Is "French café" or American coffee healthier?
  2. Diseases & Conditions (Category)
    • Auto-Immune Disorders (Subcategory)
      1. The #1 killer of people is some disease
      2. How to get help
    • Genetic Conditions
      1. Preventing Spina Bifida before pregnancy.
      2. Are you predisposed to live a long time?
  3. Dr. FooBar's personal suggestions (Category)
    1. My thoughts on Herbal medicine & natural remedies (Article - no subcategory)
    2. Why should you care about your health?
    3. It IS possible to eat right and have a good diet.
    4. Has bloodless surgery come of age?

In a structure like this, you're going to have some LOOONG URLs if you go: /{Category}/{subcategory}/{Article Title}

In addition, there are numerous illegal characters, like # ! ? ' é " etc.

SO, the QUESTION(S) ARE:

  1. How would you handle illegal characters and Spaces? (Pros and Cons?)
  2. How would you handle getting this from the Database?
    • In other words, would you trust the DB to find the item by passing it the title, or would you pull all the titles, find the key in code, and then pass that key to the Database (two calls to the database)?

Note: I always see nice, pretty examples like /products/beverages/Short-Product-Name/. How about handling some ugly examples? ^_^

Armstrongest asked Nov 05 '08 21:11




1 Answer

If you're going to strip spaces, I myself prefer _ to - for readability reasons (put an underline on the link and the _'s virtually go_away).

You may want to try converting extended characters, e.g. ü, to their closest ASCII equivalents where possible, e.g.:

ü -> u
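
A minimal sketch of that conversion in Python, using only the standard library. The function name `slugify` and the choice to drop characters that have no ASCII equivalent (such as ®) are assumptions made for illustration, not something stated above.

```python
import re
import unicodedata


def slugify(title: str, sep: str = "_") -> str:
    """Turn an article title into a URL-friendly slug (illustrative helper)."""
    # Decompose accented letters into base letter + combining mark (ü -> u + ¨).
    decomposed = unicodedata.normalize("NFKD", title)
    # Drop everything that is not plain ASCII: the combining marks, and
    # (by assumption) symbols such as ® that have no ASCII equivalent.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of spaces and punctuation into a single separator.
    slug = re.sub(r"[^A-Za-z0-9]+", sep, ascii_only)
    # Remove separators left over at either end.
    return slug.strip(sep)


if __name__ == "__main__":
    print(slugify('Is "French café" or American coffee healthier?'))
    print(slugify("The #1 killer of people is some disease"))
```

Passing `sep="-"` gives hyphenated slugs instead, for anyone who prefers that style.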

However, in my experience the biggest actual SEO problem is not whether the URL contains all the lovely text; it's that when people change the text in the link, all your SEO work turns to crap because you now have dead links in the indexes.

For this, I would suggest doing what Stack Overflow does: have a numeric part which references a constant entity, and totally ignore the rest of the text (and/or update it when it's wrong).
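
A rough sketch of that idea in Python, assuming URLs of the form /article/{id}/{text}. The `ARTICLES` dictionary stands in for a database lookup by primary key, and `resolve`, `Resolution`, and the choice of a 301 redirect for a stale text part are all hypothetical details added for illustration.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical stand-in for a database table: article id -> current slug.
ARTICLES = {1: "Some_Article_Title_Here"}

# The numeric id is required; the text part after it is optional.
ROUTE = re.compile(r"^/article/(\d+)(?:/([^/]*))?/?$")


class Resolution(NamedTuple):
    status: int              # HTTP-style status code
    location: Optional[str]  # canonical URL, if the article exists


def resolve(path: str) -> Resolution:
    match = ROUTE.match(path)
    if match is None:
        return Resolution(404, None)
    # Only the numeric part is used to find the article.
    article_id = int(match.group(1))
    current_slug = ARTICLES.get(article_id)
    if current_slug is None:
        return Resolution(404, None)
    canonical = f"/article/{article_id}/{current_slug}"
    # A missing or outdated text part is corrected with a redirect
    # instead of producing a dead link.
    if path.rstrip("/") != canonical:
        return Resolution(301, canonical)
    return Resolution(200, canonical)


if __name__ == "__main__":
    print(resolve("/article/1/An_Old_Title"))
```

Because the lookup is by id, this is a single primary-key query, and renaming an article never breaks links that are already indexed.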

Also, the heavily hierarchical nature just makes for bad usability by humans. Humans hate long URLs. Copy-pasting them sucks and they're just more prone to breaking. If you can, subdivide it into lower tiers, e.g.:

/article/1/Some_Article_Title_Here
/article/1/Section/5/Section_Title_Here
/section/19023/Section_Title_here  ( == above link ) 

That way the only time you need to do voodoo magic is when the numbered article has actually been deleted, at which point you use the text part as a search string to try to find the real article or something like it.
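
One possible shape for that fallback, sketched in Python. The fuzzy matching uses `difflib.get_close_matches` from the standard library as a stand-in for a real search; the `ARTICLES` data, the `find_article` name, and the 0.6 similarity cutoff are assumptions for illustration only.

```python
import difflib
from typing import Optional

# Hypothetical stand-in for the articles table: id -> current slug.
ARTICLES = {
    2: "Are_Organic_Vegetables_worth_it",
    3: "Should_I_eat_fortified_foods",
    7: "Has_bloodless_surgery_come_of_age",
}


def find_article(article_id: int, text_part: str) -> Optional[int]:
    """Return the id of the article to show, or None if nothing fits."""
    # Normal case: the numeric id still exists, so the text is ignored.
    if article_id in ARTICLES:
        return article_id
    # Fallback: the id is gone, so treat the text part as a search string.
    by_slug = {slug.lower(): aid for aid, slug in ARTICLES.items()}
    matches = difflib.get_close_matches(
        text_part.lower(), list(by_slug), n=1, cutoff=0.6
    )
    return by_slug[matches[0]] if matches else None


if __name__ == "__main__":
    print(find_article(99, "Organic_Vegetables_worth_it"))
```

In a real site the fallback would more likely be a full-text query against the database; the point is only that the search runs in the rare deleted-article case, not on every request.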

Kent Fredric answered Sep 23 '22 01:09
