
redirect 404 to similar urls

I have a website with stories in it. I can have multiple types of stories within multiple categories like:

  • children
  • romance
  • scifi
  • action
  • thriller
  • quests

The stories are accessible using urls like:

www.example.com/action/story-name-action/
www.example.com/romance/story-name-romance/

The first parameter (action) and the second (story-name-action) are handled by rewrite rules in .htaccess. This part works just fine.

Lately I have been getting a few dozen 404s from different sites. Here's what I want to do, but I don't know how:

If someone types, for example, /action/story-nme-ction, I want to redirect to /action/story-name-action/

Is there an efficient way to implement this?

natalia asked Jan 20 '12 12:01




2 Answers

Oh man, oh man!

What you're asking for is not simple and needs a powerful computer, but the results are simply amazing.

Here's what I'd suggest:

  • To handle 404s properly, set an ErrorDocument directive in the vhost configuration. Mine looks like this: ErrorDocument 404 /404.php
  • On a 404, Apache calls /404.php and passes along the details of the failed request (the bad URL and so on; dump $_SERVER to see them). Test whether the URL has exactly two path segments, i.e. http://mysite.com/(expr1)/(expr2)/
  • If not, serve a classic 404.
  • If so, run a SOUNDEX search with MySQL (in your 404 PHP file). See query sample here.
  • Then, in this "special" 404 case, offer a suggestion, as Google does: "Did you mean /action/story-name-action/? If so, click on the link."
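
The steps in the list above could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the function names splitCategoryAndSlug and renderNotFound are hypothetical, and the lookup is passed in as a callable so that it can wrap whatever SOUNDEX query fits your own table. It also assumes the category segment was typed correctly and only the story name needs correcting.

```php
<?php
// Sketch of the decision logic for a custom 404.php.
// All function names here are hypothetical, chosen for illustration.

/**
 * Split a request URI into exactly two non-empty path segments.
 * Returns null when the URL does not have the /expr1/expr2/ shape.
 */
function splitCategoryAndSlug(string $requestUri): ?array
{
    // Keep only the path part (drops any query string).
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $segments = explode('/', trim($path, '/'));
    if (count($segments) !== 2 || $segments[0] === '' || $segments[1] === '') {
        return null;
    }
    return ['category' => $segments[0], 'slug' => $segments[1]];
}

/**
 * Build the body of the 404 page.
 * $findSimilar(category, slug) is assumed to return the closest known
 * slug as a string, or null when nothing similar exists.
 */
function renderNotFound(string $requestUri, callable $findSimilar): string
{
    $parts = splitCategoryAndSlug($requestUri);
    if ($parts === null) {
        return 'Page not found.';   // not /expr1/expr2/: classic 404
    }
    $match = $findSimilar($parts['category'], $parts['slug']);
    if ($match === null) {
        return 'Page not found.';   // nothing similar: classic 404
    }
    $url = '/' . rawurlencode($parts['category']) . '/' . rawurlencode($match) . '/';
    $safe = htmlspecialchars($url);
    return 'Page not found. Did you mean <a href="' . $safe . '">' . $safe . '</a>?';
}
```

In 404.php itself you would call something like renderNotFound($_SERVER['REQUEST_URI'], $yourLookup) and echo the result; dump $_SERVER first to confirm which key holds the original URL on your setup.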

This is hard work, but it is interesting and shows off your skills. Very few websites do this (Google is the only one I know of, actually).

Here's a demo on my French table that should give you an overview of how it works:

mysql> SELECT * FROM job WHERE SOUNDEX( description ) LIKE SOUNDEX('Machiniste cinéma');
+-------+--------------------+
| id    | description        |
+-------+--------------------+
| 14018 | Machiniste cinéma  |
+-------+--------------------+
1 row in set (0.06 sec)

mysql> SELECT * FROM job WHERE SOUNDEX( description ) LIKE SOUNDEX('Mchiniste cinéma');
+-------+--------------------+
| id    | description        |
+-------+--------------------+
| 14018 | Machiniste cinéma  |
+-------+--------------------+
1 row in set (0.06 sec)

mysql> SELECT * FROM job WHERE SOUNDEX( description ) LIKE SOUNDEX('Machnste cinema');
+-------+--------------------+
| id    | description        |
+-------+--------------------+
| 14018 | Machiniste cinéma  |
+-------+--------------------+
1 row in set (0.06 sec)

mysql>

Olivier Pons answered Sep 20 '22 07:09


Unless you are very sure of the URL the user really wanted to navigate to, rewriting or redirecting to a specific URL is a very bad idea.

Taking your example, suppose you want to handle every case where two letters may have been dropped. With 17 characters in the last part of the URL, that's 17*16/2 = 136 combinations. While it may be possible to match multiple 'false' URLs with one regex, you're still going to need a lot of rewrite rules.

A better solution would be to implement a 404 handler in PHP (since you included that tag in your question) that generates a list of, say, the top 10 URLs whose paths have the shortest Levenshtein distance from the requested path, along with a default link and supporting text. (There are MySQL-based implementations; try Google for URLs.) NB: the handler should still return a 404 status, and the HTML content must exceed a minimum length to suppress MSIE's 'friendly' error message.
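
As a rough illustration of the ranking step, here is a minimal sketch using PHP's built-in levenshtein() function. The function name closestSlugs and the in-memory $known list are assumptions made for the example; a real site would load its known paths from the database or a cache.

```php
<?php
// Sketch: rank known story slugs by Levenshtein distance to the requested one.
// closestSlugs() and the $known list below are hypothetical.

function closestSlugs(string $requested, array $knownSlugs, int $limit = 10): array
{
    $ranked = [];
    foreach ($knownSlugs as $slug) {
        // levenshtein() is a PHP built-in; a smaller value means more similar.
        $ranked[] = ['slug' => $slug, 'distance' => levenshtein($requested, $slug)];
    }
    // Sort ascending by distance, then keep the first $limit slugs.
    usort($ranked, function (array $a, array $b): int {
        return $a['distance'] <=> $b['distance'];
    });
    return array_column(array_slice($ranked, 0, $limit), 'slug');
}

// Example with made-up data: the mistyped slug from the question.
$known = ['space-quest', 'story-name-action', 'another-story-action'];
$suggestions = closestSlugs('story-nme-ction', $known, 2);
// $suggestions[0] is 'story-name-action' (two missing letters, distance 2).
```

The 404 page would then list these suggestions as links while still sending the 404 status, for example with http_response_code(404). Comparing against every known path on each 404 can get slow on a large site, so limiting the candidates to the requested category is one possible way to keep it cheap.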

symcbean answered Sep 20 '22 07:09