Rewriting dynamic URLs

I'm trying to rewrite one dynamic URL to another dynamic URL, as follows:

/category?category=News to
/categorysearch/result/?q=news

I was wondering if anyone had an idea on how this can be done in htaccess?

Thanks.

doubleplusgood asked Oct 09 '11 12:10


1 Answer

So the final EDITED solution is (for your linked .htaccess):

RewriteCond %{REQUEST_FILENAME} ethical [NC]
RewriteCond %{QUERY_STRING} (^|&|%26|%20)ethical(=|%3D)([^&]+) [NC]
RewriteRule .* /catalogsearch/result/?q=%3 [L,R]

With the 'category' terms from this question, it becomes:

RewriteCond %{REQUEST_FILENAME} category [NC]
RewriteCond %{QUERY_STRING} (^|&|%26|%20)category(=|%3D)([^&]+) [NC]
RewriteRule .* /categorysearch/result/?q=%3 [L,R]

But the first one is clearer, because the word category is not used everywhere. If you want to add the lowercase transformation, you can even do:

# outside of a Directory section
RewriteMap lowercase int:tolower
# in a Directory section (not a .htaccess)
RewriteCond %{REQUEST_FILENAME} category [NC]
RewriteCond %{QUERY_STRING} (^|&|%26|%20)category(=|%3D)([^&]+) [NC]
RewriteRule .* /catalogsearch/result/?q=${lowercase:%3} [L,R]

But, as the comments say, this last version with tolower cannot live entirely in a .htaccess file: the RewriteMap directive has to be declared in the server config or in a VirtualHost. Anyway, .htaccess files are really bad for performance; you should think about setting AllowOverride None and moving all this stuff into a VirtualHost.

So now I'll explain the rules. First, the main problem in your situation is that everything after the "?" is the query string, not the requested filename, and most RewriteRules work on the requested filename. So I first check that we're on the targeted requested filename:

RewriteCond %{REQUEST_FILENAME} category [NC]

Then we are going to work on the query string (everything after the question mark). We could have written something much simpler, like this:

RewriteCond %{REQUEST_FILENAME} category [NC]
RewriteCond %{QUERY_STRING} ^category=(.+) [NC]
RewriteRule .* /catalogsearch/result/?q=${lowercase:%1} [L,R]

This catches everything after ?category= in a variable %1 that is available to the RewriteRule. But the QUERY_STRING parameter has several problems; let's try to have some fun with it:

  • URL decoding has not yet been done on this parameter
  • category may not be the first parameter in the list of parameters
  • it may not be the last one either (&foo=bar)
  • there could be some spaces before it
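
The second and third points can be checked outside Apache. The following is only a sketch: it uses Python's re module on the assumption that its syntax agrees with the regex engine of mod_rewrite for these basic constructs, re.IGNORECASE stands in for the [NC] flag, and the helper name simple_capture and the sample query strings are made up for illustration.

```python
import re

# The simple CondPattern from above; re.IGNORECASE stands in for [NC].
SIMPLE = re.compile(r"^category=(.+)", re.IGNORECASE)

def simple_capture(query_string):
    """Return what %1 would hold, or None when the condition does not match."""
    m = SIMPLE.search(query_string)
    return m.group(1) if m else None

print(simple_capture("category=News"))          # News
print(simple_capture("foo=bar&category=News"))  # None: category is not first
print(simple_capture("category=News&foo=bar"))  # News&foo=bar: (.+) eats the rest
print(simple_capture("category%3DNews"))        # None: the '=' is URL-encoded
```

So the simple pattern either misses the parameter or captures too much as soon as the query string is not exactly category=value.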

EDITED:

My first answer was:

RewriteCond %{REQUEST_FILENAME} category [NC]
RewriteCond %{QUERY_STRING} [^|&|%26|%20][category](=|%3D)([^&]+) [NC]
RewriteRule .* /catalogsearch/result/?q=${lowercase:%2} [L,R]

With [^|&|%26|%20] meant to catch the fact that it could be the first parameter or a parameter after an & or a space (%26 is a URL-encoded &, %20 a URL-encoded space).

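But this was wrong. Thanks to @mootinator's comment I checked a little more: [category] is a character class, so it matches a single character among these letters, not the word. The condition therefore accepts 'category' but also any other parameter name ending with one of these letters.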
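
To see the difference between brackets and parentheses, here is a small sketch in Python's re module (assuming its syntax matches mod_rewrite's for these constructs). The leading [^|&|%26|%20] part is left out on purpose to isolate the effect, and the parameter names city and tag are invented examples.

```python
import re

# [category] is a character class: ONE character out of c, a, t, e, g, o, r, y.
CLASS = re.compile(r"[category](=|%3D)([^&]+)", re.IGNORECASE)
# (category) is a group: the literal word "category".
WORD = re.compile(r"(category)(=|%3D)([^&]+)", re.IGNORECASE)

def matches(pattern, query_string):
    """True when the pattern is found anywhere in the query string."""
    return pattern.search(query_string) is not None

for qs in ("category=News", "city=Paris", "tag=linux", "id=5"):
    print(qs, matches(CLASS, qs), matches(WORD, qs))
```

With the bracket version, city=Paris and tag=linux are accepted because their names end with y and g; only the group version restricts the match to the word itself.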

RewriteCond %{QUERY_STRING} category(=|%3D)([^&]+) [NC]

This matches the 'category' argument, and removing the ^ allows the argument to be placed anywhere. But it would also match an argument named subcategory, so the character before it must be a space or an & (or the start of the query string).

This was the reason for [^|&|%26|%20], which for me meant start of string OR & OR space OR URL-encoded &. But that was also wrong: a ^ at the start of [] means negate. So here as well we need capturing parentheses: (^|&|%26|%20). Now it works; the only difference is that the value of the argument is now in %3 and not in %2.
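
A short sketch of the two variants, again in Python's re module under the assumption that it behaves like mod_rewrite's engine for these constructs. The helper name value and the sample query strings are illustrative only; the group number passed to it plays the role of the %N backreference.

```python
import re

# Negated character class: "one character that is NOT |, &, %, 2, 6 or 0".
NEGATED = re.compile(r"[^|&|%26|%20]category(=|%3D)([^&]+)", re.IGNORECASE)
# Alternation group: start of string OR & OR %26 OR %20.
GROUPED = re.compile(r"(^|&|%26|%20)category(=|%3D)([^&]+)", re.IGNORECASE)

def value(pattern, group, query_string):
    """Return the given capture group (the %N of mod_rewrite), or None."""
    m = pattern.search(query_string)
    return m.group(group) if m else None

# The negated class does the opposite of what was intended.
print(value(NEGATED, 2, "category=News"))          # None
print(value(NEGATED, 2, "foo=bar&category=News"))  # None
print(value(NEGATED, 2, "subcategory=News"))       # News

# The group behaves as intended, and the value moves to group 3.
print(value(GROUPED, 3, "category=News"))              # News
print(value(GROUPED, 3, "foo=bar&category=News&x=1"))  # News
print(value(GROUPED, 3, "subcategory=News"))           # None
print(value(GROUPED, 2, "category=News"))              # =
```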

Then we catch the word category. To be really exact we should in fact catch it letter by letter, with the uppercase, lowercase, and URL-encoded versions of each letter, something like an awful (c|C|%43|%63)(a|A|%61|%41)(t|T|%74|...) to be continued, written with parentheses rather than brackets; check a list of URL-encoded characters for the remaining codes.

(=|%3D) is the '=' character, URL-encoded or not. I could have used [=|%3D], but %3D is not matched in that case: inside brackets the expression is a character class that matches a single character (one of =, |, %, 3, D), not the three-character sequence %3D. Because this is one more capturing group, the value ends up in %3 in the final rule (it would be %2 without the leading (^|&|%26|%20) group).

And then we have the nice ([^&]+), which means one or more characters that are not an '&'.

After these conditions we apply the RewriteRule. We take everything (this is .*) and redirect it to /catalogsearch/result/?q=%3. We use %3 and not $1 or $3 because we capture nothing in the RewriteRule itself but in the condition before it (%3 is the third captured group of the last condition, so it's ([^&]+)). If you add the QSA flag ([L,R,QSA]) to the RewriteRule you will even get the other parameters back on the final query string, so that:

ethical?bar=toto& ethical=Foo&zorglub=titi 
=> is redirected to 
catalogsearch/result/?q=foo&bar=toto&%2520ethical=Foo&zorglub=titi

And if I play with a little url encoding:

ethical?bar=toto%26 ethical%3DFoo&zorglub=titi 
=> is redirected to 
catalogsearch/result/?q=foo&bar=toto%2526%2520ethical%253DFoo&zorglub=titi

But this is not the end of the game. I'll let you handle all the URL-encoding problems you could have if you want to detect all variations of the Foo value and mix them with tolower (here it's broken). You may not even have these problems of encoded URLs in your parameters or extra attributes, but that was fun :-)
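
To experiment with the matching logic outside Apache, the two conditions and the substitution can be approximated as follows. This is only a sketch in Python's re module: the function name redirect_target is hypothetical, lower() stands in for ${lowercase:%3}, the target path is the /catalogsearch/result/ one used in the lowercase rule above, and neither the QSA flag nor Apache's re-escaping of the redirect URL is reproduced.

```python
import re

# First RewriteCond: the requested filename contains "category" ([NC]).
FILENAME = re.compile(r"category", re.IGNORECASE)
# Second RewriteCond: the category parameter anywhere in the query string.
QUERY = re.compile(r"(^|&|%26|%20)category(=|%3D)([^&]+)", re.IGNORECASE)

def redirect_target(path, query_string):
    """Return the redirect location, or None when the rule would not fire."""
    if not FILENAME.search(path):
        return None
    m = QUERY.search(query_string)
    if m is None:
        return None
    # m.group(3) plays the role of %3; lower() stands in for ${lowercase:%3}.
    return "/catalogsearch/result/?q=" + m.group(3).lower()

print(redirect_target("/category", "category=News"))
print(redirect_target("/category", "bar=toto&category=Foo&zorglub=titi"))
print(redirect_target("/other", "category=News"))
```

The first two calls give /catalogsearch/result/?q=news and /catalogsearch/result/?q=foo; the third gives None because the first condition fails.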

regilero answered Oct 21 '22 05:10