How to encode special characters using mod_rewrite & Apache?

I would like to have pretty URLs for my tagging system along with all the special characters: +, &, #, %, and =. Is there a way to do this with mod_rewrite without having to double encode the links?

I notice that delicious.com and Stack Overflow seem to be able to handle singly encoded special characters. What's the magic formula?

Here's an example of what I want to happen:

http://www.example.com/tag/c%2b%2b 

Would trigger the following RewriteRule:

RewriteRule ^tag/(.*)   script.php?tag=$1 

and the value of tag would be "c++"

The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces. If I double-encode the plus sign as '%252B' then I get the desired result, but it makes for messy URLs and seems pretty hacky to me.

asked Jan 19 '09 23:01 by Aldie



2 Answers

"The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces."

I don't think that's quite what's happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.

So then mod_rewrite changes your request '/tag/c++' to 'script.php?tag=c++'. But in a query string in the application/x-www-form-urlencoded format, the escaping rules are slightly different from those that apply in path parts. In particular, '+' is shorthand for a space (which could just as well be encoded as '%20', but this is old behaviour we'll never be able to change now).

So PHP's form-reading code receives the 'c++' and puts it in your $_GET as c-space-space.
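
To make the two decoding steps concrete, here is a rough sketch in Python rather than Apache or PHP. It only imitates the behaviour described above using `urllib.parse`; the variable names are made up for illustration, and the real work is of course done by Apache and PHP, not by this code.

```python
from urllib.parse import unquote, parse_qs

# Step 1: path decoding (what Apache does before mod_rewrite sees the request).
# In a path, %2b simply decodes to a literal '+'.
requested_path = "/tag/c%2b%2b"
decoded_path = unquote(requested_path)

# Step 2: the rewrite pastes the captured text into a query string unchanged.
captured = decoded_path[len("/tag/"):]
rewritten_query = "tag=" + captured

# Step 3: form-style query decoding (what PHP does when filling $_GET).
# Here a literal '+' means a space.
params = parse_qs(rewritten_query)

print(decoded_path)            # path as mod_rewrite sees it
print(rewritten_query)         # query string handed to the script
print(repr(params["tag"][0]))  # value the script ends up with
```

The last line shows a "c" followed by two spaces, which matches the symptom in the question.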

It looks like the way around this is to use the RewriteRule flag 'B', which escapes backreferences before they are put into the substitution. See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags - curiously, it uses more or less the same example!

RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B] 
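
As a rough illustration of why that helps, here is a Python sketch of the effect of re-escaping the backreference before it goes into the query string. It is an approximation under my own assumptions: `quote(..., safe="")` stands in for Apache's escaping, and the exact set of characters Apache escapes may differ.

```python
from urllib.parse import quote, parse_qs

# The text captured by (.*), already path-decoded by Apache.
captured = "c++"

# Approximate the B flag: percent-escape the backreference before substitution.
escaped = quote(captured, safe="")
rewritten_query = "tag=" + escaped

# Form-style query decoding now turns %2B back into a literal '+'.
value = parse_qs(rewritten_query)["tag"][0]

print(rewritten_query)  # query string handed to the script
print(value)            # value the script ends up with
```

With the plus signs escaped, the query decoder no longer mistakes them for spaces, so the script gets "c++" back.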

answered Sep 29 '22 06:09 by bobince


I'm not sure I understand what you're asking, but the NE (noescape) flag to Apache's RewriteRule directive might be of some interest to you. Basically, it prevents mod_rewrite from automatically escaping special characters in the substitution pattern you provide. The example given in the Apache 2.2 documentation is

RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE] 

which will turn, for example, /foo/zed into a redirect to /bar/arg=P1%3dzed, so that the script /bar will then see a query parameter named arg with a value P1=zed, if it looks in its PATH_INFO (okay, that's not a real query parameter, so sue me ;-P).

At least, I think that's how it works . . . I've never used that particular flag myself.
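
For what it's worth, here is a small Python sketch of the difference as I understand it. It assumes that, without NE, the '%' in the substitution result would itself be escaped to '%25' (which is what the Apache docs describe), and it uses `urllib.parse.unquote` to stand in for the decoding done on the redirected request. The variable names are hypothetical.

```python
from urllib.parse import unquote

# Result of the substitution for the request /foo/zed.
target = "/bar/arg=P1%3dzed"

# With NE the target is sent as-is; without it, '%' would be escaped again.
with_ne = target
without_ne = target.replace("%", "%25")

# What the receiving side sees after decoding the redirected URL once.
print(unquote(with_ne))     # '%3d' becomes '='
print(unquote(without_ne))  # '%253d' only decodes back to '%3d'
```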

answered Sep 29 '22 08:09 by David Z