I would like to have pretty URLs for my tagging system along with all the special characters: +, &, #, %, and =. Is there a way to do this with mod_rewrite without having to double encode the links?
I notice that delicious.com and stackoverflow seem to be able to handle singly encoded special characters. What's the magic formula?
Here's an example of what I want to happen:
http://www.example.com/tag/c%2b%2b
Would trigger the following RewriteRule:
RewriteRule ^tag/(.*) script.php?tag=$1
and the value of tag would be "c++"
The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces. If I double encode the plus sign to '%252B' then I get the desired result; however, it makes for messy URLs and seems pretty hacky to me.
mod_rewrite provides a flexible and powerful way to manipulate URLs using an unlimited number of rules. Each rule can have an unlimited number of attached rule conditions, which let you rewrite URLs based on server variables, environment variables, HTTP headers, or time stamps.
RewriteRule, as the name suggests, rewrites the URL, and it is applied only if the conditions in the RewriteCond directives that precede it match. One or more RewriteCond directives can precede a RewriteRule. In traditional programming terms, RewriteCond works like an 'if' condition: conditions are ANDed by default, can be joined with OR, can be negated with !, and can use comparisons such as >=.
For Apache to process rewrite rules, mod_rewrite must first be enabled. On Debian and Ubuntu packages the module is installed but disabled by default; use the a2enmod command to enable it (sudo a2enmod rewrite) and then restart Apache.
"The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces."
I don't think that's quite what's happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.
So then mod_rewrite changes your request '/tag/c++' to 'script.php?tag=c++'. But in a query string component in the application/x-www-form-urlencoded format, the escaping rules are very slightly different from those that apply in path parts. In particular, '+' is a shorthand for space (which could just as well be encoded as '%20', but this is an old behaviour we'll never be able to change now).
So PHP's form-reading code receives the 'c++' and dumps it in your $_GET as c-space-space.
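The decoding order described above can be modelled in a few lines of Python. This is only a sketch of the behaviour, not Apache's or PHP's actual code: rewrite_without_b is a hypothetical helper, unquote stands in for Apache decoding the path, and parse_qs stands in for PHP's form-style query parsing.

```python
from urllib.parse import parse_qs, unquote


def rewrite_without_b(request_path: str) -> str:
    """Model the rule ^tag/(.*) -> script.php?tag=$1 with no flags (hypothetical helper)."""
    # Step 1: the server decodes the path before mod_rewrite sees it.
    decoded = unquote(request_path)            # "/tag/c%2b%2b" -> "/tag/c++"
    # Step 2: the rule copies the captured text into the query string unchanged.
    captured = decoded[len("/tag/"):]
    query = "tag=" + captured                  # "tag=c++"
    # Step 3: form-style parsing treats "+" as a space.
    return parse_qs(query)["tag"][0]


print(repr(rewrite_without_b("/tag/c%2b%2b")))      # 'c  '
print(repr(rewrite_without_b("/tag/c%252b%252b")))  # 'c++'
```

The second call shows why double encoding appears to work: one layer of escaping is removed from the path, and the remaining %2b survives until the query string is parsed.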
Looks like the way around this is to use the RewriteRule flag 'B'. See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags - curiously it uses more or less the same example!
RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]
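To illustrate what the B flag changes, here is a rough Python model. It assumes the flag's effect is to re-escape the captured text before it is substituted into the query string; rewrite_with_b is a hypothetical helper, and quote(..., safe="") only approximates the set of characters Apache escapes.

```python
from urllib.parse import parse_qs, quote, unquote


def rewrite_with_b(request_path: str) -> str:
    """Model the rule ^tag/(.*)$ -> /script.php?tag=$1 [B] (hypothetical helper)."""
    # The server still decodes the path first: "/tag/c%2b%2b" -> "/tag/c++".
    decoded = unquote(request_path)
    captured = decoded[len("/tag/"):]
    # Assumed effect of [B]: the backreference is escaped again before substitution.
    query = "tag=" + quote(captured, safe="")  # "tag=c%2B%2B"
    # Form-style parsing now decodes %2B back to a literal plus sign.
    return parse_qs(query)["tag"][0]


print(rewrite_with_b("/tag/c%2b%2b"))    # c++
print(rewrite_with_b("/tag/a%26b%3Dc"))  # a&b=c
```

Under this model a singly encoded link is enough, because the escaping that was removed when the path was decoded is restored before the application parses the query string.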
I'm not sure I understand what you're asking, but the NE (noescape) flag to Apache's RewriteRule directive might be of some interest to you. Basically, it prevents mod_rewrite from automatically escaping special characters in the substitution pattern you provide. The example given in the Apache 2.2 documentation is
RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE]
which will turn, for example, /foo/zed into a redirect to /bar/arg=P1%3dzed, so that the script /bar will then see a query parameter named arg with a value P1=zed, if it looks in its PATH_INFO (okay, that's not a real query parameter, so sue me ;-P).
At least, I think that's how it works . . . I've never used that particular flag myself.