
Htaccess: multiple parameters and nested parentheses

I have a project where I redirect every request to index.php, with various GET parameters.

Currently, I'm using this kind of htaccess (this is only an extract, and I renamed the parameters to a, b, c... to simplify the problem):

RewriteRule ^([\w-]+)\.html$                   index.php?a=$1 [L]
RewriteRule ^([\w-]+)/([\w-]+)\.html$          index.php?a=$1&b=$2 [L]
RewriteRule ^([\w-]+)/([\w-]+)/([\w-]+)\.html$ index.php?a=$1&b=$2&c=$3 [L]
(...)

It works well, but I was looking for a way to collapse those lines into a single one. The idea is to use nested parentheses to 'generate' all the GET parameters, but it doesn't seem to work as easily as I thought.

Here is what I have so far:

RewriteRule ^(?:([\w-]+)\/)*([\w-]+)\.html$ index.php?a=$1&b=$2&c=$3&d=$4&e=$5&f=$6&g=$7&h=$8&i=$9 [L]

GET result for the URL http://website.com/1/2/3/4/5/6/7/8/9.html:

array(9) { 
   ["a"]=> string(1) "8" 
   ["b"]=> string(1) "9" 
   ["c"]=> string(0) "" 
   ["d"]=> string(0) "" 
   ["e"]=> string(0) "" 
   ["f"]=> string(0) "" 
   ["g"]=> string(0) "" 
   ["h"]=> string(0) "" 
   ["i"]=> string(0) "" 
}

Instead of getting a=1, b=2, c=3..., I only receive the last two parameters. Note that the RewriteRule is executed, so I know my regex matches.

Any ideas?

zessx asked Apr 20 '26 08:04


2 Answers

For the record, do consider a PHP-based approach: pass the entire requested URI to PHP and handle the whole thing there. That is a whole lot easier, very likely safer, and maybe even faster than doing the magic with mod_rewrite.

I mean something like this:

 RewriteRule ^(.*)\.html$ switchboard.php?uri=$1 [L]

then in switchboard.php:

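// Split the captured path; pad to three segments so that shorter URLs
// yield empty strings instead of "undefined offset" warnings.
list($_GET['a'], $_GET['b'], $_GET['c']) = array_pad(explode('/', $_GET['uri']), 3, '');
require 'index.php';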

(if the super-long list() looks cumbersome, you may use some clever one-liner mapping technique)
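
One possible shape for such a mapping, as a minimal sketch: the helper name segmentsToParams is made up for illustration and is not part of the original setup. It assumes the parameter names are simply a, b, c... in path order, and that missing segments should come through as empty strings.

```php
<?php
// Hypothetical helper (not from the original answer): maps path segments
// onto the given parameter names, in order.
function segmentsToParams(string $uri, array $keys): array
{
    // Keep at most as many segments as there are parameter names...
    $values = array_slice(explode('/', $uri), 0, count($keys));
    // ...and fill the missing ones with empty strings.
    $values = array_pad($values, count($keys), '');

    return array_combine($keys, $values);
}

$params = segmentsToParams('1/2/3', range('a', 'i'));
// $params['a'] is '1', $params['c'] is '3', $params['d'] through $params['i'] are ''
```

In switchboard.php, something like $_GET = segmentsToParams($_GET['uri'], range('a', 'i')); could then stand in for the list() line. Note that this also drops uri itself from $_GET.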

Anyhow, the regex question is certainly interesting, but it's a generic PCRE thing. The phenomenon is called a "repeated capture group": whenever you repeat a capture group with the Kleene star as you do, only the very last iteration is actually captured (in our case, the matches 1, 2, ..., 7 were discarded and only the 8 was kept). Imagine it as a buffer you keep overwriting with newer and newer matches. It makes a lot of sense, if you think about it.
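
The overwriting is easy to reproduce outside Apache. Here is a small sketch using PHP's preg_match, which is also PCRE-based. The pattern is the one from the question, with ~ as delimiter so the slash needs no escaping.

```php
<?php
// The pattern from the question: one capture group repeated with *,
// followed by a final capture group for the last segment.
$pattern = '~^(?:([\w-]+)/)*([\w-]+)\.html$~';

preg_match($pattern, '1/2/3/4/5/6/7/8/9.html', $m);

// $m[1] holds only the LAST iteration of the repeated group ("8");
// $m[2] holds the final segment ("9"). Segments 1 to 7 are gone.
var_dump($m[1], $m[2]);
```

This lines up with the a=8, b=9 result in the question: there are only two capture groups, so $3 to $9 have nothing to refer to.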

A solution is to use as many groups as you actually want to capture, by making the earlier groups optional - it's incredibly awkward to write and read (indeed, Arjan just posted it, and it's a headache just to look at it), and also very inefficient in this case. A much simpler solution is to just capture the whole thing and split it.

SáT answered Apr 23 '26 08:04


RewriteRule ^([\w-]+)(?:\/([\w-]+)(?:\/([\w-]+)(?:\/([\w-]+)(?:\/([\w-]+)(?:\/([\w-]+)(?:\/([\w-]+)(?:\/([\w-]+)(?:\/([\w-]+))?)?)?)?)?)?)?)?\.html$ index.php?a=$1&b=$2&c=$3&d=$4&e=$5&f=$6&g=$7&h=$8&i=$9 [L]

Untested, but I think this does what you want. Note that it works for 1 to 9 parameters and requires at least one. Also note that any GET parameters present in the original URL will be dropped, because the substitution supplies its own query string; adding the QSA flag ([L,QSA]) should keep them.
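
Since the rule is untested, one way to sanity-check the nesting idea is to run a shortened version through PHP's preg_match. This is only a sketch: it uses three levels instead of nine to stay readable, the helper captureSegments is made up for illustration, and it exercises the regex only, not mod_rewrite itself.

```php
<?php
// Three-level version of the nested-optional pattern (the rule above nests nine).
$pattern = '~^([\w-]+)(?:/([\w-]+)(?:/([\w-]+))?)?\.html$~';

// Hypothetical helper: returns the captured segments padded with '' up to
// $groups entries, or null when the path does not match at all.
function captureSegments(string $pattern, string $path, int $groups): ?array
{
    if (preg_match($pattern, $path, $m) !== 1) {
        return null;
    }
    // Drop the full match at index 0; pad the groups that did not take part.
    return array_pad(array_slice($m, 1), $groups, '');
}

$full  = captureSegments($pattern, '1/2/3.html', 3);   // ['1', '2', '3']
$short = captureSegments($pattern, '1.html', 3);       // ['1', '', '']
$none  = captureSegments($pattern, '1/2/3/4.html', 3); // null: too many segments
```

Each segment gets its own group here, so nothing is overwritten. The cost is that the pattern has to be as deep as the longest URL you want to support.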

Arjan answered Apr 23 '26 09:04


