
.htaccess regular expression difference/pros/cons

I have a bunch of rules in my .htaccess (sub-domains, folders, user-specific folders, etc.)

and I am currently using this regular expression:

([a-z0-9A-Z])

I was looking for a specific rule and found multiple ways to build it, and I was wondering if there is a standard practice for these. What are the differences, pros and cons of using something like:

  • ([^.]+)
  • ([^/]+)
  • (.*)
  • ([a-z0-9]+)
Maggie asked Feb 13 '12 00:02


1 Answer

Let's say we have this .htaccess:

RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?request=$1 [L]

The expressions mentioned in your question work as follows:

^(.*)$

  • . : match any single character
  • * : match zero or more of the previous symbol

Basically it will match anything like:

  • folder1/file1.html: $1 will be folder1/file1.html
  • file1.html: $1 will be file1.html
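To see what the catch-all captures, the pattern can be tried outside Apache. The following is a minimal sketch that uses Python's re module as a stand-in for the regex engine behind mod_rewrite (an assumption; for a pattern this simple the two should agree). The helper name capture_all is hypothetical.

```python
import re

# Same pattern as in the RewriteRule above; Python's re stands in for Apache here.
CATCH_ALL = re.compile(r"^(.*)$")

def capture_all(path):
    """Return what $1 would hold for the given request path."""
    match = CATCH_ALL.match(path)
    return match.group(1) if match else None

print(capture_all("folder1/file1.html"))  # folder1/file1.html
print(capture_all("file1.html"))          # file1.html
```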

This makes it very easy to parse the entire request in PHP or Python. On the other hand, it does not filter unwanted characters out of the URL, so you will have to validate them in your script.

Example: =@*-+
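A script-side check for such characters could look like the following minimal sketch. It is written in Python, and both the whitelist and the is_safe_request name are assumptions chosen for illustration; adjust the allowed characters to whatever your application actually expects.

```python
import re

# Hypothetical whitelist: lowercase letters, digits, dots and slashes only.
ALLOWED = re.compile(r"[a-z0-9./]+")

def is_safe_request(request):
    """Return True only if every character of the request is whitelisted."""
    return ALLOWED.fullmatch(request) is not None

print(is_safe_request("folder1/file1.html"))  # True
print(is_safe_request("=@*-+"))               # False
```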

([^.]+)

  • [] : match any of the symbols inside the square braces
  • [^] : match any character other than what is listed inside the braces (ref).
  • + : match one or more of the previous symbol
  • [^.] : match anything other than . character. Will stop matching when a . character is found

From ref.

The only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash. To search for a star or plus, use [+*]. Your regex will work fine if you escape the regular metacharacters inside a character class, but doing so significantly reduces readability.
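The quoted rule is easy to check. This is a small sketch in Python (used here only as a convenient regex playground, not as part of the .htaccess setup) showing that + and * are literal inside a character class with or without a backslash, and that the same applies to the dot.

```python
import re

# Inside a character class, + and * are literal, so no backslash is needed.
plain = re.compile(r"[+*]")
# Escaping them is also accepted, but it is harder to read.
escaped = re.compile(r"[\+\*]")

sample = "a+b*c"
print(plain.findall(sample))    # ['+', '*']
print(escaped.findall(sample))  # ['+', '*']

# The dot is literal too: [^.] means "anything except a literal dot".
print(re.findall(r"[^.]+", "file1.html"))  # ['file1', 'html']
```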

Basically it will match anything like:

  • folder1/file1.html: $1 will be folder1/file1
  • file1.html: $1 will be file1

This has the same effect as the first one, except that it drops everything from the first dot onward.
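The captures listed for ([^.]+) can be reproduced with a short sketch. It assumes Python's re module as a stand-in for Apache's regex engine and uses search because the pattern is not anchored; capture_before_dot is a hypothetical helper name, and archive.tar.gz is an extra illustrative input.

```python
import re

# Unanchored, like the pattern above: the first run of non-dot characters.
UP_TO_DOT = re.compile(r"([^.]+)")

def capture_before_dot(path):
    """Return what $1 would hold, or None if the path has no non-dot run."""
    match = UP_TO_DOT.search(path)
    return match.group(1) if match else None

print(capture_before_dot("folder1/file1.html"))  # folder1/file1
print(capture_before_dot("file1.html"))          # file1
print(capture_before_dot("archive.tar.gz"))      # archive
```

Note that a name with several dots is cut at the first one, not the last.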

^([^/]+)$

  • []: match any of the symbols inside the square braces
  • +: match one or more of the previous symbol
  • ^: match the start of a string
  • [^/] : match anything other than / character. Will stop matching when a / character is found

This works like the first one, except that it only matches up to the next /. So if you have multiple folders, you will have to repeat this group once per path segment.

Basically, with only one group (and without the trailing $, because the fully anchored pattern cannot match a path that contains a /), it will capture:

  • folder1/file1.html: $1 will be folder1
  • file1.html: $1 will be file1.html

and if you have 2:

  • folder1/file1.html: $1 will be folder1 and $2 will be file1.html
  • file1.html: $1 will be file1.html

The more folders you have, the more rules you might have to add.
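The "one rule per folder depth" idea can be sketched as follows. This is only an illustration in Python (standing in for Apache's regex engine), and the two-rule list and the capture_segments name are assumptions: a two-segment rule is tried first, and the single-segment rule serves as a fallback.

```python
import re

# One group per path segment; the single-segment rule acts as a fallback,
# mirroring the idea of adding one rule per folder depth.
RULES = [
    re.compile(r"^([^/]+)/([^/]+)$"),  # folder/file
    re.compile(r"^([^/]+)$"),          # file only
]

def capture_segments(path):
    """Return the captured groups of the first matching rule, or None."""
    for rule in RULES:
        match = rule.match(path)
        if match:
            return match.groups()
    return None

print(capture_segments("folder1/file1.html"))  # ('folder1', 'file1.html')
print(capture_segments("file1.html"))          # ('file1.html',)
print(capture_segments("a/b/c.html"))          # None: a third rule would be needed
```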

^([a-z0-9]+)$ [ ^([a-z0-9.]+)$ for this example ]

  • [] : match any of the symbols inside the square braces
  • + : match one or more of the previous symbol
  • a-z : match letters from a to z
  • 0-9 : match numbers from 0-9

(You can also use \d for digits or \w for word characters; note that \w also matches uppercase letters and the underscore.)

Basically, with only one group (with the dot added, and without the trailing $, because the fully anchored pattern rejects a path that contains a /), it will capture:

  • folder1/file1.html: $1 will be folder1
  • file1.html: $1 will be file1.html

and if you have 2:

  • folder1/file1.html: $1 will be folder1 and $2 will be file1.html
  • file1.html: $1 will be file1.html

This one works like the previous one, except that you have to specify which characters you want. Therefore, when you check your string in PHP, you know which characters you can get. In my example with the file name, I had to add the . to the character class so that it recognises the dot. This one is also faster to execute.
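The effect of the explicit character list can be sketched like this. Python's re module is assumed as a stand-in for Apache's regex engine, and the names WHITELIST, PREFIX and capture are hypothetical. The sketch also shows the difference the trailing $ makes for a path that contains a slash.

```python
import re

# Fully anchored variant with the dot added, as in this example.
WHITELIST = re.compile(r"^([a-z0-9.]+)$")
# Same character class without the trailing $: matches a leading run only.
PREFIX = re.compile(r"^([a-z0-9.]+)")

def capture(pattern, path):
    """Return what $1 would hold for the given pattern, or None."""
    match = pattern.match(path)
    return match.group(1) if match else None

print(capture(WHITELIST, "file1.html"))          # file1.html
print(capture(WHITELIST, "File1.html"))          # None (uppercase F is not listed)
print(capture(WHITELIST, "folder1/file1.html"))  # None (slash is not listed)
print(capture(PREFIX, "folder1/file1.html"))     # folder1
```

Anything outside the listed characters makes the anchored rule fail, so the script behind it only ever sees lowercase letters, digits and dots.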

See the benchmark: .htaccess mod_rewrite performance

So, if you know what type of request you will get, you can always use the last one; if you are not sure, you will have to pick the one that best suits your needs. There might be more differences between them, but the primary objective is to understand what these regular expressions do or catch. In addition, performance is something you need to take into consideration. Matching everything and then parsing the request in PHP or Python might take longer than matching the parts up front and simply using them in your script.

Book Of Zeus answered Oct 22 '22 22:10