Why do we need to escape ! < > : = - in PHP regular expressions?

Tags: regex, php, web

http://php.net/manual/en/function.preg-quote.php:

The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -

However, this page says that the special characters are [ \ ^ $ . | ? * + ( )

OK, I know that the first page is specifically about PHP regular expressions. But why do we need to escape !, <, >, :, = and -?

I tried to do a preg_match without escaping <, >, - and ! and everything is working perfectly.
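The experiment described above can be reproduced with a small sketch like the following. The subject string and variable names are made up for illustration; the only library calls are preg_match and preg_quote.

```php
<?php
// Hypothetical subject containing the characters from the question.
$subject = '<a-b> != c:d';

// The pattern uses <, >, -, !, = and : without any escaping.
$unescaped = preg_match('/<a-b> != c:d/', $subject);

// preg_quote escapes all of them; the resulting pattern matches the same text.
$escaped = preg_match('/' . preg_quote($subject, '/') . '/', $subject);

var_dump($unescaped); // int(1)
var_dump($escaped);   // int(1)
```

Both calls report a match, which is consistent with the observation that the escaping is harmless but not required here.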

Pacerier asked Sep 10 '11 14:09


2 Answers

Those characters are metacharacters, but they need no escaping. What they do have in common is that they occur in special grouping constructs:

(?:...)      # non-capturing group
(?=...)      # positive lookahead
(?!...)      # negative lookahead
(?<name>...) # named capturing groups
(?<=...)     # positive lookbehind
(?<!...)     # negative lookbehind
(?>...)      # atomic group
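As a rough illustration of a few of these constructs in PHP, here is a short sketch. The sample strings and variable names are invented for the example; it only assumes preg_match from the PCRE extension.

```php
<?php
// Named capturing group: (?<name>...)
preg_match('/(?<year>\d{4})-(?<month>\d{2})/', 'asked 2011-09', $date);

// Positive lookahead: match "foo" only if "bar" follows; "bar" is not consumed.
preg_match('/foo(?=bar)/', 'foobar', $ahead);

// Negative lookbehind: a number that is not directly preceded by "$".
preg_match('/(?<!\$)\b\d+/', 'costs $15 or 20 points', $behind);

// Outside of a (?...) construct the same characters are plain literals.
$literal = preg_match('/a=b!/', 'a=b!');

echo $date['year'], ' ', $date['month'], "\n"; // 2011 09
echo $ahead[0], "\n";                          // foo
echo $behind[0], "\n";                         // 20
echo $literal, "\n";                           // 1
```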

But they only take on a special meaning in that context, right after an unescaped (? that opens such a construct. So if you take any string and escape all of these characters: [\^$.|?*+(){ then you get a regex that matches the string exactly, character by character, because the remaining metacharacters can never end up in a meta-context.

For example, the ] is only a metacharacter if there was a previous unescaped [ that opened a character class.

Similarly, the - is only a metacharacter inside a character class, where it means "range" as in [a-z] (or stays a literal - when it cannot form a range, as in [abc-]).

So to escape the string [tag-soup] you just need to escape the [. Outside of a character class, ] and - are simply treated as literals.
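The context-dependent behaviour of ] and - can be checked with a sketch like this one. The test strings and variable names are illustrative only.

```php
<?php
// Inside a character class, - between two characters forms a range.
$inRange     = preg_match('/^[a-c]$/', 'b'); // "b" lies in a..c
$dashInRange = preg_match('/^[a-c]$/', '-'); // "-" itself is not in a..c

// At the end of a class, - cannot form a range and is a literal.
$dashLast    = preg_match('/^[ac-]$/', '-');

// Outside a character class, ] and - are ordinary literals.
$outside     = preg_match('/^tag-soup]$/', 'tag-soup]');

// For the full string [tag-soup], only the opening [ needs a backslash.
$tagSoup     = preg_match('/^\[tag-soup]$/', '[tag-soup]');

var_dump($inRange, $dashInRange, $dashLast, $outside, $tagSoup);
// int(1) int(0) int(1) int(1) int(1)
```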

In summary, if you take a string and escape all the "unconditional" metacharacters ([\^$.|?*+(){) then you get a regex that will exactly match the string character by character.
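A minimal sketch of that idea, next to what preg_quote produces, might look like this. The function name escape_unconditional is hypothetical, not part of PHP. The sketch also assumes that the literal does not contain the chosen delimiter (~ here) and that the x modifier is not in use; otherwise the delimiter, # and whitespace would need escaping too.

```php
<?php
// Hypothetical helper: escape only the "unconditional" metacharacters.
function escape_unconditional(string $literal): string
{
    $meta = '[\\^$.|?*+(){'; // the characters [ \ ^ $ . | ? * + ( ) {
    $out = '';
    $len = strlen($literal);
    for ($i = 0; $i < $len; $i++) {
        $ch = $literal[$i];
        // Prefix a backslash only if the byte is one of the metacharacters.
        $out .= (strpos($meta, $ch) !== false) ? '\\' . $ch : $ch;
    }
    return $out;
}

$minimal = escape_unconditional('[tag-soup]'); // \[tag-soup]
$quoted  = preg_quote('[tag-soup]');           // \[tag\-soup\]

// Both patterns match the original string exactly.
$minimalOk = preg_match('~^' . $minimal . '$~', '[tag-soup]');
$quotedOk  = preg_match('~^' . $quoted . '$~', '[tag-soup]');

echo $minimal, "\n", $quoted, "\n";
```

In practice preg_quote remains the safer choice, since its extra backslashes are harmless and it can also escape the delimiter through its second argument.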

Tim Pietzcker answered Oct 13 '22 00:10


The page you link to is titled "basic regex syntax". It links to a page titled "advanced regex syntax", where all the extra characters you mention are used:

  • ! is used for negative lookaheads and lookbehinds
  • < is used for lookbehinds
  • > is used for atomic groups
  • : is used for non-capturing groups and for setting flags for only a section of a regex
  • = is used for positive lookaheads and lookbehinds
  • - is used for character ranges and adjusting flags
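The flag-related uses of : and -, and the atomic-group use of >, can be sketched as follows. The patterns and test strings are made up for illustration.

```php
<?php
// (?i:...) applies the case-insensitive flag to one section only.
$scoped     = preg_match('/^(?i:abc)def$/', 'ABCdef'); // "def" stays case-sensitive
$scopedMiss = preg_match('/^(?i:abc)def$/', 'ABCDEF');

// (?i) switches a flag on, (?-i) switches it off again.
$toggled     = preg_match('/^(?i)ab(?-i)c$/', 'ABc');
$toggledMiss = preg_match('/^(?i)ab(?-i)c$/', 'ABC');

// (?>...) is atomic: once "bc" has matched inside the group, the engine
// will not backtrack into it to try the alternative "b".
$atomic     = preg_match('/^a(?>bc|b)c$/', 'abc');
$plainGroup = preg_match('/^a(?:bc|b)c$/', 'abc');

var_dump($scoped, $scopedMiss, $toggled, $toggledMiss, $atomic, $plainGroup);
// int(1) int(0) int(1) int(0) int(0) int(1)
```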
lonesomeday answered Oct 12 '22 23:10