
Are ^$ and $^ in PHP regex the same?

Why do both of these regexes match successfully?

if(preg_match_all('/$^/m',"",$array))
  echo "Match";

if(preg_match_all('/$^\n$/m',"\n",$array))
  echo "Match";
asked Jun 17 '11 18:06 by nEAnnam




2 Answers

$ and ^ are zero-width meta-characters. Unlike other meta-characters like . which match one character at a time (unless used with quantifiers), they do not actually match literal characters. This is why ^$ matches an empty string "", even though the regex (sans delimiters) contains two characters while the empty string contains zero.

It doesn't matter that an empty string contains no characters. It still has a starting point and an ending point, and since it's an empty string both are at the same location. Therefore no matter the order or number of ^ and $ you use, all of their permutations should match the empty string.
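As a quick check of the point above, here is a minimal sketch that tries a few orderings of ^ and $ against the empty string. The pattern list and variable names are only illustrative choices, not part of the original question.

```php
<?php
// Illustrative patterns: different orders and counts of the two anchors.
$patterns = ['/^$/', '/$^/', '/^^$$/', '/$^$^/'];

$results = [];
foreach ($patterns as $pattern) {
    // preg_match() returns 1 on a match and 0 on no match.
    $results[$pattern] = preg_match($pattern, '');
}

foreach ($results as $pattern => $matched) {
    echo $pattern, ' => ', $matched, PHP_EOL;
}

// The match itself is zero-width: the matched text is the empty string.
preg_match('/$^/', '', $m);
var_dump($m[0] === '');
```

Each pattern should report a match, since the anchors only assert positions and the start and end of an empty string are the same position.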


Your second case is slightly trickier but the same principles apply.

The m modifier (PCRE_MULTILINE) does not change how the string is fed to the PCRE engine; the subject is always handled as one string, newlines included. What it changes is the meaning of the anchors: ^ also matches immediately after an internal newline and $ also matches immediately before a newline, so they act as "the start of a line" and "the end of a line" respectively.

The string "\n" is essentially logically split into three parts: "", "\n" and "" (because the newline is surrounded by emptiness... sounds poetic).

Then these matches follow:

  1. The first empty string is matched by the starting $^ (as I explain above).

  2. The \n is matched by the same \n in your regex.

  3. The second empty string is matched by the last $.

And that's how your second case results in a match.
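The three steps above can be observed with PREG_OFFSET_CAPTURE, which reports where each match starts. This is a small sketch using the pattern and subject from the question; the variable names are hypothetical.

```php
<?php
$subject = "\n";

// In a single-quoted PHP string, \n reaches PCRE as backslash + n,
// which PCRE itself interprets as a newline.
$count = preg_match_all('/$^\n$/m', $subject, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is [matched text, byte offset].
list($text, $offset) = $matches[0][0];

echo 'matches: ', $count, PHP_EOL;
echo 'offset:  ', $offset, PHP_EOL;
echo 'length:  ', strlen($text), PHP_EOL;

// The anchors on their own succeed at offset 0 without consuming anything.
preg_match('/$^/m', $subject, $anchorsOnly, PREG_OFFSET_CAPTURE);
```

The only consumed character is the newline itself; the leading $^ and the trailing $ contribute nothing to the matched text.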

answered Oct 05 '22 00:10 by BoltClock


No it is not. Actually, the expression $^ should never match, because $ symbolizes the end of a string whereas ^ represents the beginning. But as we know, the end cannot come before the beginning of a string :)

^$ should match an empty string, and only that.

The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, [...]

From the PCRE manpages

Note that with the PCRE_MULTILINE modifier, $ matches at the end of a line and ^ matches at the beginning of a line, so the pattern will match (thanks to netcoder for pointing that out). Still, I personally wouldn't use it.
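To illustrate the multiline behaviour, here is a small sketch in which $^ finds the empty line in the middle of a string. The subjects and variable names are made up for the example.

```php
<?php
// Hypothetical subject: three lines, the second of which is empty.
$subject = "a\n\nb";

$found = preg_match('/$^/m', $subject, $m, PREG_OFFSET_CAPTURE);

// $m[0] is [matched text, byte offset]; the match is zero-width.
echo 'found:  ', $found, PHP_EOL;
echo 'offset: ', $m[0][1], PHP_EOL;

// A subject with no empty line offers no position that is both
// an end of line and a start of line.
$noEmptyLine = preg_match('/$^/m', "ab");
echo 'no empty line: ', $noEmptyLine, PHP_EOL;
```

The match sits at the position between the two newlines, which is at once the end of the empty line and its start.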

answered Oct 05 '22 02:10 by fresskoma