Why do both of these regexes match successfully?
if (preg_match_all('/$^/m', "", $array)) {
    echo "Match";
}
if (preg_match_all('/$^\n$/m', "\n", $array)) {
    echo "Match";
}
[] denotes a character class; () denotes a capturing group. [a-z0-9] matches exactly one character that is in the range a-z or 0-9. (a-z0-9) matches and captures the literal text a-z0-9, because a hyphen has no special meaning outside a character class.
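The difference can be checked directly. This is a minimal sketch; the sample subjects and variable names are made up for illustration.

```php
<?php
// Character class: exactly ONE character from a-z or 0-9.
$classHit  = preg_match('/^[a-z0-9]$/', 'q');       // one allowed character
$classMiss = preg_match('/^[a-z0-9]$/', 'a-z0-9');  // six characters, so no match

// Capturing group: the literal text "a-z0-9", captured as group 1.
$groupHit  = preg_match('/^(a-z0-9)$/', 'a-z0-9', $m);
$groupMiss = preg_match('/^(a-z0-9)$/', 'q');

var_dump($classHit, $classMiss, $groupHit, $groupMiss, $m[1]);
```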
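^ means "Match the start of the string" (the position before the first character in the string). $ means "Match the end of the string" (the position after the last character in the string). Both are called anchors; used together, they ensure that the entire string is matched instead of just a substring.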
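A short sketch of what anchoring changes, using made-up subjects. The last two lines rely on a PCRE detail worth knowing: by default $ also matches just before a final newline, and the D modifier (PCRE_DOLLAR_ENDONLY) turns that off.

```php
<?php
// Without anchors the pattern may match anywhere inside the subject.
$unanchored = preg_match('/abc/', 'xxabcxx');

// With both anchors the whole subject has to be "abc".
$anchoredMiss = preg_match('/^abc$/', 'xxabcxx');
$anchoredHit  = preg_match('/^abc$/', 'abc');

// By default $ also matches right before a trailing newline...
$trailingNewline = preg_match('/^abc$/', "abc\n");
// ...unless the D modifier restricts it to the very end of the subject.
$dollarEndOnly = preg_match('/^abc$/D', "abc\n");

var_dump($unanchored, $anchoredMiss, $anchoredHit, $trailingNewline, $dollarEndOnly);
```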
Regular expressions are not specific to any one programming language; most languages in use today support them, either built in or through a library.
PHP uses a C library called PCRE to provide almost complete support for Perl's arsenal of regular expression features. Perl regular expressions act on arbitrary binary data, so you can safely match with patterns or strings that contain the NUL byte (\x00).
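As a small illustration of the binary-safety point, the sketch below matches a subject that contains a real NUL byte. The subject is made up; the pattern is single-quoted so that the \x00 escape is interpreted by PCRE rather than by PHP.

```php
<?php
// Double quotes: PHP itself turns \x00 into a real NUL byte in the subject.
$subject = "a\x00b";

// Single quotes: the four characters \x00 reach PCRE, which reads them as NUL.
$hit = preg_match('/a\x00b/', $subject);

var_dump(strlen($subject), $hit);
```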
$ and ^ are zero-width meta-characters. Unlike other meta-characters such as the dot (.), which match one character at a time (unless used with quantifiers), they do not consume any characters; they only assert a position. This is why ^$ matches an empty string "", even though the regex (sans delimiters) contains two characters while the empty string contains zero.
It doesn't matter that an empty string contains no characters. It still has a starting point and an ending point, and since it's an empty string both are at the same location. Therefore, no matter the order or number of ^ and $ you use, all of their permutations should match the empty string.
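This claim is easy to test. The list of patterns below is an arbitrary sample of orderings, chosen only for illustration.

```php
<?php
// Several orderings and repetitions of the two zero-width anchors.
$patterns = ['/^$/', '/$^/', '/^^$$/', '/$^$^/', '/$$$^^^/'];

$results = [];
foreach ($patterns as $pattern) {
    // Each anchor asserts a position; none of them consumes a character.
    $results[$pattern] = preg_match($pattern, '');
}

var_dump($results);
```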
Your second case is slightly trickier but the same principles apply.
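The m modifier (PCRE_MULTILINE) does not split the input: the PCRE engine still receives the entire string in one go, newlines included. What changes is the meaning of the anchors. In addition to the start and end of the whole string, ^ now matches immediately after a newline inside the string and $ matches immediately before a newline, so they act as "the start of a line" and "the end of a line" respectively.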
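A minimal sketch of the effect of the m modifier, using a made-up three-line subject:

```php
<?php
$text = "one\ntwo\nthree";

// Without m: ^ and $ refer to the whole subject, and \w+ cannot cross a newline.
$without = preg_match_all('/^\w+$/', $text, $m1);

// With m: ^ and $ also match at line boundaries, so each line matches separately.
$with = preg_match_all('/^\w+$/m', $text, $m2);

var_dump($without, $with, $m2[0]);
```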
The string "\n" is essentially logically split into three parts: "", "\n" and "" (because the newline is surrounded by emptiness... sounds poetic).
Then these matches follow:
The first empty string is matched by the starting $^ (as I explain above).
The \n is matched by the same \n in your regex.
The second empty string is matched by the last $.
And that's how your second case results in a match.
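To see where the match actually lands, PREG_OFFSET_CAPTURE can report its text and offset. This is only a sketch for inspecting the second case from the question.

```php
<?php
$count = preg_match_all('/$^\n$/m', "\n", $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is a pair: [matched text, byte offset].
[$text, $offset] = $matches[0][0];

// The two leading anchors are satisfied at offset 0 without consuming anything,
// \n consumes the single newline, and the final $ is satisfied at offset 1.
var_dump($count, $text === "\n", $offset);
```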
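No it is not. Intuitively, the expression $^ looks like it should never match, because $ symbolizes the end of a string whereas ^ represents the beginning, and the end cannot come before the beginning of a string :) That intuition only holds when at least one character sits between the two positions. Both anchors are zero-width, so they can be satisfied at the same position, which is exactly the situation in an empty string.
^$ matches an empty string (and, because $ also matches just before a final newline, a string consisting of a single newline).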
The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, [...]
From the PCRE manpages
Note that with the PCRE_MULTILINE modifier, $ becomes end-of-line (EOL) and ^ becomes beginning-of-line (BOL), so $^ will match wherever a line is empty (thanks netcoder for pointing that out). Still, I personally wouldn't use it.
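The effect of the modifier on $^ can be observed with a subject that contains an empty line. The subject below is made up for illustration.

```php
<?php
$text = "a\n\nb"; // "a", an empty line, then "b"

// Without m, no position in this subject is both the start and the end.
$withoutM = preg_match('/$^/', $text);

// With m, the empty line at offset 2 is both end-of-line and start-of-line.
$withM = preg_match('/$^/m', $text, $m, PREG_OFFSET_CAPTURE);

var_dump($withoutM, $withM, $m[0]);
```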