With PCRE regular expressions in PHP, multi-line mode (/m) enables ^ and $ to match the start and end of lines (separated by newlines) in the source text, as well as the start and end of the source text.
This appears to work great on Linux with \n (LF) being the newline separator, but fails on Windows with \r\n (CRLF).
Is there any way to change what PCRE thinks are newlines? Or perhaps to allow it to match either CRLF or LF in the same way that $ matches the end of line/string?
EXAMPLE:
$EOL = "\n"; // Linux LF
$SOURCE_TEXT = "one{$EOL}two{$EOL}three{$EOL}four";
if (preg_match('/^two$/m', $SOURCE_TEXT)) {
    echo 'Found match.'; // <<< RESULT
} else {
    echo 'Did not find match!';
}
RESULT: Success
$EOL = "\r\n"; // Windows CR+LF
$SOURCE_TEXT = "one{$EOL}two{$EOL}three{$EOL}four";
if (preg_match('/^two$/m', $SOURCE_TEXT)) {
    echo 'Found match.';
} else {
    echo 'Did not find match!'; // <<< RESULT
}
RESULT: Fail
Did you try (*CRLF) and the related modifiers? They are detailed on Wikipedia (under Newline/linebreak options) and seem to do the right thing in my testing. For example, '/(*CRLF)^two$/m' should match with Windows \r\n newlines. Also, (*ANYCRLF) should match both Linux and Windows newlines, but I haven't tested this.
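To make the difference between the two verbs concrete, here is a minimal sketch that runs the question's pattern against LF and CRLF text. The helper name hasLineTwo() is made up for illustration; the only real API used is preg_match(). The result for the empty verb depends on the newline convention PCRE was built with, so it is printed but not asserted here.

```php
<?php
// Hypothetical helper (not part of PHP): checks whether "two" sits alone
// on a line, using the given PCRE newline verb ('' means "use the default").
function hasLineTwo(string $newlineVerb, string $text): bool
{
    // A newline verb such as (*CRLF) must appear at the very start of the pattern.
    return preg_match('/' . $newlineVerb . '^two$/m', $text) === 1;
}

$lf   = "one\ntwo\nthree\nfour";       // Linux-style line endings
$crlf = "one\r\ntwo\r\nthree\r\nfour"; // Windows-style line endings

foreach (['', '(*CRLF)', '(*ANYCRLF)'] as $verb) {
    printf(
        "%-12s LF: %s  CRLF: %s\n",
        $verb === '' ? '(default)' : $verb,
        var_export(hasLineTwo($verb, $lf), true),
        var_export(hasLineTwo($verb, $crlf), true)
    );
}
```

Note that (*CRLF) recognizes only the two-character sequence as a newline, so it stops matching LF-only text; (*ANYCRLF) accepts CR, LF, or CRLF and is the safer choice when the origin of the text is unknown.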
Note: This answer is only applicable to older PHP versions. When I wrote it, I was not aware of the sequences and modifiers that are available: \R, (*BSR_ANYCRLF) and (*BSR_UNICODE). See also the answer to: How to replace different newline styles in PHP the smartest way?
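As a rough illustration of the \R sequence mentioned in the note, the sketch below normalizes mixed line endings to \n before matching. The function name normalizeNewlines() is an assumption made for this example. (*BSR_ANYCRLF) is used so that \R is limited to CR, LF and CRLF; without it, \R can also match other characters such as vertical tab and form feed.

```php
<?php
// Hypothetical helper (not part of PHP): converts CRLF, lone CR and LF
// into a single LF so that a plain /m pattern sees consistent line endings.
function normalizeNewlines(string $text): string
{
    // \R consumes CRLF as one unit, so "\r\n" becomes one "\n", not two.
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace('/(*BSR_ANYCRLF)\R/', "\n", $text) ?? $text;
}

$mixed = "one\r\ntwo\rthree\nfour";
$clean = normalizeNewlines($mixed);

// The question's original pattern can now be applied to the cleaned text.
var_dump(preg_match('/^two$/m', $clean) === 1);
```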
In PHP it's not possible to specify the newline character sequence(s) for PCRE regex patterns. The m modifier looks for \n only; that's documented. There is also no runtime setting to change this, which would be possible in Perl, but that's not an option with PHP.
I normally just normalize the string before using it with preg_match and the like:
$subject = str_replace("\r\n", "\n", $subject);
This might not be exactly what you're looking for, but it will probably help.
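If the text may also contain lone \r line endings, the same idea can be extended by passing an array of search strings. This is only a sketch; toUnixNewlines() is a hypothetical name, and the only real function used is str_replace().

```php
<?php
// Hypothetical helper (not part of PHP): rewrites Windows (CRLF) and
// lone-CR line endings to Unix LF using plain string replacement.
function toUnixNewlines(string $subject): string
{
    // str_replace() applies the searches in order, so "\r\n" must come
    // first; otherwise each CRLF would turn into two newlines.
    return str_replace(["\r\n", "\r"], "\n", $subject);
}

$subject = toUnixNewlines("one\r\ntwo\r\nthree\r\nfour");
echo preg_match('/^two$/m', $subject) ? 'Found match.' : 'Did not find match!';
```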
Edit: Regarding the Windows EOL example you've added to your question:
$EOL = "\r\n"; // Windows CR+LF
$SOURCE_TEXT = "one{$EOL}two{$EOL}three{$EOL}four";
if (preg_match('/^two$/m', $SOURCE_TEXT)) {
    echo 'Found match.';
} else {
    echo 'Did not find match!'; // <<< RESULT
}
This fails because, in the text, there is a \r after two. So two is not at the end of a line: there is an additional character, \r, before the end of the line ($).
The PHP manual clearly explains that only \n is considered a line ending. Since $ considers \n only, if you're looking for two followed by \r at the end of a line, you need to change your pattern. That's the other option (instead of converting the text as suggested above).
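One way to change the pattern, sketched here under the assumption that an optional \r before the line end is acceptable, is to add \r? in front of $. The variable names are made up for this example. The second pattern uses a lookahead so that the \r does not become part of the matched text.

```php
<?php
$lf   = "one\ntwo\nthree\nfour";
$crlf = "one\r\ntwo\r\nthree\r\nfour";

// Allow an optional carriage return between the word and the line end.
$tolerant = '/^two\r?$/m';

// Same idea, but the lookahead keeps a trailing \r out of the match.
$tolerantClean = '/^two(?=\r?$)/m';

var_dump(preg_match($tolerant, $lf) === 1);
var_dump(preg_match($tolerant, $crlf) === 1);

// With the lookahead, the captured text is just the word itself.
preg_match($tolerantClean, $crlf, $m);
var_dump($m[0]);
```

The plain \r? variant is simpler, but the matched string will include the carriage return on CRLF input, which matters if the match is reused later.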