 

Including new lines in PHP preg_replace function

Tags: regex, php, newline

I'm trying to match a string that may appear over multiple lines. It starts and ends with a specific string:

{a}some string
can be multiple lines
{/a}

Can I grab everything between {a} and {/a} with a regex? It seems that . doesn't match newlines, but I've tried the following with no luck:

$template = preg_replace( $'/\{a\}([.\n]+)\{\/a\}/', 'X', $template, -1, $count );
echo $count; // prints 0

It matches . or \n when they're on their own, but not together!

asked Mar 29 '09 at 23:03 by DisgruntledGoat



2 Answers

Use the s modifier, which makes . match newline characters too:

$template = preg_replace( '/\{a\}(.+)\{\/a\}/s', 'X', $template, -1, $count );
//                                           ^
echo $count;
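A minimal sketch of the effect, using the sample text from the question. It uses `.+` rather than `[.\n]+`: inside a character class the dot is already literal, so the s modifier only changes the behavior of a bare dot.

```php
<?php
// Sample input from the question, spanning several lines.
$template = "{a}some string\ncan be multiple lines\n{/a}";

// Without the s modifier, . stops at "\n", so nothing is replaced.
$without = preg_replace('/\{a\}(.+)\{\/a\}/', 'X', $template, -1, $countWithout);

// With the s modifier, . also matches "\n", so the whole block is replaced.
$with = preg_replace('/\{a\}(.+)\{\/a\}/s', 'X', $template, -1, $countWith);

echo $countWithout, "\n"; // 0
echo $countWith, "\n";    // 1
echo $with, "\n";         // X
```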
answered Sep 29 '22 at 02:09 by strager



I think you've got more problems than just the dot not matching newlines, but let me start with a formatting recommendation. You can use just about any punctuation character as the regex delimiter, not just the slash ('/'). If you use another character, you won't have to escape slashes within the regex. I understand '%' is popular among PHPers; that would make your pattern argument:

'%\{a\}([.\n]+)\{/a\}%'
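A small sketch of the delimiter point, with a made-up sample string and a `\w+` group used purely for illustration: the `/`-delimited pattern has to escape the slash in `{/a}`, the `%`-delimited one does not, and both find the same matches.

```php
<?php
// Hypothetical sample text with two short {a}...{/a} blocks.
$template = "{a}x{/a} and {a}y{/a}";

// Slash delimiter: the "/" in {/a} has to be escaped as "\/".
$slashCount = preg_match_all('/\{a\}(\w+)\{\/a\}/', $template, $slashMatches);

// Percent delimiter: the same pattern, but the "/" needs no escaping.
$percentCount = preg_match_all('%\{a\}(\w+)\{/a\}%', $template, $percentMatches);

// The captured group for each match.
print_r($percentMatches[1]);
```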

Now, the reason your original regex didn't work as you intended is that the dot loses its special meaning when it appears inside a character class (the square brackets), so [.\n] just matches a literal dot or a linefeed. What you were looking for was (?:.|\n), but I would have recommended matching the carriage return as well as the linefeed:

'%\{a\}((?:.|[\r\n])+)\{/a\}%'

That's because the word "newline" can refer to the Unix-style "\n", Windows-style "\r\n", or older-Mac-style "\r". Any given web page may contain any of those or a mixture of two or more styles; a mix of "\n" and "\r\n" is very common. But with /s mode (also known as single-line or DOTALL mode), you don't need to worry about that:

'%\{a\}(.+)\{/a\}%s'
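A sketch comparing the patterns discussed above on a made-up string that mixes all three line-ending styles. The input and variable names are hypothetical; each count is what `preg_match` returns (1 for a match, 0 for none).

```php
<?php
// Hypothetical input mixing "\r\n" (Windows), "\n" (Unix) and "\r" (old Mac) line breaks.
$template = "{a}line one\r\nline two\nline three\rline four{/a}";

// 1. Character class: inside [...] the dot is literal, so [.\n]+ cannot match letters.
$classCount = preg_match('%\{a\}([.\n]+)\{/a\}%', $template);

// 2. Alternation: any non-newline character, or a CR, or an LF.
$altCount = preg_match('%\{a\}((?:.|[\r\n])+)\{/a\}%', $template, $altMatch);

// 3. No modifier: . does not match "\n", so the block is not found.
$plainCount = preg_match('%\{a\}(.+)\{/a\}%', $template);

// 4. s modifier (DOTALL): . matches every character, whatever the line-ending style.
$dotallCount = preg_match('%\{a\}(.+)\{/a\}%s', $template, $dotallMatch);

// prints: class=0 alternation=1 plain=0 dotall=1
printf("class=%d alternation=%d plain=%d dotall=%d\n", $classCount, $altCount, $plainCount, $dotallCount);
```

Both the alternation and the s version capture the same text; the s version is simply shorter and easier to read.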

However, there's another problem with the original regex that's still present in the /s version: the + is greedy. That means that if there's more than one {a}...{/a} sequence in the text, the first time your regex is applied it will match all of them, from the first {a} to the last {/a}. The simplest way to fix that is to make the + ungreedy (a.k.a. "lazy" or "reluctant") by appending a question mark:

'%\{a\}(.+?)\{/a\}%s'
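A minimal sketch of the difference, assuming a made-up template with two blocks and the same `preg_replace(..., 'X', $template, -1, $count)` call shape as the question:

```php
<?php
// Hypothetical template containing two separate {a}...{/a} blocks.
$template = "{a}first\nblock{/a} keep this {a}second\nblock{/a}";

// Greedy .+ : one match running from the first {a} to the last {/a},
// so the text between the blocks is swallowed too.
$greedy = preg_replace('%\{a\}(.+)\{/a\}%s', 'X', $template, -1, $greedyCount);

// Lazy .+? : each {a}...{/a} pair is matched and replaced separately.
$lazy = preg_replace('%\{a\}(.+?)\{/a\}%s', 'X', $template, -1, $lazyCount);

echo $greedy, "\n"; // X
echo $lazy, "\n";   // X keep this X
```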

Finally, I don't know what to make of the '$' before the opening quote of your pattern argument. I don't do PHP, but that looks like a syntax error to me. If someone could educate me in this matter, I'd appreciate it.

answered Sep 29 '22 at 00:09 by Alan Moore
