How to replace one or two consecutive line breaks in a string?

I'm developing a single-serving site in PHP that simply displays messages posted by visitors (ideally about the topic of the website). Anyone can post up to three messages an hour.

Since the website will only be one page, I'd like to control the vertical length of each message. However, I do want to at least partially preserve line breaks in the original message. A compromise would be to allow for two line breaks, but if there are more than two, then replace them with a total of two line breaks in a row. Stack Overflow implements this.

For example:

Porcupines\nare\n\n\n\nporcupiney.

would be changed to

Porcupines<br />are<br /><br />porcupiney.

One tricky aspect of checking for line breaks is the possibility of their being collected and stored as \r\n, \r, or \n. I thought about converting all line breaks to <br />s using nl2br(), but that seemed unnecessary.

My question: Using regular expressions in PHP (with functions like preg_match() and preg_replace()), how can I check for instances of more than two line breaks in a row (with or without blank space between them) and then change them to a total of two line breaks?

asked May 03 '09 01:05 by tevan


1 Answer

\R is the system-agnostic escape sequence that matches \n, \r, and \r\n (treating \r\n as a single line break).
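As a quick illustration of that claim, here is a minimal sketch that runs \R against each line-ending style. The sample strings and the $samples / $converted names are made up for the demonstration.

```php
<?php
// Hypothetical sample inputs, one per line-ending style.
$samples = [
    'unix'    => "a\nb",
    'old mac' => "a\rb",
    'windows' => "a\r\nb",
];

$converted = [];
foreach ($samples as $label => $text) {
    // \R consumes \r\n as one line break, so every sample gets exactly one <br />.
    $converted[$label] = preg_replace('~\R~', '<br />', $text);
    echo $label, ': ', $converted[$label], PHP_EOL;
}
```

Each of the three samples should come out as a<br />b.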

Because you want to greedily match 1 or 2 consecutive newlines, you will need to use a limiting quantifier {1,2}.

Code: (Demo)

$string = "Porcupines\nare\n\n\n\nporcupiney.";

echo preg_replace('~\R{1,2}~', '<br />', $string);

Output:

Porcupines<br />are<br /><br />porcupiney.
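To see how the greedy {1,2} quantifier pairs up line breaks, here is a small sketch that feeds the same pattern runs of one to five newlines. The helper name breaksToBr() and the a...b test strings are hypothetical, chosen only for this illustration. Note that each match covers at most two breaks, so a run of five yields three tags.

```php
<?php
// Hypothetical helper wrapping the pattern used above.
function breaksToBr(string $text): string
{
    // Greedy {1,2}: each match swallows two line breaks when available, otherwise one.
    return preg_replace('~\R{1,2}~', '<br />', $text);
}

// Runs of 1 to 5 newlines between "a" and "b".
for ($n = 1; $n <= 5; $n++) {
    $input = 'a' . str_repeat("\n", $n) . 'b';
    echo $n, ' => ', breaksToBr($input), PHP_EOL;
}
```

Runs of one or two breaks become a single <br />, and runs of three or four become two.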

Now, to clarify why/where the other answers are incorrect...

@DavidZ's unexplained answer fails to replace the lone newline character (Demo of failure) because of the incorrect quantifier expression.

It generates:

Porcupines\nare<br/><br/>porcupiney.

The exact same result is generated by @chaos's code-only answer (Demo of failure). Not only is the regular expression long-winded with incorrect quantifier logic, it also adds a needless s pattern modifier.

The s pattern modifier only affects a pattern that contains a dot metacharacter, where it lets . match newline characters as well. Because there is no . in the pattern, the modifier does nothing and teaches readers a meaningless coding practice.
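A minimal sketch of that point, using made-up patterns and a made-up test string (not the patterns from the other answers): the s modifier changes the outcome only when the pattern contains a dot.

```php
<?php
$text = "a\nb";

// With a dot in the pattern, the s modifier changes the result.
$dotWithoutS = preg_match('~a.b~', $text);  // 0: by default . does not match \n
$dotWithS    = preg_match('~a.b~s', $text); // 1: s lets . match \n

// Without a dot, the modifier makes no difference.
$plainWithoutS = preg_replace('~\R+~', '|', $text);
$plainWithS    = preg_replace('~\R+~s', '|', $text);

var_dump($dotWithoutS, $dotWithS, $plainWithoutS === $plainWithS);
```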

answered Sep 17 '22 07:09 by mickmackusa