PHP Regex: How to match \r and \n without using [\r\n]?

I tested \v (vertical whitespace) to match \r, \n and their combinations, but found that \v does not match \r or \n. Below is the code I am using:

$string = "
Test
";

if (preg_match("#\v+#", $string )) {
  echo "Matched";
} else {
  echo "Not Matched";
}

To be clearer, my question is: is there any alternative to [\r\n] for matching \r and \n?

asked Sep 24 '13 17:09 by Jason OOO

3 Answers

PCRE and newlines

PCRE has a superfluity of newline-related escape sequences and alternatives.

Well, a nifty escape sequence that you can use here is \R. By default, \R matches any Unicode newline sequence, but what it matches can be configured with the options shown below.

To match any Unicode newline sequence made of single-byte characters (no u flag; note that \x85 is outside ASCII but still a single byte):

preg_match('~\R~', $string);

This is equivalent to the following group:

(?>\r\n|\n|\r|\f|\x0b|\x85)
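A small check of that equivalence, as a sketch: the sample subject is made up, and it assumes PHP's bundled PCRE with its default \R setting (Unicode newline sequences). Both patterns should find the same six line breaks, with \r\n consumed as a single match because it is the first alternative of the atomic group.

```php
<?php
// Hypothetical subject containing each single-byte newline sequence once.
$subject = "one\r\ntwo\nthree\rfour\ffive\x0bsix\x85seven";

// \R (no u flag) versus the explicit atomic group it stands for.
$viaR     = preg_match_all('~\R~', $subject, $m1);
$viaGroup = preg_match_all('~(?>\r\n|\n|\r|\f|\x0b|\x85)~', $subject, $m2);

echo $viaR, "\n";              // 6
echo $viaGroup, "\n";          // 6
var_dump($m1[0] === $m2[0]);   // bool(true): identical matched substrings
```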

To match any Unicode newline sequence, including newline characters outside the single-byte range such as the line separator (U+2028) and paragraph separator (U+2029), turn on the u (unicode) flag.

preg_match('~\R~u', $string);

The u (unicode) modifier turns on additional PCRE functionality, and pattern and subject strings are treated as UTF-8.

This is equivalent to the following group:

(?>\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})
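To see why the u flag matters, here is a minimal sketch with a made-up subject containing U+2028 (it assumes PHP 7 or later for the "\u{...}" string escape). Without u the subject is scanned byte by byte, and none of the UTF-8 bytes E2 80 A8 is a newline character; with u the whole code point is recognised.

```php
<?php
// U+2028 LINE SEPARATOR, stored as the UTF-8 bytes E2 80 A8.
$subject = "first\u{2028}second";

// Byte-oriented: no single byte here is a newline, so nothing matches.
$withoutU = preg_match('~\R~', $subject);

// UTF-8 mode: \R recognises U+2028 as a line break.
$withU = preg_match('~\R~u', $subject, $m);

// Splitting with the u flag therefore yields two lines.
$lines = preg_split('~\R~u', $subject);

echo $withoutU, "\n";     // 0
echo $withU, "\n";        // 1
echo count($lines), "\n"; // 2
```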

It is possible to restrict \R to match CR, LF, or CRLF only:

preg_match('~(*BSR_ANYCRLF)\R~', $string);

This is equivalent to the following group:

(?>\r\n|\n|\r)
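The effect of (*BSR_ANYCRLF) shows up clearly with a form feed, which the default \R treats as a line break. This sketch uses a hypothetical subject and assumes the default (Unicode) \R setting for the first split.

```php
<?php
$subject = "page1\fpage2\r\nline2";

// Default \R: form feed and CRLF are both line breaks.
$default = preg_split('~\R~', $subject);

// (*BSR_ANYCRLF): only \r\n, \n or \r count, so the form feed is kept.
$anyCrlf = preg_split('~(*BSR_ANYCRLF)\R~', $subject);

echo count($default), "\n";  // 3
echo count($anyCrlf), "\n";  // 2
```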

Additional

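PCRE also lets you choose the newline convention by putting one of the following sequences at the very start of the pattern. These control what ^, $ and . treat as a line break; they do not change what \R matches, which is controlled separately by (*BSR_ANYCRLF) and (*BSR_UNICODE):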

(*CR)        carriage return
(*LF)        linefeed
(*CRLF)      carriage return, followed by linefeed
(*ANYCRLF)   any of the three above
(*ANY)       all Unicode newline sequences
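A rough illustration of how a newline convention changes ^ and $ in multiline mode while leaving \R alone. The subject, which separates lines with bare carriage returns, is invented for the example.

```php
<?php
$subject = "a\rb\rc"; // lines separated by bare CR characters

// (*CR): a lone \r is a newline, so with m the line "b" is found.
$cr = preg_match('~(*CR)^b$~m', $subject);

// (*LF): only \n is a newline and there is none, so ^ only matches
// at the very start of the subject, where the character is "a".
$lf = preg_match('~(*LF)^b$~m', $subject);

// The newline convention does not affect \R: it still matches \r.
$r = preg_match('~(*LF)\R~', $subject);

echo $cr, $lf, $r, "\n"; // 101
```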

Note: \R has no special meaning inside a character class. Older PCRE versions treat it like other unrecognized escape sequences, as the literal character "R"; PCRE2, which PHP uses since 7.3, reports a compilation error instead.

answered Oct 12 '22 14:10 by hwnd


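This doesn't answer the question about alternatives, because \v works perfectly well. The problem is the double-quoted string: in PHP, "\v" inside double quotes is itself an escape sequence for the vertical tab character (chr(11)), so PCRE receives a literal vertical tab instead of the \v escape and only matches vertical tabs.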

\v matches any character considered vertical whitespace. In PCRE these are LF (\x0A), VT (\x0B), FF (\x0C), CR (\x0D) and NEL (\x85), plus, with the u modifier, the line separator (U+2028) and paragraph separator (U+2029).

You only need to change "#\v+#" to either

  • "#\\v+#" escape the backslash

or

  • '#\v+#' use single quotes

In both cases, you will get a match for any combination of \r and \n.
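A minimal sketch contrasting the question's pattern with both fixes, using the question's own subject. It only relies on PHP's documented string escapes: "\v" in double quotes becomes chr(11), while '\v' in single quotes stays as a backslash followed by "v".

```php
<?php
$string = "\nTest\n"; // same content as the question's $string

// PHP converts "\v" to a vertical tab before PCRE ever sees the pattern.
var_dump("#\v+#" === "#" . chr(11) . "+#"); // bool(true)
$original = preg_match("#\v+#", $string);   // looks for vertical tabs only

// Both fixes pass the two characters backslash + "v" through to PCRE.
$escaped = preg_match("#\\v+#", $string);
$single  = preg_match('#\v+#', $string);

echo $original, $escaped, $single, "\n"; // 011
```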

Update:

Just to make the scope of \v clear in comparison to \R, here is the relevant passage from perlrebackslash:

  • \R
    \R matches a generic newline; that is, anything considered a linebreak sequence by Unicode. This includes all characters matched by \v (vertical whitespace), ...
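One practical consequence of that difference, sketched with a hypothetical CRLF subject and assuming PCRE's default \R setting: \v matches single characters, so a CRLF pair counts as two matches, while \R treats it as one line-break sequence.

```php
<?php
$subject = "line1\r\nline2";

// \v: CR and LF are two separate single-character matches.
$v = preg_match_all('~\v~', $subject);

// \R: the CRLF pair is one line-break sequence.
$r = preg_match_all('~\R~', $subject);

echo $v, $r, "\n"; // 21

// Splitting on \v therefore leaves an empty piece between CR and LF.
echo implode('|', preg_split('~\v~', $subject)), "\n"; // line1||line2
echo implode('|', preg_split('~\R~', $subject)), "\n"; // line1|line2
```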

answered Oct 12 '22 15:10 by Olaf Dietsche


If there is some strange requirement that prevents you from using a literal [\r\n] in your pattern, you can always use hexadecimal escape sequences instead:

preg_match('#[\xD\xA]+#', $string)

This pattern is equivalent to [\r\n]+.
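A quick sketch that checks the equivalence on a few made-up inputs: for each one, both patterns should agree on whether they match and on the matched text. It relies on PCRE reading at most two hex digits after \x, so \xD\xA is parsed as 0x0D followed by 0x0A.

```php
<?php
// Hypothetical inputs: CRLF, lone CR, lone LF, a mixed run, and no line break.
$inputs = ["\r\n", "a\rb", "a\nb", "a\n\r\n\rb", "no newline"];

$allEquivalent = true;
foreach ($inputs as $input) {
    // Single quotes leave the escapes for PCRE: \xD is CR, \xA is LF.
    $hex   = preg_match('#[\xD\xA]+#', $input, $m1);
    $named = preg_match('#[\r\n]+#', $input, $m2);

    // Compare the match result and the matched text (null when no match).
    if ($hex !== $named || ($m1[0] ?? null) !== ($m2[0] ?? null)) {
        $allEquivalent = false;
    }
}

var_dump($allEquivalent); // bool(true)
```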

answered Oct 12 '22 13:10 by p.s.w.g