Regex \R doesn't work inside character class

Tags:

regex

php

In PHP, the escape sequence \R, which should match any newline sequence, doesn't work inside a character class.

I recently learned about this escape sequence from another answer here on Stack Overflow and, to be honest, I haven't been able to find much online documenting its existence: it isn't mentioned anywhere on php.net except in a user comment.

Question(s):

  • Why won't \R work inside a character class?
  • Where is it documented?

EXAMPLE 1: (https://regex101.com/r/vA8xV3/3)

$a = "line1
line2";

echo preg_replace('/\R/',' ',$a);

Returns (finds match, replace with single space):

line1 line2

EXAMPLE 2: (https://regex101.com/r/vA8xV3/2)

$a = "line1
line2";

echo preg_replace('/[\R]/',' ',$a);

Returns (no match):

line1
line2
Asked by Eaten by a Grue, May 07 '15 13:05


2 Answers

From the PCRE manual:

Escape sequences in character classes

All the sequences that define a single character value can be used both inside and outside character classes. In addition, inside a character class, \b is interpreted as the backspace character (hex 08).

\N is not allowed in a character class. \B, \R, and \X are not special inside a character class. Like other unrecognized escape sequences, they are treated as the literal characters "B", "R", and "X" by default, but cause an error if the PCRE_EXTRA option is set. Outside a character class, these sequences have different meanings.

(emphasis on relevant bit added by me)
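
If the goal was to combine a newline match with other single characters, one possible workaround is to keep \R outside the brackets and join it to a character class with an alternation. The sketch below is only an illustration: the sample string and the extra delimiters (";" and ",") are assumptions, not part of the original question. Depending on the PCRE version bundled with PHP, /[\R]/ itself may either match a literal "R" (as quoted above) or fail to compile, so the example avoids it entirely.

```php
<?php
// Hypothetical input mixing Windows (\r\n) and Unix (\n) line endings,
// plus a ";" that we also want to treat as a separator.
$a = "line1\r\nline2\nline3;line4";

// \R stays outside the brackets, where it is special and matches \r\n as one unit.
// The character class only holds ordinary single characters.
$result = preg_replace('/\R|[;,]/', ' ', $a);

echo $result, PHP_EOL;
```

Because \R consumes "\r\n" as a single match, the Windows line ending is replaced by one space rather than two.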

Answered by Anthony, Sep 23 '22 04:09


This is correct behavior. \R only works outside a character class. (At least this is true in grep and many others.)

For grep:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html

PHP uses Perl-compatible regular expressions (PCRE), so see perldoc:

http://perldoc.perl.org/perlrebackslash.html#Misc

Since \R can match a sequence of more than one character, it cannot be put inside a bracketed character class; /[\R]/ is an error; use \v instead
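
As a rough illustration of the \v suggestion in PHP, the sketch below puts \v inside a character class. The sample string is an assumption for demonstration. Note that \v matches one vertical-whitespace character at a time, so a "\r\n" pair counts as two matches unless a quantifier is added.

```php
<?php
// Hypothetical input with one Windows (\r\n) and one Unix (\n) line ending.
$a = "line1\r\nline2\nline3";

// Each vertical-whitespace character is replaced separately,
// so "\r\n" becomes two spaces.
$single = preg_replace('/[\v]/', ' ', $a);

// Adding + collapses each run of vertical whitespace into a single space.
$collapsed = preg_replace('/[\v]+/', ' ', $a);

echo $single, PHP_EOL;
echo $collapsed, PHP_EOL;
```

Unlike \R, the [\v]+ form will also merge consecutive blank lines into one space, which may or may not be what is wanted.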

Answered by D. Cichowski, Sep 21 '22 04:09