
Ruby regex ‘backslash R’ aka ‘\R’ pattern

Tags: regex, ruby

I am fairly sure I read somewhere that \R was introduced in Ruby 2 to match newlines regardless of where they came from: Unix \n, classic Mac OS \r, or Windows \r\n. If so, Ruby 2 should treat \R like %r{\r\n|\r|\n}.

This works fine:

▶ "a\nb".match /\R/
#⇒ #<MatchData "\n">
▶ "a\rb".match /\R/
#⇒ #<MatchData "\r">
▶ "a\r\nb".match /\R/
#⇒ #<MatchData "\r\n">

even when different line endings are combined:

▶ "a\r\n\nb".match /\R{2}/
#⇒ #<MatchData "\r\n\n">

unless one tries to negate \R:

▶ "a\nb".match /[^\R]+/
#⇒ #<MatchData "a\nb">

Negating \n works fine though:

▶ "a\nb".match /[^\n]+/
#⇒ #<MatchData "a">

Unfortunately, \R is enormously hard to google. Neither the Regexp rdoc nor Regular Expressions mentions it.

Would any regex guru drop an explanation here, so that it is at least easy to find by searching?

Thanks in advance.

Aleksei Matiushkin asked Feb 15 '15 07:02



1 Answer

This is from the documentation by the author of Onigmo, the regex engine Ruby has used since 2.0: https://github.com/k-takata/Onigmo/blob/master/doc/RE#L101. It says:

\R       Linebreak

         Unicode:
           (?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])

         Not Unicode:
           (?>\x0D\x0A|[\x0A-\x0D])
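
To make the quoted definition concrete, here is a minimal sketch that writes out the "Unicode" expansion by hand and compares it with the built-in \R. The variable names `linebreak` and `samples` are made up for this illustration. Note that the range [\x0A-\x0D] also covers \v and \f, so \R is somewhat broader than the %r{\r\n|\r|\n} assumed in the question.

```ruby
# Hand-written copy of the "Unicode" definition quoted above.
# `linebreak` is a hypothetical name used only in this sketch.
linebreak = /(?>\x0D\x0A|[\x0A-\x0D\u0085\u2028\u2029])/

# Every sequence the quoted definition lists (0x0B is \v, 0x0C is \f).
samples = ["\n", "\v", "\f", "\r", "\r\n", "\u0085", "\u2028", "\u2029"]

samples.each do |s|
  builtin = s[/\R/]      # what the built-in escape matches, or nil
  spelled = s[linebreak] # what the hand-written expansion matches, or nil
  puts "#{s.inspect}: \\R => #{builtin.inspect}, expansion => #{spelled.inspect}"
end
```

If the two columns agree on your Ruby version, that supports reading \R as shorthand for the group above rather than as a single-character class.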

What seems relevant to your question is that \R is not a character class but a list of alternatives. Given that the matched sequence is not necessarily a single character (\r\n is two), I guess it could not be made into a character class. This probably interacts in a peculiar way with negation inside [^...], which is intended to be used only with characters and character classes.
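
If the goal is "one or more characters that are not a linebreak", a possible workaround is to leave \R outside the brackets and negate it with a lookahead. The following is a sketch under that assumption, not an officially documented idiom; `not_linebreak` is a made-up name.

```ruby
# `not_linebreak` is a hypothetical name for this sketch.
# (?!\R) refuses to proceed where a linebreak starts; "." with the /m flag
# then consumes any single character, so each repetition of the group
# matches exactly one character that does not begin a linebreak.
not_linebreak = /(?:(?!\R).)+/m

first_line = "a\nb"[not_linebreak]
crlf_line  = "a\r\nb"[not_linebreak]
puts first_line.inspect
puts crlf_line.inspect

# When only \n and \r matter, a plain negated class is enough.
simple_line = "a\r\nb"[/[^\r\n]+/]
puts simple_line.inspect

# To break text into lines, splitting on \R avoids negation altogether.
lines = "a\r\nb\rc\nd".split(/\R/)
puts lines.inspect
```

The lookahead version costs a little more per character than a negated class, but it keeps the full \R definition, including \r\n as a single unit.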

sawa answered Sep 27 '22 20:09
