Ruby regex ‘backslash R’ aka ‘\R’ pattern

Question

I am pretty sure I have seen “\R was introduced in Ruby 2 to match newlines, regardless of where they came from: Unix \n, Mac OS \r or Windows \r\n” somewhere. That said, Ruby 2 should treat \R like %r{\n|\r\n|\r}.

This works fine:

▶ "a\nb".match /\R/
#⇒ #<MatchData "\n">
▶ "a\rb".match /\R/
#⇒ #<MatchData "\r">
▶ "a\r\nb".match /\R/
#⇒ #<MatchData "\r\n">

even when line endings/feeds are combined:

▶ "a\n\rb".match /\R{2}/
#⇒ #<MatchData "\n\r">

unless one tries to negate \R:

▶ "a\nb".match /[^\R]+/
#⇒ #<MatchData "a\nb">

Negating \n works fine though:

▶ "a\nb".match /[^\n]+/
#⇒ #<MatchData "a">

Unfortunately, \R is enormously hard to google. Neither the Regexp rdoc nor Regular Expressions mentions it.

Would any regex guru drop an explanation here, so that it is at least easy to google?

Thanks in advance.

sawa · Accepted Answer

This is from the author: https://github.com/k-takata/Onigmo/blob/master/doc/RE#L101. It says

\R       Linebreak

         Unicode:
           (?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])

         Not Unicode:
           (?>\x0D\x0A|[\x0A-\x0D])
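
As a rough illustration of the "Not Unicode" definition quoted above, the same pattern can be spelled out by hand and compared with the built-in \R. This is only a sketch: the constant name LINEBREAK is invented here, and the hand-written regex merely approximates what the engine does internally.

```ruby
# Hand-written version of the "Not Unicode" definition quoted above.
# LINEBREAK is a name made up for this sketch, not part of Ruby or Onigmo.
LINEBREAK = /(?>\x0D\x0A|[\x0A-\x0D])/

samples = ["a\nb", "a\rb", "a\r\nb", "a\vb", "a\fb"]

samples.each do |s|
  builtin = s[/\R/]       # first match of the built-in escape
  by_hand = s[LINEBREAK]  # first match of the spelled-out pattern
  puts "#{s.inspect}: \\R => #{builtin.inspect}, by hand => #{by_hand.inspect}"
end

# The atomic group (?>...) consumes CRLF as one unit and does not give
# the "\n" back, so a single CRLF cannot satisfy \R{2}.
puts "a\r\nb".match?(/\R{2}/)  # one CRLF only
puts "a\n\rb".match?(/\R{2}/)  # LF followed by CR: two separate breaks
```

The last two lines are the part that matters for the question: CRLF counts as a single line break, which is why the pattern is an alternation rather than a plain set of characters.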

What seems relevant to your question is that \R is not a character class, but a list of alternatives. Given that the matched sequence is not necessarily a single character (\x0D\x0A is two), I guess it could not be made into a character class. This is probably interacting in a peculiar way with negation, which is intended to be used only with characters and/or character classes.
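
To make the interaction with negation concrete: as far as I can tell, \R loses its special meaning inside brackets and is read as a literal R, so [^\R] means "anything except the letter R". The sketch below shows that, plus two possible ways to express "not a line break". The variable names are only for illustration, and neither workaround is claimed to be the canonical one.

```ruby
# Inside a character class, \R appears to be read as a literal "R",
# so [^\R]+ stops at the letter R and happily runs across line breaks.
literal_r = "aRb\nc"[/[^\R]+/]

# Option 1: negate the individual line-break characters explicitly.
no_breaks_class = "a\nb"[/[^\n\r]+/]

# Option 2: keep \R, but check it in a negative lookahead before
# consuming each character (/m lets "." match "\n", so only the
# lookahead decides where the match stops).
no_breaks_lookahead = "a\r\nb"[/(?:(?!\R).)+/m]

p literal_r
p no_breaks_class
p no_breaks_lookahead
```

Option 1 is simpler but only covers the characters listed; option 2 follows whatever \R matches, including CRLF and, in Unicode mode, the extra code points from the definition above.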

Tags: regex, ruby

Aleksei Matiushkin
