I am pretty sure I have seen “\R was introduced in Ruby 2 to match newlines regardless of where they came from: Unix \n, Mac OS \r, or Windows \r\n” somewhere. That said, Ruby 2 should treat \R like %r{\r\n|\r|\n}.
This works fine:
▶ "a\nb".match /\R/
#⇒ #<MatchData "\n">
▶ "a\rb".match /\R/
#⇒ #<MatchData "\r">
▶ "a\r\nb".match /\R/
#⇒ #<MatchData "\r\n">
even when different line endings are combined:
▶ "a\r\n\nb".match /\R{2}/
#⇒ #<MatchData "\r\n\n">
unless one tries to negate \R:
▶ "a\nb".match /[^\R]+/
#⇒ #<MatchData "a\nb">
Negating \n works fine though:
▶ "a\nb".match /[^\n]+/
#⇒ #<MatchData "a">
Unfortunately, \R is enormously hard to google. Neither the Regexp rdoc nor Regular Expressions mentions it.
Could any regex guru drop an explanation here, so that it is at least easy to google?
Thanks in advance.
This is from the author of Onigmo, the regex engine used by Ruby 2: https://github.com/k-takata/Onigmo/blob/master/doc/RE#L101. It says:
\R Linebreak
Unicode:
(?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
Not Unicode:
(?>\x0D\x0A|[\x0A-\x0D])
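Read literally, the excerpt above makes \R broader than %r{\r\n|\r|\n}: the range \x0A-\x0D also covers vertical tab and form feed, and the Unicode variant adds three more code points. A small sketch to check this, where `SINGLE_BREAKS` and `linebreak?` are names made up for illustration and the strings are assumed to be UTF-8 (the default for Ruby 2 source files):

```ruby
# Single characters that the Onigmo excerpt lists as linebreaks.
SINGLE_BREAKS = ["\n", "\v", "\f", "\r", "\u0085", "\u2028", "\u2029"]

# True when the whole string is exactly one linebreak.
# \A and \z anchor the match to the start and end of the string.
def linebreak?(str)
  str.match?(/\A\R\z/)   # String#match? needs Ruby 2.4 or later
end

SINGLE_BREAKS.each { |ch| puts "#{ch.inspect}: #{linebreak?(ch)}" }

puts linebreak?("\r\n")  # CRLF is consumed as one linebreak
puts linebreak?("\n\r")  # two separate linebreaks, so not a single \R
puts linebreak?("a")     # an ordinary character
```

The "\n\r" case is the interesting one: \R takes only the "\n", so the anchored pattern cannot cover both characters.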
What seems relevant to your question is that \R is not a character class but a group of alternatives. Since one of those alternatives, \x0D\x0A, is two characters long, I guess \R could not have been defined as a character class. This probably interacts in a peculiar way with negation inside [^...], which is intended to be used only with single characters and character classes.
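If the goal is "a run of characters containing no linebreak", one possible workaround is to keep \R outside the brackets and negate it with a lookahead instead. This is only a sketch, not something the Onigmo documentation prescribes; `NO_LINEBREAK_RUN` and `first_line` are hypothetical names:

```ruby
# (?!\R) asserts, before each character is consumed, that no linebreak
# starts at the current position; the dot then consumes that character.
NO_LINEBREAK_RUN = /(?:(?!\R).)+/

# Returns the text up to the first linebreak, or nil for an empty match.
def first_line(str)
  str[NO_LINEBREAK_RUN]
end

p first_line("a\nb")            #⇒ "a"
p first_line("a\r\nb")          #⇒ "a"

# When only \r and \n matter, an ordinary negated class is enough.
p "a\r\nb"[/[^\r\n]+/]          #⇒ "a"

# Outside brackets \R is also handy for splitting mixed line endings.
p "a\nb\rc\r\nd".split(/\R/)    #⇒ ["a", "b", "c", "d"]
```

The lookahead form follows whatever \R means for the string's encoding, while [^\r\n] is limited to the two characters listed.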