Multiline mode differs between Perl and Ruby: is Ruby wrong?

Tags:

regex

ruby

perl

Let us look at the Perl code and its output:

$s = "a\nb\nc\n";
$s =~ s/^b/X/;
print $s;

a
b
c

$s = "a\nb\nc\n";
$s =~ s/^b/X/m;
print $s;

a
X
c

I think Perl is right: ^ matches the position after a newline in the middle of the string only when multiline mode is enabled.

Let us look at Ruby:

$s = "a\nb\nc\n"
print $s.sub(/^b/,'X')

a
X
c

$s = "a\nb\nc\n"
print $s.sub(/^b/m,'X')

a
X
c

Here ^ matches the position after a newline in the middle of the text regardless of whether multiline mode is on.

For the life of me, I cannot find the Ruby documentation that defines what the multiline option does. Where is it?

Also, does Ruby have no single-line mode (s)? Using it inline fails with:

undefined group option: /(?s)^b/

/^b./s parses, but it does not behave like Perl's /s (where . matches a newline).

PS: I tested using Perl 5 and Ruby 3.0.

asked May 06 '26 02:05 by puravidaso


1 Answer

Ruby and Perl's /m work differently.


Ruby's /m changes only the behavior of . (dot). It is equivalent to Perl's /s.

  • Ruby /m: Treat a newline as a character matched by .

  • Perl /s: Treat the string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
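A minimal Ruby sketch of this difference, assuming Ruby 2.4 or later (for String#match?). It checks whether a.b can match across the newline in a two-line string without /m, with /m, and with the inline (?m) form that Ruby accepts in place of Perl's (?s). The variable names are illustrative only.

```ruby
# Sketch: Ruby's /m only changes what the dot matches.
text = "a\nb"

# Without /m, . does not match "\n", so a.b cannot span the line break.
dot_default = text.match?(/a.b/)

# With /m, . also matches "\n" (the same effect as Perl's /s).
dot_m = text.match?(/a.b/m)

# The option can also be switched on inline with (?m); Ruby has no (?s).
dot_inline = text.match?(/(?m)a.b/)

puts "without /m:  #{dot_default}"  # prints "without /m:  false"
puts "with /m:     #{dot_m}"        # prints "with /m:     true"
puts "inline (?m): #{dot_inline}"   # prints "inline (?m): true"
```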

Perl's /m changes the behavior of ^ and $.

  • Perl /m: Treat the string being matched against as multiple lines. That is, change "^" and "$" from matching the start of the string's first line and the end of its last line to matching the start and end of each line within the string.

^ and $ always work this way in Ruby. Ruby effectively always has Perl's /m.
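To illustrate that Ruby's ^ and $ are always line anchors, here is a small sketch reusing the string from the question (the variable names are hypothetical). Adding /m does not change what ^ matches, and scanning shows that ^ and $ match at every line boundary, not just at the ends of the string.

```ruby
# Sketch: in Ruby, ^ and $ are line anchors with or without /m.
s = "a\nb\nc\n"

# ^b matches the "b" at the start of the second line either way.
caret_plain = s.sub(/^b/, 'X')
caret_m     = s.sub(/^b/m, 'X')

# ^. picks up the first character of every line.
line_starts = s.scan(/^./)

# .$ picks up the last character of every line ($ matches before each "\n").
line_ends = s.scan(/.$/)

p caret_plain  # => "a\nX\nc\n"
p caret_m      # => "a\nX\nc\n"
p line_starts  # => ["a", "b", "c"]
p line_ends    # => ["a", "b", "c"]
```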

Ruby and Perl both provide \A, \z, and \Z: \A matches only at the beginning of the string, \z only at the very end, and \Z at the end or just before a final newline.
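A short sketch contrasting ^ with these three anchors, using a hypothetical two-line string. Each comment shows the index returned by =~, or nil when there is no match.

```ruby
# Sketch: ^ versus the string-level anchors \A, \z and \Z.
s = "x\nabc\n"   # indices: x=0, "\n"=1, a=2, b=3, c=4, "\n"=5

# ^ matches at the start of any line, so it finds the "a" on line 2.
caret   = s =~ /^a/    # => 2
# \A matches only at the very beginning of the string.
cap_a   = s =~ /\Aa/   # => nil
# \z matches only at the very end; here a "\n" still follows the "c".
small_z = s =~ /c\z/   # => nil
# \Z matches at the end or just before a final newline.
cap_z   = s =~ /c\Z/   # => 4

p [caret, cap_a, small_z, cap_z]  # => [2, nil, nil, 4]
```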

Which is correct? Neither; each does its own thing. Perl's default behavior for ^ and $ matches POSIX regular expressions, although the two are incompatible in other ways. Python uses equivalents of Perl's multi-line and single-line modes (re.MULTILINE and re.DOTALL). Ruby simplifies the behavior of ^ and $ and makes regexes more explicit.
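In practice this means translating between the two dialects by hand. The sketch below reproduces the Perl results from the question in Ruby under an assumed mapping: Perl's default ^ becomes \A, Perl's ^ with /m becomes Ruby's plain ^, and Perl's /s becomes Ruby's /m. It is a rule of thumb, not an official conversion table, and the variable names are hypothetical.

```ruby
# Sketch: Ruby equivalents of Perl substitutions (assumed mapping):
#   Perl s/^b/X/    (no /m) -> Ruby /\Ab/  (start of string only)
#   Perl s/^b/X/m           -> Ruby /^b/   (start of any line)
#   Perl s/b./X/s           -> Ruby /b./m  (dot also matches "\n")
s = "a\nb\nc\n"

perl_default = s.sub(/\Ab/, 'X')  # like Perl s/^b/X/: nothing changes
perl_m       = s.sub(/^b/, 'X')   # like Perl s/^b/X/m
perl_s       = s.sub(/b./m, 'X')  # like Perl s/b./X/s: "." consumes the "\n"

p perl_default  # => "a\nb\nc\n"
p perl_m        # => "a\nX\nc\n"
p perl_s        # => "a\nXc\n"
```

By the same reasoning, Perl's $ without /m (end of string or before a final newline) corresponds to Ruby's \Z.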

See Also

  • Ruby Regexp Anchors
  • Ruby Regexp Options
  • Perl Regexp Metacharacters
  • Perl Regexp Modifiers Overview
answered May 08 '26 17:05 by Schwern


