
Perl: Multiple global "or"-separated regex conditions in while block leads to an infinite loop?

I'm learning Perl and noticed a rather peculiar quirk: attempting to match one of multiple regex conditions in a while loop makes that loop run forever:

#!/usr/bin/perl

my $hivar = "this or that";

while ($hivar =~ m/this/ig || $hivar =~ m/that/ig) {
        print "$&\n";
}

The output of this program is:

this
that
that
that
that
[...]

I'm wondering why this is? Are there any workarounds that are less clumsy than this:

#!/usr/bin/perl

my $hivar = "this or that";

while ($hivar =~ m/this|that/ig) {
        print "$&\n";
}

This is a simplification of a real-world problem I am encountering, and while I am interested in this from a practical standpoint, I would also like to know what is triggering this behavior behind the scenes. This is a question that doesn't seem to be very Google-compatible.

Thanks!

Tom

Tom Corelis asked Jun 26 '10 03:06


1 Answer

The thing is that there's a hidden value associated with each string, not with each match operator, that controls where the next /g match on that string will attempt to continue. It is accessible through pos($string). What happens is:

  1. pos($hivar) is unset, so matching starts at position 0. /this/ matches there and sets pos($hivar) to 4. The second match isn't attempted because the left side of || is already true. $& becomes "this" and gets printed.
  2. pos($hivar) is 4, and /this/ fails to match because there's no "this" at position 4 or beyond. The failing match resets pos($hivar) to the start of the string (pos returns undef, which a /g match treats as 0).
  3. /that/ searches from the start, matches at position 8, and sets pos($hivar) to 12. $& becomes "that" and gets printed.
  4. pos($hivar) is 12, and /this/ fails to match because there's no "this" at position 12 or beyond. The failing match resets pos($hivar) to the start of the string again.
  5. /that/ matches at position 8 again and sets pos($hivar) to 12. $& becomes "that" and gets printed.

and steps 4 and 5 repeat indefinitely.
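To watch this happen, here is a minimal sketch that runs the same two matches for a fixed number of passes and records pos after each attempt. The helper name trace_matches and the pass limit are assumptions added for illustration (the limit only exists so the trace terminates); they are not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replays the question's two /g matches for a
# limited number of passes and records what pos() does each time.
sub trace_matches {
    my ($string, $passes) = @_;
    my @trace;
    for my $pass (1 .. $passes) {
        # pos() is undef until a /g match has succeeded on this string
        my $before = pos($string) // 0;
        my $word;
        if ($string =~ m/this/ig) {
            $word = 'this';
        }
        else {
            # A failed /g match (without /c) clears pos()
            my $after_fail = pos($string) // 'undef';
            push @trace, "pass $pass: /this/ failed from $before, pos is now $after_fail";
            $word = 'that' if $string =~ m/that/ig;
        }
        last unless defined $word;    # neither pattern matched
        push @trace, "pass $pass: /$word/ matched, pos is now " . pos($string);
    }
    return @trace;
}

print "$_\n" for trace_matches("this or that", 4);
```

From the second pass onward every pass produces the same pair of lines, which is the loop described in steps 4 and 5.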

Adding the c regex flag (which tells the engine not to reset pos on a failed match) solves the problem in the example code you provided, but it might or might not be the ideal solution to a more complex problem.
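As a rough sketch of that fix, the loop from the question can be wrapped in a small function with /c added to both matches. The name matches_with_c is hypothetical and the function exists only so the result is easy to inspect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the question's loop with /c on both patterns,
# so a failed match leaves pos() where it was instead of clearing it.
sub matches_with_c {
    my ($string) = @_;
    my @found;
    while ($string =~ m/this/igc || $string =~ m/that/igc) {
        push @found, $&;    # text of the most recent successful match
    }
    return @found;
}

print "$_\n" for matches_with_c("this or that");
```

This terminates because, once both patterns have failed from the current position, nothing moves pos back to the start. It is also order-sensitive, which is one reason it may not suit a more complex problem: for the input "that or this", /this/ is tried first, matches near the end of the string, and moves pos past "that", so "that" is never reported. The single alternation m/this|that/ig from the question does not have that limitation.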

hobbs answered Nov 15 '22 02:11