Same regex doesn't match twice

Tags: regex, perl
While trying to solve a problem in my Perl script, I finally managed to break it down to this situation:

my $content = 'test';
if($content =~ m/test/g) {
    print "1\n";
} 
if($content =~ m/test/g) {
    print "2\n";
} 
if($content =~ m/test/g) {
    print "3\n";
} 

Output:

1
3

My real case is slightly different, but in the end it's the same thing: I'm confused about why regex 2 doesn't match. Does anyone have an explanation for this? I realized that /g seems to be the cause, and of course it isn't needed in my example. But is this output normal behaviour, and if so, why?

Asked Dec 01 '16 at 00:12 by DOB (score: 832)


2 Answers

This is exactly what /g in scalar context is supposed to do.

The first time, it matches "test" and leaves the match position (pos($content)) at the end of that match. The second match starts looking where the previous match left off, at the end of the string, and fails. Because that match failed and you didn't also specify /c, the position is reset, so the third match starts again from the beginning of the string (and succeeds).

(/c keeps the position from being reset when a match fails; if your second match were /test/gc, the second and third matches would both fail.)
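A minimal sketch that prints pos($content) after each attempt, first with plain /g and then with /gc, so the reset described above becomes visible. The @log and @log_c arrays are just names chosen for this illustration.

```perl
use strict;
use warnings;

my $content = 'test';
my @log;      # "matched/pos" for each plain /g attempt
my @log_c;    # "matched/pos" for each /gc attempt

# Plain /g in scalar context: a failed match resets pos() to undef.
for my $attempt (1 .. 3) {
    my $ok  = $content =~ m/test/g ? 1 : 0;
    my $pos = defined pos($content) ? pos($content) : 'undef';
    push @log, "$ok/$pos";
    print "/g  attempt $attempt: matched=$ok pos=$pos\n";
}

pos($content) = undef;    # start the /gc experiment from offset 0

# With /gc, a failed match leaves pos() where it was, so nothing restarts.
for my $attempt (1 .. 3) {
    my $ok  = $content =~ m/test/gc ? 1 : 0;
    my $pos = defined pos($content) ? pos($content) : 'undef';
    push @log_c, "$ok/$pos";
    print "/gc attempt $attempt: matched=$ok pos=$pos\n";
}
```

With plain /g the results are match, fail, match (pos 4, undef, 4); with /gc they are match, fail, fail, because pos stays at 4 after the failure.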

Answered Oct 04 '22 at 02:10 by ysth (score: 58)


Generally speaking, if (/.../g) makes no sense and should be replaced with if (/.../).[1]
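Applied to the question's example, dropping /g makes every test independent, because a match without /g always starts scanning at offset 0. A minimal sketch; @hits is just an illustrative name:

```perl
use strict;
use warnings;

my $content = 'test';
my @hits;

# Without /g, each match starts at the beginning of the string,
# so all three conditions succeed.
if ($content =~ m/test/) { push @hits, 1 }
if ($content =~ m/test/) { push @hits, 2 }
if ($content =~ m/test/) { push @hits, 3 }

print "$_\n" for @hits;    # prints 1, 2 and 3 on separate lines
```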


You wouldn't expect the following to match twice:

my $content = "test";
while ($content =~ /test/g) {
   print(++$i, "\n");
}

So why would you expect the following to match twice:

my $content = "test";
if ($content =~ /test/g) {
   print(++$i, "\n");
}

if ($content =~ /test/g) {
   print(++$i, "\n");
}

They're the same!
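A small sketch of that equivalence, counting matches instead of printing them. $loop_matches and $if_matches are illustrative names; both end up as 1, because each if is one step of the same /g iterator the while loop drives.

```perl
use strict;
use warnings;

my $content = "test";

# Loop form: /g keeps advancing pos() until a match fails.
# The final failed match also resets pos() to undef.
my $loop_matches = 0;
while ($content =~ /test/g) {
    $loop_matches++;
}

# Unrolled form: the first if matches "test" (pos becomes 4),
# the second starts at offset 4 and fails, just like the loop's last test.
my $if_matches = 0;
$if_matches++ if $content =~ /test/g;
$if_matches++ if $content =~ /test/g;

print "while: $loop_matches, ifs: $if_matches\n";
```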


Let's imagine $content contains testtest.

  1. The 1st time $content =~ /test/g is evaluated in scalar context,
    it matches the first "test".
  2. The 2nd time $content =~ /test/g is evaluated in scalar context,
    it matches the second "test".
  3. The 3rd time $content =~ /test/g is evaluated in scalar context,
    it returns false to indicate there are no more matches.
    This also resets the position at which future matches against $content will start.
  4. The 4th time $content =~ /test/g is evaluated in scalar context,
    it matches the first "test" again.
  5. ...
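The four steps above can be traced directly by recording the result and pos($content) after each evaluation. A minimal sketch; the "attempt:result:pos" format is only for illustration.

```perl
use strict;
use warnings;

my $content = "testtest";
my @trace;    # one "attempt:result:pos" entry per evaluation

for my $n (1 .. 4) {
    # The ternary's condition is scalar context, just like an if condition.
    my $result = ($content =~ /test/g) ? 'yes' : 'no';
    my $pos    = defined pos($content) ? pos($content) : 'undef';
    push @trace, "$n:$result:$pos";
}

print "$_\n" for @trace;
```

This prints 1:yes:4, 2:yes:8, 3:no:undef and 4:yes:4, matching steps 1 to 4.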

[1] There are advanced uses for if (/\G.../gc), but that's a different matter. if (/.../g) only makes sense if you're unrolling a while loop (e.g. while (1) { ...; last if !/.../g; ... }).
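The footnote's two patterns can be sketched as follows. The inputs ("a1b22c333", "x = 42") and the token names are hypothetical, chosen only to show the loop shapes: first a while loop unrolled around last if !/.../g, then a tiny lexer that uses \G with /gc so several alternatives can be tried at the same position without losing it when one fails.

```perl
use strict;
use warnings;

# 1. Unrolled while loop: the /g test sits in the middle of the body,
#    so code can run before every attempt and after every success.
my $text = "a1b22c333";    # hypothetical input
my @digits;
my $attempts = 0;
while (1) {
    $attempts++;                        # runs before every match attempt
    last if !($text =~ /(\d+)/g);       # scalar-context /g iterator
    push @digits, $1;                   # runs only after a successful match
}

# 2. \G with /gc: each attempt is anchored at pos(), and a failed
#    attempt keeps pos() so the next alternative starts at the same spot.
my $src = "x = 42";        # hypothetical input
my @tokens;
while (1) {
    if    ($src =~ /\G\s+/gc)      { next }                      # skip whitespace
    elsif ($src =~ /\G([a-z]+)/gc) { push @tokens, "ID($1)" }
    elsif ($src =~ /\G(\d+)/gc)    { push @tokens, "NUM($1)" }
    elsif ($src =~ /\G(=)/gc)      { push @tokens, "OP($1)" }
    elsif ($src =~ /\G\z/gc)       { last }                      # end of input
    else  { die "unexpected input at offset " . (pos($src) // 0) . "\n" }
}

print "digits: @digits (after $attempts attempts)\n";
print "tokens: @tokens\n";
```

The first loop makes one more attempt than there are matches (the final, failing one). In the lexer, plain /g would reset pos() to the start whenever an alternative failed, which is why /c is needed there.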
Answered Oct 04 '22 at 01:10 by ikegami (score: 44)