
Retaining captures with the Perl substitution operator

Can someone explain why the following code...

#!/opt/local/bin/perl
use strict;
use warnings;

my $string;

$string = "\t\t\tEntry";
print "String: >$string<\n";

$string =~ s/^(\t*)//gi;

print "\$1: >$1<\n";
print "String: >$string<\n";
print "\n";

$string = "\t\t\tEntry";

$string =~ s/^(\t*)([^\t]+)/$2/gi;

print "\$1: >$1<\n";
print "String: >$string<\n";
print "\n";

exit 0;

...produces the following output...

String: >           Entry<
Use of uninitialized value in concatenation (.) or string at ~/sandbox.pl line 12.
$1: ><
String: >Entry<

$1: >           <
String: >Entry<

...or more directly: Why is the matched value in the first substitution not retained in $1?

theraccoonbear asked Mar 29 '11 17:03


1 Answer

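I tried this on two implementations of Perl 5.12 and could not reproduce the problem. Perl 5.8 does show it.

Because you have the /g option, Perl keeps trying to match the pattern until it fails. In the debug output below, the first attempt matches the three tabs, and a second attempt, starting at Entry, then fails.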

So it doesn't work in Perl 5.8, but this does:

my $c1;
$string =~ s/^(\t*)/$c1=$1;''/ge;

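With /e, the replacement is evaluated as Perl code each time the pattern matches, so the capture is copied into $c1 at that moment and the empty string is used as the replacement text.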
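
For reference, here is the same workaround as a complete script. This is only a minimal sketch built from the question's test string; the variable name $c1 is the one used above, and the printed messages are illustrative.

```perl
use strict;
use warnings;

my $string = "\t\t\tEntry";
my $c1;

# /e evaluates the replacement as Perl code on every successful match:
# copy $1 into $c1, then yield '' as the replacement text.
$string =~ s/^(\t*)/$c1 = $1; ''/ge;

printf "captured %d tab(s)\n", length $c1;
print "String: >$string<\n";
```

Because the copy happens inside the replacement, it does not matter what later match attempts do to $1.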

This is what use re 'debug' tells me:

Compiling REx `^(\t*)'
size 9 Got 76 bytes for offset annotations.
first at 2
   1: BOL(2)
   2: OPEN1(4)
   4:   STAR(7)
   5:     EXACT <\t>(0)
   7: CLOSE1(9)
   9: END(0)
anchored(BOL) minlen 0
Offsets: [9]
        1[1] 2[1] 0[0] 5[1] 3[1] 0[0] 6[1] 0[0] 7[0]
Compiling REx `^(\t*)([^\t]+)'
size 25 Got 204 bytes for offset annotations.
first at 2
   1: BOL(2)
   2: OPEN1(4)
   4:   STAR(7)
   5:     EXACTF <\t>(0)
   7: CLOSE1(9)
   9: OPEN2(11)
  11:   PLUS(23)
  12:     ANYOF[\0-\10\12-\377{unicode_all}](0)
  23: CLOSE2(25)
  25: END(0)
anchored(BOL) minlen 1
Offsets: [25]
        1[1] 2[1] 0[0] 5[1] 3[1] 0[0] 6[1] 0[0] 7[1] 0[0] 13[1] 8[5] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 14[1] 0[0] 15[0]
String: >                       Entry<
Matching REx `^(\t*)' against `                 Entry'
  Setting an EVAL scope, savestack=5
   0 <> <                       Entry>        |  1:  BOL
   0 <> <                       Entry>        |  2:  OPEN1
   0 <> <                       Entry>        |  4:  STAR
                           EXACT <\t> can match 3 times out of 2147483647...
  Setting an EVAL scope, savestack=5
   3 <                  > <Entry>        |  7:    CLOSE1
   3 <                  > <Entry>        |  9:    END
Match successful!
match pos=0
Use of uninitialized value in substitution iterator at - line 11.
Matching REx `^(\t*)' against `Entry'
  Setting an EVAL scope, savestack=5
   3 <                  > <Entry>        |  1:  BOL
                            failed...
Match failed
Freeing REx: `"^(\\t*)"'
Freeing REx: `"^(\\t*)([^\\t]+)"'
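
A trace like the one above can be produced by enabling the re pragma's debug mode around the substitution. This is a minimal sketch using the question's first substitution; the trace is written to STDERR, and its exact format differs between Perl versions.

```perl
use strict;
use warnings;

my $string = "\t\t\tEntry";

{
    # 'use re' is lexically scoped, so only regexes compiled inside
    # this block are traced.
    use re 'debug';
    $string =~ s/^(\t*)//g;
}

print "String: >$string<\n";
```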

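Because the pattern is anchored with ^ and matches only tabs, you need neither /g nor /i: without /m the anchor lets the pattern match only at the start of the string, and a tab has no upper or lower case. If stripping leading tabs is all you need, drop both flags; if you are actually trying to do something else, the pattern itself may need to change.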
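
As a sketch of that suggestion, the substitution can be written without either flag, copying $1 into a variable as soon as the match succeeds. The name $indent is hypothetical, and I have not verified this form on Perl 5.8.

```perl
use strict;
use warnings;

my $string = "\t\t\tEntry";
my $indent = '';

# No /g: the pattern is tried once, so no later failed attempt follows.
# No /i: tabs have no case.
if ($string =~ s/^(\t*)//) {
    $indent = $1;    # copy the capture right after the successful match
}

printf "leading tabs: %d\n", length $indent;
print "String: >$string<\n";
```

Copying the capture into a lexical variable straight away is a reasonable habit in any case, since $1 can be overwritten by the next successful match.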

Axeman answered Sep 29 '22 10:09