
Removing `^` from `s/^/1/;` causes my code to fail. Why?

Tags: regex, perl

I've been working on this problem over at the Code Golf exchange, which is why my code looks so odd.

Here's a program with use strict and use warnings that recreates the problem:

use strict;
use warnings;

$_ = "";

for my $i (1..33){
    s//1/;   # Just prepends 1 to the string $_
}
print "$_\n";

for my $i (34..127) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
    s/^/1/;   # Prepends 1 to the start of the string.
}

Here is the output:

111111111111111111111111111111111
#$%&04689@ABDOPQRabdegopq

This works as I would expect. However, when I take ^ out of the second substitution, it no longer seems to match, and the string stops growing.

use strict;
use warnings;

$_ = "";

for my $i (1..33){
    s//1/;
}
print "$_\n";

for my $i (34..127) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
    s//1/;   # No Longer matches!
}

Why does this happen? s//1/ works in the first loop, so why does changing it in the second one break everything?

For an additional point of confusion, if you put the if block in braces, the regex matches again:

for my $i (34..127) {
    {
        if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
            print chr y/1/1/;
        }
    }
    s//1/;   # This prepends 1 to the string $_ again.
}

edit:

I wanted to edit my original code back into the question for reference:

use strict;
use warnings;
$_="";
until( y/1/1/ > 32){
    print "test1";
    s//1/;
    print "test";
}
print "$_\n";
until( y/1/1/ > 125+1 ) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }

    s/^/1/; # this is the line we remove ^ from
}

When we remove ^ from the line, the output changes from:

test1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1testtest1test111111111111111111111111111111111
#$%&04689@ABDOPQRabdegopq

to

hanging with no output

So in this case, it seems that changing a line in the second loop changes the behavior of the first one.

hmatt1 asked Jul 29 '14 23:07


2 Answers

s//1/; does not match against the empty string. An empty pattern reuses the last successfully matched regex instead. In the first loop no other pattern has matched, so // acts as a genuinely empty pattern; in the second loop it reuses the character class from the if condition above it.

Quote:

If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. In this case, only the g and c flags on the empty pattern are honored.

See the section "The empty pattern //" in perlop.
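The rule quoted above can be checked with a minimal sketch. The variable names ($fresh, $reused, $hit) are made up for illustration, and the bare blocks are only there to keep the two cases from influencing each other:

```perl
use strict;
use warnings;

# Case 1: no successful match has happened yet in this scope,
# so the empty pattern really is empty and matches at position 0.
my $fresh = "11";
{
    $fresh =~ s//1/;          # behaves like s/^/1/ here
}

# Case 2: a character-class match succeeds first (as in the if condition),
# so the following s//1/ silently reuses that class.
my $reused = "11";
{
    my $hit = "!" =~ /[!1]/;  # succeeds; becomes the last successful pattern
    $reused =~ s//1/;         # effectively s/[!1]/1/: swaps a "1" for a "1"
}

print "fresh:  $fresh\n";     # fresh:  111
print "reused: $reused\n";    # reused: 11
```

In the second case the substitution still succeeds, but it replaces an existing 1 with another 1, so the string keeps its length.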

VladimirM answered Oct 26 '22 21:10


To expand on VladimirM's answer:

print "regex have dynamic scope\n";
$_ = 1;
{
    m/1/;
    s//2/;
    print "$_  one becomes two, s//2/ is really s/1/2/\n";
}
$_=1;
{
    m/1/;
    {
        s//2/;
    }
    print "$_  one still becomes two, s//2/ is really s/1/2/\n";
}

$_=1;
{
    {
        m/1/;
    }
    s//2/;
    print "$_  one becomes twentyone, s//2/; is really s/(?:)//2;\n";
}

__END__
regex have dynamic scope
2  one becomes two, s//2/ is really s/1/2/
2  one still becomes two, s//2/ is really s/1/2/
21  one becomes twentyone, s//2/; is really s/(?:)//2;

Since the last successful match is dynamically scoped, the empty pattern // really means "reuse the last pattern that matched successfully in the current dynamic scope", so don't rely on it :)

If you add use re 'debug'; you can see the regex engine reuse the previous pattern. Focus on the "Matching REx" lines: NOTHING(2) is a genuinely empty pattern with no previous match to reuse, and EXACT <1>(3) is the previous pattern being reused.

regex have dynamic scope
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "1" against "1"
   0 <> <1>                  |  1:EXACT <1>(3)
   1 <1> <>                  |  3:END(0)
Match successful!
2  one becomes two, s//2/ is really s/1/2/
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "1" against "1"
   0 <> <1>                  |  1:EXACT <1>(3)
   1 <1> <>                  |  3:END(0)
Match successful!
2  one still becomes two, s//2/ is really s/1/2/
Guessing start of match in sv for REx "1" against "1"
Found anchored substr "1" at offset 0...
Guessed: match at offset 0
Matching REx "" against "1"
   0 <> <1>                  |  1:NOTHING(2)
   0 <> <1>                  |  2:END(0)
Match successful!
21  one becomes twentyone, s//2/; is really s/(?:)//2;

Update: you have an infinite loop. The last successfully matched pattern always has 1 in it, so the substitution is essentially s/1/1/; which means your string doesn't grow and is always 33 chars long. The version below adds counters and a bail-out to show this:

$_="";
until( y/1/1/ > 32){
    print "test1";
    s//1/;
    print "test";
}
print "$_\n";
my $max = 126;
my $count = 0;
my $reps = 0;
until( y/1/1/ > 125+1 ) {
    if( chr(y/1/1/) !~ /[!"'()*+,-.\/12357:;<=>?CEFGHIJKLMNSTUVWXYZ[\\\]^_`cfhijklmnrstuvwxyz{|}~]/ ) {
        print chr y/1/1/;
    }
$reps =
#~     s/^/1/; # win
    s//1/; # fail
    $count++;
    last if $count > $max;
}
print "m $max c $count r $reps l @{[ length $_ ]}\n";
__END__
win #$%&04689@ABDOPQRabdegopqm 126 c 94 r 1 l 127
fail m 126 c 127 r 1 l 33

Unless you're obfuscating, append with $_ .= 1; and prepend with $_ = 1 . $_;
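As a rough sketch of the options, here are three ways to prepend that do not depend on match history. The variable names are illustrative only, and the first line stands in for the successful match that the if condition performs:

```perl
use strict;
use warnings;

my $primed = "!" =~ /[!1]/;   # a successful match, as in the if condition

my $anchored = "11";
$anchored =~ s/^/1/;          # non-empty pattern: always matches at the start

my $grouped = "11";
$grouped =~ s/(?:)/1/;        # empty group: not an empty pattern, but matches the empty string

my $concat = "11";
$concat = 1 . $concat;        # plain concatenation, no regex involved

print "$anchored $grouped $concat\n";   # 111 111 111
```

All three grow the string even though a pattern containing 1 matched just before them, because none of them uses the empty pattern //.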

optional answered Oct 26 '22 21:10