
Regex substitution using an array element with a three-digit index doesn't work as I expected

Tags: regex, perl

Today I came across a Twitter post that showed me another mysterious Perl behaviour. Could someone please tell me what is wrong with the third statement in the following script? I'm looking for the relevant part of the documentation in perldoc.

#!/usr/bin/perl

$x[1]    = "foo"; $_ = "foo"; s/$x[1]/bar/;    print "$_\n";
$x[10]   = "foo"; $_ = "foo"; s/$x[10]/bar/;   print "$_\n";
$x[100]  = "foo"; $_ = "foo"; s/$x[100]/bar/;  print "$_\n";
$x[1000] = "foo"; $_ = "foo"; s/$x[1000]/bar/; print "$_\n";

__END__
bar
bar
foo
bar

It seems that the Perl interpreter separates $x from [100], as if the substitution had been written like this:

$x[100] = 'foo';
$_ = 'foo';
s/${x}[100]/bar/;
print "$_\n";

Edit

Thank you all. I found the relevant documentation in the Camel Book, and it recommends exactly what @fred-gannet suggested. The factors in the heuristic are the number of times each character occurs and the pruning strategy applied to the contents of the brackets.

https://books.google.com/books?id=xx5JBSqcQzIC&lpg=PR1&pg=PA65#v=onepage&q&f=false

Within search patterns, which also undergo double-quotish interpolation, there is an unfortunate ambiguity: is /$foo[bar]/ to be interpolated as /${foo}[bar]/ (where [bar] is a character class for the regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)? If @foo doesn't otherwise exist, it's obviously a character class. If @foo exists, Perl takes a good guess about [bar], and is almost always right.† If it does guess wrong, or if you're just plain paranoid, you can force the correct interpolation with braces as shown earlier. Even if you're merely prudent, it's probably not a bad idea.

https://rt.perl.org/Public/Bug/Display.html?id=133027#txn-1542459

The code is in S_intuit_more().

https://github.com/Perl/perl5/blob/823ba440369100de3f2693420a3887a645a57d28/toke.c#L4207-L4217

if (*s == '$')
    weight -= 3;
else if (isDIGIT(*s)) {
    if (s[1] != ']') {
        if (isDIGIT(s[1]) && s[2] == ']')
            weight -= 10;
    }
    else
        weight -= 100;
}
Zero(seen,256,char);
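
To see why only the three-digit index misbehaves, here is a rough Perl model of the scoring for a digits-only bracket body. It is a sketch under stated assumptions, not the real tokenizer: the first adjustment mirrors the excerpt above, but the starting weight of 2, the per-character loop (subtract the number of earlier occurrences of the same character, add 5 when a character directly follows its predecessor in character order), and the rule that a non-negative weight means "character class" are assumptions about the rest of S_intuit_more() that the excerpt does not show. The name subscript_weight is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model: weight for a digits-only bracket body such as "100".
sub subscript_weight {
    my ($body) = @_;
    die "model only covers digit-only bodies\n" unless $body =~ /\A[0-9]+\z/;

    my @chars  = split //, $body;
    my $weight = 2;                       # ASSUMPTION: initial weight

    # Mirrors the excerpt: one digit => -100, exactly two digits => -10.
    if    (@chars == 1) { $weight -= 100 }
    elsif (@chars == 2) { $weight -= 10 }

    # ASSUMPTION: per-character scoring not shown in the excerpt.
    my (%seen, $prev);
    for my $ch (@chars) {
        $weight += 5 if defined $prev && ord($ch) == ord($prev) + 1;
        $weight -= $seen{$ch} // 0;       # repeated characters lower the weight
        $seen{$ch}++;
        $prev = $ch;
    }
    return $weight;
}

for my $index (1, 10, 100, 1000) {
    my $w = subscript_weight($index);
    # ASSUMPTION: weight >= 0 is read as a character class.
    printf "%-4s weight %4d => %s\n", $index, $w,
        $w >= 0 ? "character class" : "subscript";
}
```

Under these assumptions, 1, 10 and 1000 end up with negative weights (subscript) while 100 ends at 1 (character class), which matches the output of the script in the question.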

There is also an explanation of the logic, surprisingly in Japanese:

https://8-p.info/perl-interpolation/

ernix asked Mar 26 '18 13:03


2 Answers

Apparently Perl is getting confused between an array index and a regular-expression character class (e.g. /[a-z]/). The behaviour is not consistent: indexes from 100 to 998 seem to be affected. Please report the bug using the perlbug script.

shawnhcorey answered Nov 17 '22 00:11


The expression evaluates consistently when the subscript is enclosed in braces:

s/${x[100]}/bar/;

The inconsistent interpretation when the index is in the range 100..998 looks like a bug.
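
As a quick check of the braces form, the script from the question can be rewritten so that every subscript is wrapped in ${...}. This is a minimal sketch: the lexical @x, the $s scalar and the @results array are illustrative names, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @x;
@x[1, 10, 100, 1000] = ("foo") x 4;   # same indexes as in the question

my @results;
my $s;

# ${x[N]} always means "element N of @x", so no guessing is involved.
$s = "foo"; $s =~ s/${x[1]}/bar/;    push @results, $s;
$s = "foo"; $s =~ s/${x[10]}/bar/;   push @results, $s;
$s = "foo"; $s =~ s/${x[100]}/bar/;  push @results, $s;
$s = "foo"; $s =~ s/${x[1000]}/bar/; push @results, $s;

print "$_\n" for @results;   # prints "bar" four times
```

Copying the element into a plain scalar first (my $pat = $x[100]; then s/$pat/bar/) should avoid the ambiguity in the same way, since there is no bracket left for Perl to interpret.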

Fred Gannett answered Nov 17 '22 02:11