
Quote - capture - question

Tags: regex, perl, quote

Could someone explain why I can use $1 twice in the same argument list and get different results?

perl -wle '"ok" =~ /(.*)/; sub { "huh?" =~ /(.*)/; print for @_ }->( "$1", $1 )'

(Found in: How to exclude submatches in Perl?)

sid_com asked May 14 '11 05:05


2 Answers

The @_ argument array doesn't behave the way you think it does. The values in @_ in a subroutine are actually aliases for the real arguments:

The array @_ is a local array, but its elements are aliases for the actual scalar parameters.
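A minimal sketch of that aliasing, independent of regexes. The sub name shout and the variable $word are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: upper-cases its first argument in place.
sub shout {
    $_[0] = uc $_[0];   # assigns through the alias, so the caller's variable changes
    return;
}

my $word = "ok";
shout($word);
print "$word\n";   # prints "OK"
```

Nothing is returned from shout; the caller's $word changes only because $_[0] is the same scalar as $word.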

When you say this:

sub s {
    "huh?" =~ /(.*)/;
    print for @_;
}

"ok" =~ /(.*)/;   
# Call it as &s(...): a bare s(...) is parsed as the s/// substitution operator.
&s("$1", $1);

The $1 in the first argument to s is evaluated immediately by the string interpolation, so the sub receives a new string containing "ok". The second argument is not copied: the second element of the sub's @_ is an alias for the actual variable $1, not for its value. Then, inside s, your regular expression changes the value of $1. At that point @_ holds an alias for the string "ok" followed by an alias for $1, and these aliases are only resolved when the print in your loop reads them, by which time $1 contains "huh?".
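Here is a sketch of the same sequence that records what the sub actually sees instead of printing it. The names record_args and @seen are mine, not from the question:

```perl
use strict;
use warnings;

my @seen;   # hypothetical: collects the values read from @_

# Hypothetical name, chosen so it cannot be confused with the s/// operator.
sub record_args {
    "huh?" =~ /(.*)/;        # overwrites $1 before @_ is read
    push @seen, $_ for @_;   # the aliases are only resolved here
    return;
}

"ok" =~ /(.*)/;
record_args("$1", $1);       # "$1" is a copy made now; bare $1 is passed as an alias

print "$_\n" for @seen;      # prints "ok" then "huh?"
```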

If you change the function to this:

sub s {
    my @a = @_;
    "huh?" =~ /(.*)/;
    print for @a;
}

or even this:

sub s {
    local $1;
    "huh?" =~ /(.*)/;
    print for @_;
}

Then you'll get the two lines of "ok" that you're expecting. The funny thing (funny peculiar, not funny ha-ha) is that those two versions of s produce your expected result for different reasons. The my @a = @_; version copies the current values of the aliases in @_ before the regular expression gets its hands on $1. The local $1; version localizes the $1 variable to the sub, leaving the alias in @_ referencing the version of $1 from outside the sub:

A local modifies the listed variables to be local to the enclosing block, file, or eval.

Oddities like this are why you should always copy the values of the numbered regex capture variables to variables of your own as soon as possible, and why you want to unpack @_ right at the beginning of your functions (unless you know why you don't want to do that).
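A rough sketch of both habits together. The sub first_two_words and its input string are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first two words of a string.
sub first_two_words {
    my ($text) = @_;                        # unpack @_ into a lexical copy right away

    if ($text =~ /(\w+)\s+(\w+)/) {
        my ($first, $second) = ($1, $2);    # copy the captures before any other regex runs
        "huh?" =~ /(.*)/;                   # a later match clobbers $1 but not the copies
        return ($first, $second);
    }
    return;
}

my @words = first_two_words("hello world again");
print "@words\n";   # prints "hello world"
```

Once the values live in $first and $second, nothing that happens to $1 or $2 afterwards can change them.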

Hopefully I haven't butchered the terminology too much. This is one of those weird corners of Perl that I've always stayed away from because I don't like juggling razor blades.

mu is too short answered Sep 18 '22 16:09


The sample code makes use of two facts:

  • The elements of the @_ array are aliases for the actual scalar parameters. In particular, if an element such as $_[0] is updated, the corresponding argument is updated (and vice versa).
  • $1 is a global variable (albeit dynamically scoped to the current BLOCK) that automatically contains the text captured by the first set of parentheses in the last successful pattern match.

The first argument to the subroutine is an ordinary string ("ok"). The second argument is the global variable $1 itself, and its value is changed by the successful pattern match inside the subroutine before the arguments are printed.
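To illustrate the "dynamically scoped to the current BLOCK" part, here is a small sketch (the @seen array is only there to record values): the match inside the sub changes what $1 yields while the sub is running, but the caller's value is back once the sub returns.

```perl
use strict;
use warnings;

my @seen;   # hypothetical: records the observed values of $1

"ok" =~ /(.*)/;

# Inside the sub, the aliased $1 reflects the sub's own match.
sub { "huh?" =~ /(.*)/; push @seen, $_[0]; }->($1);

# Back in the caller, $1 again refers to the caller's last successful match.
push @seen, $1;

print "$_\n" for @seen;   # prints "huh?" then "ok"
```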

Eugene Yarmash answered Sep 21 '22 16:09