Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why isn't my overload::constant sub triggered when I use a string variable?

Tags:

perl

I'm trying to overload constants in regular expressions. Here is my Tagger package:

package Tagger;
use overload;

sub import { overload::constant 'qr' => \&convert }

sub convert {
    my $re = shift;
    $re =~ s/\\nom/((?:[A-Z]{1}[a-z]+\\s*){2,3}(\\((\\w|\\s)+\\)+?)*)/xg;
    return $re;
}

1;

Here is the subroutine in which I'd like to trigger the overloading:

sub ChopPattern {
    my $string= shift;
    my $pattern = shift;

    if($string =~ m/$pattern/) {
        $string =~ s/$&/ /g;
        return ($string, $&);
    } else {
        return ($string, '');
    }
}

Here is my test:

$test = "foo bar Max Fast bar foo";
($test, $name) = ChopPattern($test, '\nom');
say $test;
say $name;

If I hardwire the test pattern, \nom, in the subroutine's match:

sub ChopPattern {
    my $string= shift;
    my $pattern = shift;

    if($string =~ m/\nom/) {
        $string =~ s/$&/ /g;
        return ($string, $&);
    } else {
        return ($string, '');
    }
}

the test yields the correct answer:

foo bar  bar foo
Max Fast

But if I use $pattern in the match as above the test yields:

foo bar Max Fast bar foo
<null line>

Is there a reason that \nom triggers Tagger but a variable equal to \nom doesn't?

Here are the details of the version of Perl being used:

This is perl 5, version 16, subversion 3 (v5.16.3) built for MSWin32-x64-multi-thread (with 1 registered patch, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Binary build 1604 [298023] provided by ActiveState http://www.ActiveState.com
Built Apr 14 2014 15:29:45
like image 250
user90346 Avatar asked Mar 13 '17 14:03

user90346


2 Answers

Is there a reason that \nom triggers Tagger but a variable equal to \nom doesn't?

Because '\nom' is a string literal, not a constant piece of a regex:

$ perl -Moverload -E'BEGIN { overload::constant qr => sub { say "@_" } } $foo =~ "bar"'
$ perl -Moverload -E'BEGIN { overload::constant qr => sub { say "@_" } } $foo =~ /bar/'
bar bar qq

What you're doing is a bad idea. The following implementation is much easier to understand and doesn't change regex semantics everywhere:

use strict;
use warnings 'all';
use 5.010;

sub chop_pattern {
    my ($string, $pattern) = @_;

    my %mapping = (
        '\nom' => qr/((?:[A-Z][a-z]+\s*){2,3}(?:\([\w\s]+\)+?)*)/
    );

    if (exists $mapping{$pattern}) {
        my $matched = $string =~ s/$mapping{$pattern}/ /g;
        return $string, $1 if $matched;
    }

    return $string, '';
}

my ($string, $chopped) = chop_pattern('foo Bar Baz qux', '\nom');
say "<$string> <$chopped>";

Output:

<foo  qux> <Bar Baz >

I'm guessing you went with overload because you want to handle more than one "magic" string (e.g. \nom). I did that with a simple hash that maps strings to regexes.

like image 90
ThisSuitIsBlackNot Avatar answered Oct 17 '22 11:10

ThisSuitIsBlackNot


Programming Perl says that overload::constant works on constants.

Any handlers you provide for integer and float will be invoked whenever the Perl tokener encounters a constant number.

When you call m/$pattern/, that's not a constant. It's a variable.

($test, $name) = ChopPattern($test, '\nom');

Now there the '\nom' is a constant, but it's a string. Turn that into a qr// and you will have a regular expression that contains a constant.

($test, my $name) = ChopPattern($test, qr'\nom');

The pattern-match in ChopPattern can stay the same:

if($string =~ m/$pattern/) { ... } 

Because there now is a constant part in a regular expression, Perl can call your convert overload, and do your regex.


Let's see this in action. Remember Perl is doing this overloading substitution at compile time, when it parses the source code.

Consider this example:

BEGIN {
    overload::constant 'qr' => sub {
        my $re = shift;
        $re =~ s/\\nom/foobar/;
        return $re;
    };
}

sub match {
    my ( $t, $p ) = @_;
    $t =~ m/$p/;
}
match( 'some text', '\nom' );

It's not important what the code does. When we deparse it, we get this output:

$ perl -MO=Deparse scratch.pl 
sub BEGIN {
    use warnings;
    use strict;
    use feature 'say';
    overload::constant('qr', sub {
        my $re = shift();
        $re =~ s/\\nom/foobar/;
        return $re;
    }
    );
}
sub match {
    use warnings;
    use strict;
    use feature 'say';
    BEGIN {
        $^H{'qr'} = 'CODE(0x147a048)';
    }
    my($t, $p) = @_;
    $t =~ /$p/;
}
use warnings;
use strict;
use feature 'say';
BEGIN {
    $^H{'qr'} = 'CODE(0x147a048)';
}
match 'some text', '\\nom';                       # <-- here

We can see that the handler was installed, but in the last line in the function call, there is the '\\nom' string.

Now if we use a quoted expression qr// there instead of the string, things change.

BEGIN {
    overload::constant 'qr' => sub {
        my $re = shift;
        $re =~ s/\\nom/foobar/;
        return $re;
    };
}

sub match {
    my ( $t, $p ) = @_;
    $t =~ m/$p/;
}
match( 'some text', qr/\nom/ );

Now the deparsed program suddenly contains foobar. The regex was changed.

$ perl -MO=Deparse scratch2.pl 
sub BEGIN {
    use warnings;
    use strict;
    use feature 'say';
    overload::constant('qr', sub {
        my $re = shift();
        $re =~ s/\\nom/foobar/;
        return $re;
    }
    );
}
sub match {
    use warnings;
    use strict;
    use feature 'say';
    BEGIN {
        $^H{'qr'} = 'CODE(0x1e81048)';
    }
    my($t, $p) = @_;
    $t =~ /$p/;
}
use warnings;
use strict;
use feature 'say';
BEGIN {
    $^H{'qr'} = 'CODE(0x1e81048)';
}
match 'some text', qr/foobar/;                       # <-- here

It did that before the code was even run.

If we run both programs with -MO=Concise to see what the interpreter will run after compile time, we get further proof that this stuff only works on actual constants in the source code, and cannot work dynamically.

$ perl -MO=Concise scratch.pl 
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 2529 scratch.pl:5950) v:%,R,*,&,{,x*,x&,x$,$,469762048 ->3
7     <1> entersub[t1] vKS/TARG,2 ->8
-        <1> ex-list K ->7
3           <0> pushmark s ->4
4           <$> const(PV "some text") sM ->5       # <-- here
5           <$> const(PV "\\nom") sM ->6
-           <1> ex-rv2cv sK/2 ->-
6              <$> gv(*match) s ->7

And with qr//:

 $ perl -MO=Concise scratch2.pl 
8  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 2529 scratch2.pl:5950) v:%,R,*,&,{,x*,x&,x$,$,469762048 ->3
7     <1> entersub[t1] vKS/TARG,2 ->8
-        <1> ex-list K ->7
3           <0> pushmark s ->4
4           <$> const(PV "some text") sM ->5       # <-- here
5           </> qr(/"foobar"/) lM/RTIME ->6
-           <1> ex-rv2cv sK/2 ->-
6              <$> gv(*match) s ->7
like image 4
simbabque Avatar answered Oct 17 '22 12:10

simbabque