I'm trying to overload constants in regular expressions. Here is my Tagger package:
package Tagger;
use overload;
sub import { overload::constant 'qr' => \&convert }
sub convert {
my $re = shift;
$re =~ s/\\nom/((?:[A-Z]{1}[a-z]+\\s*){2,3}(\\((\\w|\\s)+\\)+?)*)/xg;
return $re;
}
1;
Here is the subroutine in which I'd like to trigger the overloading:
sub ChopPattern {
my $string= shift;
my $pattern = shift;
if($string =~ m/$pattern/) {
$string =~ s/$&/ /g;
return ($string, $&);
} else {
return ($string, '');
}
}
Here is my test:
$test = "foo bar Max Fast bar foo";
($test, $name) = ChopPattern($test, '\nom');
say $test;
say $name;
If I hardwire the test pattern, \nom
, in the subroutine's match:
sub ChopPattern {
my $string= shift;
my $pattern = shift;
if($string =~ m/\nom/) {
$string =~ s/$&/ /g;
return ($string, $&);
} else {
return ($string, '');
}
}
the test yields the correct answer:
foo bar bar foo
Max Fast
But if I use $pattern
in the match as above the test yields:
foo bar Max Fast bar foo
<null line>
Is there a reason that \nom
triggers Tagger but a variable equal to \nom
doesn't?
Here are the details of the version of Perl being used:
This is perl 5, version 16, subversion 3 (v5.16.3) built for MSWin32-x64-multi-thread (with 1 registered patch, see perl -V for more detail)
Copyright 1987-2012, Larry Wall
Binary build 1604 [298023] provided by ActiveState http://www.ActiveState.com
Built Apr 14 2014 15:29:45
Is there a reason that
\nom
triggers Tagger but a variable equal to\nom
doesn't?
Because '\nom'
is a string literal, not a constant piece of a regex:
$ perl -Moverload -E'BEGIN { overload::constant qr => sub { say "@_" } } $foo =~ "bar"'
$ perl -Moverload -E'BEGIN { overload::constant qr => sub { say "@_" } } $foo =~ /bar/'
bar bar qq
What you're doing is a bad idea. The following implementation is much easier to understand and doesn't change regex semantics everywhere:
use strict;
use warnings 'all';
use 5.010;
sub chop_pattern {
my ($string, $pattern) = @_;
my %mapping = (
'\nom' => qr/((?:[A-Z][a-z]+\s*){2,3}(?:\([\w\s]+\)+?)*)/
);
if (exists $mapping{$pattern}) {
my $matched = $string =~ s/$mapping{$pattern}/ /g;
return $string, $1 if $matched;
}
return $string, '';
}
my ($string, $chopped) = chop_pattern('foo Bar Baz qux', '\nom');
say "<$string> <$chopped>";
Output:
<foo qux> <Bar Baz >
I'm guessing you went with overload because you want to handle more than one "magic" string (e.g. \nom
). I did that with a simple hash that maps strings to regexes.
Programming Perl says that overload::constant
works on constants.
Any handlers you provide for integer and float will be invoked whenever the Perl tokener encounters a constant number.
When you call m/$pattern/
, that's not a constant. It's a variable.
($test, $name) = ChopPattern($test, '\nom');
Now there the '\nom'
is a constant, but it's a string. Turn that into a qr//
and you will have a regular expression that contains a constant.
($test, my $name) = ChopPattern($test, qr'\nom');
The pattern-match in ChopPattern
can stay the same:
if($string =~ m/$pattern/) { ... }
Because there now is a constant part in a regular expression, Perl can call your convert
overload, and do your regex.
Let's see this in action. Remember Perl is doing this overloading substitution at compile time, when it parses the source code.
Consider this example:
BEGIN {
overload::constant 'qr' => sub {
my $re = shift;
$re =~ s/\\nom/foobar/;
return $re;
};
}
sub match {
my ( $t, $p ) = @_;
$t =~ m/$p/;
}
match( 'some text', '\nom' );
It's not important what the code does. When we deparse it, we get this output:
$ perl -MO=Deparse scratch.pl
sub BEGIN {
use warnings;
use strict;
use feature 'say';
overload::constant('qr', sub {
my $re = shift();
$re =~ s/\\nom/foobar/;
return $re;
}
);
}
sub match {
use warnings;
use strict;
use feature 'say';
BEGIN {
$^H{'qr'} = 'CODE(0x147a048)';
}
my($t, $p) = @_;
$t =~ /$p/;
}
use warnings;
use strict;
use feature 'say';
BEGIN {
$^H{'qr'} = 'CODE(0x147a048)';
}
match 'some text', '\\nom'; # <-- here
We can see that the handler was installed, but in the last line in the function call, there is the '\\nom'
string.
Now if we use a quoted expression qr//
there instead of the string, things change.
BEGIN {
overload::constant 'qr' => sub {
my $re = shift;
$re =~ s/\\nom/foobar/;
return $re;
};
}
sub match {
my ( $t, $p ) = @_;
$t =~ m/$p/;
}
match( 'some text', qr/\nom/ );
Now the deparsed program suddenly contains foobar
. The regex was changed.
$ perl -MO=Deparse scratch2.pl
sub BEGIN {
use warnings;
use strict;
use feature 'say';
overload::constant('qr', sub {
my $re = shift();
$re =~ s/\\nom/foobar/;
return $re;
}
);
}
sub match {
use warnings;
use strict;
use feature 'say';
BEGIN {
$^H{'qr'} = 'CODE(0x1e81048)';
}
my($t, $p) = @_;
$t =~ /$p/;
}
use warnings;
use strict;
use feature 'say';
BEGIN {
$^H{'qr'} = 'CODE(0x1e81048)';
}
match 'some text', qr/foobar/; # <-- here
It did that before the code was even run.
If we run both programs with -MO=Concise
to see what the interpreter will run after compile time, we get further proof that this stuff only works on actual constants in the source code, and cannot work dynamically.
$ perl -MO=Concise scratch.pl
8 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 2529 scratch.pl:5950) v:%,R,*,&,{,x*,x&,x$,$,469762048 ->3
7 <1> entersub[t1] vKS/TARG,2 ->8
- <1> ex-list K ->7
3 <0> pushmark s ->4
4 <$> const(PV "some text") sM ->5 # <-- here
5 <$> const(PV "\\nom") sM ->6
- <1> ex-rv2cv sK/2 ->-
6 <$> gv(*match) s ->7
And with qr//
:
$ perl -MO=Concise scratch2.pl
8 <@> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 2529 scratch2.pl:5950) v:%,R,*,&,{,x*,x&,x$,$,469762048 ->3
7 <1> entersub[t1] vKS/TARG,2 ->8
- <1> ex-list K ->7
3 <0> pushmark s ->4
4 <$> const(PV "some text") sM ->5 # <-- here
5 </> qr(/"foobar"/) lM/RTIME ->6
- <1> ex-rv2cv sK/2 ->-
6 <$> gv(*match) s ->7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With