For example, let's imagine that I have a set of variables and an array of regexes that interpolate those variables:
my ($var1, $var2, $var3);
my @search_regexes=(
qr/foo $var1/,
qr/foo bar $var2/,
qr/foo bar baz $var3/,
);
The above code gives warnings that $var1, $var2 and $var3 are not defined at the point of regex compilation for the regexes in @search_regexes. However, I want to delay variable interpolation in those regexes until the point they are actually used (or later (re)compiled once the variables have values):
# Later on we assign a value to $var1 and search for the first regex in $_ ...
$var1='Hello';
if (/$search_regexes[0]/)
{
# Do something ...
}
How would I go about restructuring the construct in the initial code sample to allow for this?
As a bonus, I would like to compile each regex after a value is assigned to the respective variable(s) appearing in that regex, in the same way that the qr// operator is doing now (but too early). If you can show how to further extend the solution to allow for this, I would greatly appreciate it.
Update:
I have settled on a variant of Hunter's approach, because using it I don't take a performance hit and there are minimal changes to my existing code. Other answers also taught me quite a bit about alternative solutions to this problem and their performance implications when very many lines need to be matched. My code now resembles the following:
my ($var1, $var2, $var3);
my @search_regexes=(
sub {qr/foo $var1/},
sub {qr/foo bar $var2/},
sub {qr/foo bar baz $var3/},
);
...
($var1,$var2,$var3)=qw(Hello there Mr);
my $search_regex=$search_regexes[$based_on_something]->();
while (<>)
{
if (/$search_regex/)
{
# Do something ...
# and sometimes change $search_regex to be another from the array
}
}
This gets me what I was looking for with minimal changes to my code (i.e., just the addition of subs to the array up top) and no performance hit per regex usage.
The best solution would be to defer the compilation of the regex until those variables are defined. But first a questionable solution: regexes can include code: qr/foo (??{ $var1 })/. The block is executed during the match, and the result of the block is then used as a pattern.
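As a minimal sketch of that behaviour (the test strings are made up for illustration, and a package variable is used on the assumption that lexicals inside the block may misbehave on some Perl versions), the same compiled regex picks up whatever value $var1 holds at match time:

```perl
use strict;
use warnings;

# Package variable: visible to the code block whenever the match runs.
our $var1;

# $var1 is not interpolated here; the (??{ }) block runs during each match.
my $re = qr/foo (??{ $var1 })/;

$var1 = 'Hello';
print "matched with Hello\n" if 'foo Hello' =~ $re;

# Changing the variable changes what the same regex object matches.
$var1 = 'World';
print "matched with World\n"     if 'foo World' =~ $re;
print "Hello no longer matches\n" unless 'foo Hello' =~ $re;
```

The cost is that the inner pattern is evaluated on every match, which is why this is the questionable option.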
How can we defer the compilation?
By simply defining the regexes after the variables have been assigned. This is less of a problem than you might think, as any program can be expressed without (re-)assigning variables. Stick to the rule that any declaration must also be an assignment (and vice versa), and this should work. This:
my $var1;
my $re = qr/$var1/;
$var1 = ...;
$bar =~ $re;
becomes:
my $var1 = ...;
my $re = qr/$var1/;
$bar =~ $re;
If this isn't possible, we might want to use a closure that we evaluate before matching:
my $var1;
my $deferred_re = sub { qr/$var1/ };
$var1 = ...;
$bar =~ $deferred_re->();
Of course this would recompile the regex at each invocation.
We can extend the previous idea by caching the regex:
package DeferredRegexp;
use overload 'qr' => sub {
my ($self) = @_;
return $self->[0] //= $self->[1]->();
};
sub new {
my ($class, $callback) = @_;
return bless [undef, $callback] => $class;
}
Then:
my $var1;
my $deferred_re = DeferredRegexp->new(sub{ qr/$var1/ });
$var1 = ...;
$bar =~ $deferred_re;
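To see the caching at work, here is a hedged, self-contained sketch. The counter $compile_count and the sample strings are additions for illustration only, and the package block syntax assumes Perl 5.14 or later (the 'qr' overload itself needs 5.12):

```perl
use strict;
use warnings;

package DeferredRegexp {
    use overload 'qr' => sub {
        my ($self) = @_;
        # Build the regex on first use, then reuse the cached copy.
        return $self->[0] //= $self->[1]->();
    };
    sub new {
        my ($class, $callback) = @_;
        return bless [undef, $callback] => $class;
    }
}

my $compile_count = 0;   # illustrative counter, not part of the class
my $var1;
my $deferred_re = DeferredRegexp->new(sub { $compile_count++; qr/foo $var1/ });

$var1 = 'Hello';
my @hits = grep { $_ =~ $deferred_re } ('foo Hello', 'foo Bye', 'say foo Hello');
print scalar(@hits), " hits, callback ran $compile_count time(s)\n";

# The cached regex is not rebuilt when the variable changes afterwards.
$var1 = 'Bye';
print "still matching the first pattern\n" if 'foo Hello' =~ $deferred_re;
```

One consequence of this design: the value of $var1 is frozen at the first match, so a regex that must follow later reassignments needs a fresh DeferredRegexp object.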
I think if you wrap each regular expression in an anonymous sub, you can do this sort of deferral:
my ($var1, $var2, $var3);
my @search_regexes=(
sub { return qr/foo $var1/ },
sub { return qr/foo bar $var2/ },
sub { return qr/foo bar baz $var3/ },
);
Then when you are going to evaluate them you just 'call' the anonymous sub:
($var1, $var2, $var3) = qw(thunk this code);
if( $_ =~ $search_regexes[0]->() ) {
# Do something
}
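Put together as a runnable sketch (the @lines data and the @compiled array are hypothetical names added here), each sub can be called once after the variables are set, so the loop itself only uses already compiled regexes:

```perl
use strict;
use warnings;

my ($var1, $var2, $var3);
my @search_regexes = (
    sub { return qr/foo $var1/ },
    sub { return qr/foo bar $var2/ },
    sub { return qr/foo bar baz $var3/ },
);

($var1, $var2, $var3) = qw(thunk this code);

# Call each sub once, now that the variables have values.
my @compiled = map { $_->() } @search_regexes;

# Made-up input lines, for illustration only.
my @lines = ('foo thunk', 'foo bar this', 'foo bar baz code', 'nothing here');

my @matched;
for my $line (@lines) {
    for my $i (0 .. $#compiled) {
        # Record which regex (by index) matched which line.
        push @matched, "$i:$line" if $line =~ $compiled[$i];
    }
}
print "$_\n" for @matched;
```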
I know that in Scheme this is called thunking; I am not sure if it has a name in Perl. You can do something similar in Ruby with Proc objects.
(??{ }) does exactly what you ask for.
our $var1;
my $re = qr/foo (??{ $var1 })/;
...
local $var1 = ...;
/$re/
But that's very awkward. The original string is what is called a template. There are numerous templating systems available that would make this cleaner.
my $pat_template = 'foo [% var1 %]';
...
Template->new->process($pat_template, { var1 => ... }, \my $pat);
/$pat/
If the template doesn't need to be stored in a file, you could use a builder sub.
my $re_gen = sub { my ($var1) = @_; qr/foo $var1/ };
...
my $re = $re_gen->(...);
/$re/
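A slightly fuller sketch of the builder sub follows. The \Q...\E quoting is an addition, made on the assumption that the argument should be matched as literal text; drop it if the argument is itself meant to be a pattern. The sample values are made up:

```perl
use strict;
use warnings;

# The value arrives as an argument, so nothing is interpolated too early.
# \Q...\E escapes regex metacharacters in the value (assumed to be literal text).
my $re_gen = sub { my ($var1) = @_; qr/foo \Q$var1\E/ };

my $re_hello = $re_gen->('Hello');
my $re_dots  = $re_gen->('a.b');

print "matches Hello\n" if 'foo Hello' =~ $re_hello;

# With quoting, the dot only matches a literal dot.
print "dot is literal\n" if 'foo a.b' =~ $re_dots and 'foo axb' !~ $re_dots;
```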
Note: Inside of (??{ }), you can run into problems using lexical variables declared on the outside. That's why I used a package variable in the first snippet.