
How can I delay variable interpolation in a regex to the point of use?

Tags: regex, perl

For example, let's imagine that I have a set of variables and an array of regexes that interpolate those variables:

my ($var1, $var2, $var3);
my @search_regexes=(
  qr/foo $var1/,
  qr/foo bar $var2/,
  qr/foo bar baz $var3/,
);

The above code emits warnings because $var1, $var2 and $var3 are undefined at the point where the regexes in @search_regexes are compiled. However, I want to delay variable interpolation in those regexes until the point they are actually used (or later (re)compiled once the variables have values):

# Later on we assign a value to $var1 and search for the first regex in $_ ...
$var1='Hello';
if (/$search_regexes[0]/)
{
  # Do something ...
}

How would I go about restructuring the construct in the initial code sample to allow for this?

As a bonus, I would like to compile each regex after a value is assigned to the respective variable(s) appearing in that regex in the same way that the qr// operator is doing now (but too early). If you can show how to further extend the solution to allow for this, I would greatly appreciate it.

Update:

I have settled on a variant of Hunter's approach, because using it I don't take a performance hit and there are minimal changes to my existing code. Other answers also taught me quite a bit about alternative solutions to this problem and their performance implications when very many lines need to be matched. My code now resembles the following:

my ($var1, $var2, $var3);
my @search_regexes=(
  sub {qr/foo $var1/},
  sub {qr/foo bar $var2/},
  sub {qr/foo bar baz $var3/},
);

...
($var1,$var2,$var3)=qw(Hello there Mr);

my $search_regex=$search_regexes[$based_on_something]->();

while (<>)
{
  if (/$search_regex/)
  {
    # Do something ...
    # and sometimes change $search_regex to be another from the array
  }

}

This gets me what I was looking for with minimal changes to my code (i.e., just the addition of subs to the array up top) and no performance hit per regex usage.

Michael Goldshteyn asked Feb 13 '14 16:02




3 Answers

The best solution would be to defer the compilation of the regex until those variables are defined. But first a questionable solution: Regexes can include code: qr/foo (??{ $var1 })/. The block is executed during the match, and the result of the block is then used as a pattern.

How can we defer the compilation?

  1. By simply defining the regexes only after the variables have been assigned. This is less of a problem than you might think, as any program can be expressed without (re-)assigning variables. Stick to the rule that any declaration must also be an assignment (and vice versa), and this should work. This:

    my $var1;
    my $re = qr/$var1/;
    $var1 = ...;
    $bar =~ $re;
    

    becomes:

    my $var1 = ...;
    my $re = qr/$var1/;
    $bar =~ $re;
    
  2. If this isn't possible, we might want to use a closure that we evaluate before matching:

    my $var1;
    my $deferred_re = sub { qr/$var1/ };
    $var1 = ...;
    $bar =~ $deferred_re->();
    

    Of course this would recompile the regex at each invocation.

  3. We can extend the previous idea by caching the regex:

    package DeferredRegexp;
    use overload 'qr' => sub {
      my ($self) = @_;
      return $self->[0] //= $self->[1]->();
    };
    
    sub new {
       my ($class, $callback) = @_;
       return bless [undef, $callback] => $class;
    }
    

    Then:

    my $var1;
    my $deferred_re = DeferredRegexp->new(sub{ qr/$var1/ });
    $var1 = ...;
    $bar =~ $deferred_re;
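
As a rough, self-contained sketch of the caching idea in item 3 (assuming Perl 5.14 or later for the `package NAME { ... }` block syntax and the `qr` overload), the following counts how often the callback runs. The `$compilations` counter, the sample strings and the `$stale` flag are hypothetical names added only for illustration:

```perl
use strict;
use warnings;

package DeferredRegexp {
    # The 'qr' overload is consulted whenever the object is used as a pattern.
    use overload 'qr' => sub {
        my ($self) = @_;
        # Slot 0 caches the compiled regex; slot 1 holds the callback.
        return $self->[0] //= $self->[1]->();
    };

    sub new {
        my ($class, $callback) = @_;
        return bless [undef, $callback], $class;
    }
}

my $compilations = 0;    # hypothetical counter, for illustration only
my $var1;
my $deferred_re = DeferredRegexp->new(sub { $compilations++; qr/foo $var1/ });

$var1 = 'Hello';
my @hits = grep { $_ =~ $deferred_re } ('foo Hello', 'foo World', 'say foo Hello');

# Changing $var1 now has no effect: the regex built on first use is cached.
$var1 = 'World';
my $stale = ('foo World' =~ $deferred_re) ? 1 : 0;

print "callback ran $compilations time(s), ", scalar(@hits), " hits, stale=$stale\n";
```

The trade-off is visible in the last match: once the regex has been built, later assignments to $var1 are ignored, so this variant only fits cases where the variable is set once before the first match.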
    
amon answered Oct 14 '22 08:10


I think if you wrap each regular expression in an anonymous sub, you can do this sort of deferral:

my ($var1, $var2, $var3);
my @search_regexes=(
  sub { return qr/foo $var1/         },
  sub { return qr/foo bar $var2/     },
  sub { return qr/foo bar baz $var3/ },
);

Then when you are going to evaluate them you just 'call' the anonymous sub:

($var1, $var2, $var3) = qw(thunk this code);
if( $_ =~ $search_regexes[0]->() ) {
   # Do something
}

I know that in Scheme this is called thunking; I am not sure whether it has a name in Perl. You can do something similar in Ruby with Proc objects.
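
To illustrate why this works, here is a small sketch (not part of the original answer) showing that each sub closes over the variable itself rather than its value at definition time, so every call compiles a regex from whatever the variable currently holds. The `\Q...\E` quoting and the names `$first` and `$second` are additions for illustration:

```perl
use strict;
use warnings;

my ($var1, $var2);
my @search_regexes = (
    sub { return qr/foo \Q$var1\E/     },   # \Q..\E treats the value literally
    sub { return qr/foo bar \Q$var2\E/ },
);

($var1, $var2) = qw(thunk this);
my $first = $search_regexes[0]->();    # compiled with $var1 eq 'thunk'

$var1 = 'code';
my $second = $search_regexes[0]->();   # compiled again, now with 'code'

for my $line ('foo thunk', 'foo code') {
    printf "%-10s first=%d second=%d\n", $line,
        ($line =~ $first  ? 1 : 0),
        ($line =~ $second ? 1 : 0);
}
```

Because a compiled qr// object is a snapshot, calling the sub once and reusing the result avoids recompiling on every match; call it again only after the variable changes.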

Hunter McMillen answered Oct 14 '22 07:10


(??{ }) does exactly what you ask for.

our $var1;
my $re = qr/foo (??{ $var1 })/;
...
local $var1 = ...;
/$re/

But that's very awkward. The original string is what is called a template. There are numerous templating systems available that would make this cleaner.

my $pat_template = 'foo [% var1 %]';
...
Template->new->process($pat_template, { var1 => ... }, \my $pat);
/$pat/

If the template doesn't need to be stored in a file, you could use a builder sub.

my $re_gen = sub { my ($var1) = @_; qr/foo $var1/ };
...
my $re = $re_gen->(...);
/$re/

Note: Inside of (??{ }), you can run into problems using lexical variables declared outside the block. That's why I used a package variable in the first snippet.
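
For reference, a runnable sketch of the first snippet with a package variable might look like the following. The loop, the sample values and the `quotemeta` call are assumptions added for illustration; `quotemeta` makes the value match literally, whereas the original snippet treats it as a pattern:

```perl
use strict;
use warnings;

our $var1;    # package variable, as recommended in the note above

# The (??{ }) block is compiled here but only executed during a match,
# so $var1 may still be undefined at this point without any warning.
my $re = qr/foo (??{ quotemeta $var1 })/;

my @results;
for my $value (qw(Hello there)) {
    local $var1 = $value;    # value in effect for this iteration only
    push @results, ('foo Hello' =~ $re ? 1 : 0);
}

print "@results\n";
```

The same compiled $re is reused for both iterations; only the value picked up at match time differs.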

ikegami answered Oct 14 '22 07:10