
How to have a variable as regex in Perl

Tags:

regex

perl

I think this question is repeated, but searching wasn't helpful for me.

my $pattern = "javascript:window.open\('([^']+)'\);";
$mech->content =~ m/($pattern)/;
print $1;

I want to have an external $pattern in the regular expression. How can I do this? The current one returns:

Use of uninitialized value $1 in print at main.pm line 20.

Shubham asked Feb 26 '23 11:02


2 Answers

$1 was empty, so the match did not succeed. In my example I use a made-up constant string that I know matches the pattern.

Declare your regular expression with qr, not as a simple string. Also, you're capturing twice: once in $pattern, for the argument inside the open call's parentheses, and once in the m operator, for the whole thing. You therefore get two results. Instead of using $1, $2, etc., I prefer to assign the results to an array.

my $pattern = qr"javascript:window.open\('([^']+)'\);";
my $content = "javascript:window.open('something');";
my @results = $content =~ m/($pattern)/;
# the expression returns a list:
# (
#     q{javascript:window.open('something');},
#     'something'
# )
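For comparison, here is a minimal sketch of the same match without the extra outer parentheses, so that only the group inside $pattern captures. The $content string is the made-up sample from above, and $arg is a name chosen here purely for illustration.

```perl
use strict;
use warnings;

# Compiled once with qr; the only capture group is the quoted argument.
my $pattern = qr/javascript:window.open\('([^']+)'\);/;
my $content = "javascript:window.open('something');";

# In list context the match returns its captures,
# or an empty list if it fails (leaving $arg undefined).
my ($arg) = $content =~ $pattern;

print defined $arg ? "$arg\n" : "no match\n";
```

With this sample string, $arg should hold just the text between the quotes.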
daxim answered Mar 06 '23 23:03


When I compile that string into a regex, like so:

my $pattern = "javascript:window.open\('([^']+)'\);";
my $regex   = qr/$pattern/;

I get just what I'd expect, the following regex:

(?-xism:javascript:window.open('([^']+)');)

Notice that it is looking for a capture group, not a literal open paren, right after 'open'. And in that capture group, the first thing it expects is a single quote. So it will match

javascript:window.open'fum';

but not

javascript:window.open('fum');
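A small script along these lines should let you check that behaviour yourself. It is only a sketch: the two sample strings are the ones shown above, and the variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Double quotes turn "\(" into a plain "(", so the backslashes are gone
# before the regex engine ever sees the pattern.
my $pattern = "javascript:window.open\('([^']+)'\);";
print "pattern as stored: $pattern\n";

my $no_parens   = "javascript:window.open'fum';";
my $with_parens = "javascript:window.open('fum');";

# Try the string pattern against both samples.
for my $text ($no_parens, $with_parens) {
    my $verdict = $text =~ /$pattern/ ? "matches" : "does not match";
    print "$text  $verdict\n";
}
```

Printing the string before matching is the quickest way to see what the regex engine will actually receive.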

One thing you have to learn is that in a double-quoted Perl string, "\(" is the same thing as "(": the backslash only tells Perl that you want a literal '(' in the string, and it is then dropped. For the escapes to survive into the regex, you need to double them.

my $pattern = "javascript:window.open\\('([^']+)'\\);";
my $regex   = qr/$pattern/;

This actually preserves the escaped \( and yields:

(?-xism:javascript:window.open\('([^']+)'\);)

Which is what I think you want.
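If the pattern has to live in a plain string, a single-quoted form is another way to keep the backslashes. The sketch below assumes q{} as the quoting operator (the pattern itself contains single quotes) and compares it with the doubled-backslash version. The sample text and variable names are illustrative only.

```perl
use strict;
use warnings;

# Double-quoted: each backslash must be doubled to survive.
my $doubled = "javascript:window.open\\('([^']+)'\\);";

# q{} quotes like '...': "\(" stays as backslash plus paren.
my $single = q{javascript:window.open\('([^']+)'\);};

print "both strings are identical\n" if $doubled eq $single;

# Compile the string and capture the argument of the open call.
my $regex = qr/$single/;
my ($arg) = "javascript:window.open('fum');" =~ $regex;
print "$arg\n" if defined $arg;
```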

As for your question, you should always test the results of a match before using it.

if ( $mech->content =~ m/($pattern)/ ) { 
     print $1;
}

That makes much more sense. And if you want to print something regardless, you have already accepted that $1 might not have a value, i.e., that nothing might have matched. In that case it's best to provide an alternative:

$mech->content =~ m/($pattern)/;
print $1 || 'UNDEF!';

However, I prefer to grab my captures in the same statement, like so:

my ( $open_arg ) = $mech->content =~ m/($pattern)/;
print $open_arg || 'UNDEF!';

The parens around $open_arg put the match into list context, so it returns the captures as a list. Here I'm only expecting one value, so that's all I'm providing for.
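Two caveats are worth keeping in mind with the || fallback: it also replaces a captured "0", and after a failed match $1 keeps whatever the last successful match left in it. A hedged sketch of the list-assignment form with the defined-or operator // (Perl 5.10 or later) sidesteps both. The @pages sample data and the @shown array are invented for the demonstration and stand in for $mech->content.

```perl
use strict;
use warnings;

my $pattern = qr/javascript:window.open\('([^']+)'\);/;

# Made-up inputs standing in for page content.
my @pages = (
    "javascript:window.open('0');",   # captures the string "0"
    "no popup here",                  # does not match at all
);

my @shown;
for my $page (@pages) {
    # A fresh lexical per iteration: undefined if the match fails.
    my ($open_arg) = $page =~ $pattern;

    # // falls back only on undef, so a captured "0" is kept.
    push @shown, $open_arg // 'UNDEF!';
}

print "$_\n" for @shown;
```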

Finally, one of the root causes of your problem is the assumption that a regex has to be stored in a string to be "portable". It doesn't: you can get Perl to pre-compile your expression with qr. That way you only have to think about what the characters mean to the regex engine, not about whether your escapes will survive until the string is compiled into an expression.

A compiled regex will interpolate itself into other regexes properly. Thus, you get a portable expression that interpolates just as well as a string and, unlike a string, keeps the escapes that string quoting could lose.

my $pattern = qr/javascript:window.open\('([^']+)'\);/;

That is all you need. You can then use it just as you did, although putting parens around the whole thing, as in m/($pattern)/, returns the whole matched expression as the first capture (and not just what's between the quotes).
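To illustrate the interpolation, here is a rough sketch that drops the compiled pattern into a larger regex and also shows the effect of the extra outer parens. The $html snippet and the variable names are hypothetical and not taken from the question.

```perl
use strict;
use warnings;

my $pattern = qr/javascript:window.open\('([^']+)'\);/;

# Hypothetical markup standing in for the fetched page.
my $html = q{<a href="javascript:window.open('popup.html');">open</a>};

# The compiled regex interpolates into a larger one with its escapes intact.
my ($url) = $html =~ /href="$pattern"/;
print "argument: $url\n" if defined $url;

# Outer parens add a capture for the whole match in front of the inner one.
my ($whole, $inner) = $html =~ /($pattern)/;
print "whole: $whole\n" if defined $whole;
print "inner: $inner\n" if defined $inner;
```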

Axeman answered Mar 07 '23 00:03