
Perl regex substitution using external parameters

Consider the following example:

my $text = "some_strange_thing";
$text =~ s/some_(\w+)_thing/no_$1_stuff/;
print "Result: $text\n";  

It prints

"Result: no_strange_stuff"

So far so good.

Now I need to get both the match pattern and the replacement from external sources (user input, a config file, etc.). A naive solution looks like this:

my $match = "some_(\\w+)_thing";
my $repl = "no_\$1_stuff";

my $text = "some_strange_thing";
$text =~ s/$match/$repl/;
print "Result: $text\n";  

However, this prints:

"Result: no_$1_stuff"

What's wrong? How can I get the same outcome with externally supplied patterns?

asked Jul 07 '15 12:07 by Noname


2 Answers

Solution 1: String::Substitution

Use the String::Substitution package:

use String::Substitution qw(gsub_modify);

my $find = 'some_(\w+)_thing';
my $repl = 'no_$1_stuff';
my $text = "some_strange_thing";
gsub_modify($text, $find, $repl);
print $text,"\n";

The replacement string only interpolates (term used loosely) numbered match vars (like $1 or ${12}). See "interpolate_match_vars" for more information.
This module does not save or interpolate $& to avoid the "considerable performance penalty" (see perlvar).
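The idea of expanding only numbered match variables can also be sketched in core Perl without any eval. This is not String::Substitution's actual implementation, just an illustration of the principle; expand_captures and substitute_first are hypothetical helper names, and the sketch replaces only the first match.

```perl
use strict;
use warnings;

# Hypothetical helper: expand $N and ${N} in a template from a list of
# captured groups. Nothing in the template is evaluated as Perl code.
sub expand_captures {
    my ($template, @captures) = @_;
    my $lookup = sub {
        my $n = shift;
        # $0 and unknown or unmatched groups expand to the empty string.
        return ($n >= 1 && defined $captures[$n - 1]) ? $captures[$n - 1] : '';
    };
    $template =~ s/\$(?:(\d+)|\{(\d+)\})/$lookup->(defined $1 ? $1 : $2)/ge;
    return $template;
}

# Hypothetical helper: replace the first match of $find in $text.
sub substitute_first {
    my ($text, $find, $repl) = @_;
    if ($text =~ /$find/) {
        # @- and @+ hold the start and end offsets of the match and its groups.
        my ($start, $end) = ($-[0], $+[0]);
        my @captures = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;
        substr($text, $start, $end - $start, expand_captures($repl, @captures));
    }
    return $text;
}

print substitute_first('some_strange_thing', 'some_(\w+)_thing', 'no_$1_stuff'), "\n";
# prints "no_strange_stuff"
```

Because the replacement is treated purely as data, anything other than $N or ${N} in it is copied through literally.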

Solution 2: Data::Munge

This is a solution mentioned by Grinnz in the comments below.

Data::Munge can be used as follows:

use Data::Munge;

my $find = qr/some_(\w+)_thing/;
my $repl = 'no_$1_stuff';
my $text = 'some_strange_thing';
my $flags = 'g';
print replace($text, $find, $repl, $flags);
# => no_strange_stuff

Solution 3: A quick'n'dirty way (if replacement won't contain double quotes and security is not considered)

DISCLAIMER: I provide this solution as this approach can be found online, but its caveats are not explained. Do not use it in production.

With this approach, the replacement string cannot contain a double quotation mark ("). It is also equivalent to handing whoever writes the configuration file direct code access, so it should not be exposed to Web users (as mentioned by Daniel Martin).

You can use the following code:

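#!/usr/bin/perl
use strict;
use warnings;

my $match = qr"some_(\w+)_thing";   # compile the pattern once
my $repl  = '"no_$1_stuff"';        # Perl source for a double-quoted string
my $text  = "some_strange_thing";
$text =~ s/$match/$repl/ee;         # /ee: take the value of $repl, then eval it as code
print "Result: $text\n";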

Result:

Result: no_strange_stuff

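You have to:

  1. Wrap the replacement in '"..."' so that $1 can be expanded later, when the string is evaluated as code.
  2. Use /ee to force a double evaluation of the replacement: the first evaluation yields the contents of $repl, and the second evaluates those contents as Perl code.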

A modifier available specifically to search and replace is the s///e evaluation modifier. s///e treats the replacement text as Perl code, rather than a double-quoted string. The value that the code returns is substituted for the matched substring. s///e is useful if you need to do a bit of computation in the process of replacing text.
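To make the difference between no modifier, /e and /ee concrete, here is a small sketch that runs the same substitution three ways. The variable names $plain, $once and $twice are only for illustration.

```perl
use strict;
use warnings;

my $repl = '"no_$1_stuff"';   # Perl source for a double-quoted string

# No /e: $repl is interpolated once; the "$1" inside its value stays plain text.
(my $plain = 'some_strange_thing') =~ s/some_(\w+)_thing/$repl/;

# One /e: the replacement is the expression $repl, whose value is that same string.
(my $once = 'some_strange_thing') =~ s/some_(\w+)_thing/$repl/e;

# Two /e: the value of $repl is evaluated again as Perl code, so $1 is expanded.
(my $twice = 'some_strange_thing') =~ s/some_(\w+)_thing/$repl/ee;

print "$plain\n$once\n$twice\n";
# prints:
# "no_$1_stuff"
# "no_$1_stuff"
# no_strange_stuff
```

Only the second evaluation sees the text of $repl as code, which is why a single /e is not enough here.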

You can use qr to precompile the pattern for the regex (qr"some_(\w+)_thing").

answered Oct 29 '22 00:10 by Wiktor Stribiżew


Essentially the same approach as the accepted solution, but I kept the initial lines the same as the problem statement, since I thought that might make it easier to fit into more situations:

my $match = "some_(\\w+)_thing";
my $repl = "no_\$1_stuff";

my $qrmatch = qr($match);   # compile the pattern once
my $code = $repl;

# Escape double quotes and backslashes, then wrap the result in double
# quotes so that it is valid source for a double-quoted Perl string.
$code =~ s/([^"\\]*)(["\\])/$1\\$2/g;
$code = qq["$code"];

my $text = "some_strange_thing";
$text =~ s/$qrmatch/$code/ee;
print "Result: $text\n";

Note that this still works when $repl contains a double quote character or ends in a backslash, both of which break the naive solution.

Also, assuming that you're going to run the three lines at the end (or something like it) in a loop, do make sure that you don't skip the qr line. It will make a huge performance difference if you skip the qr and just use s/$match/$code/ee.

Also, even though arbitrary code execution is not as trivial with this solution as with the accepted one, it is still possible: the replacement is evaluated as a double-quoted string, and constructs such as @{[ ... ]} run arbitrary expressions during interpolation. In general, I'd avoid solutions based on s///ee if $match or $repl come from untrusted users (e.g., don't build a web service out of this).
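The security caveat can be made concrete with a harmless example. The sketch below applies the same escaping as the code in this answer to a replacement containing @{[ ... ]}; the arithmetic expression is a stand-in, chosen for illustration, for whatever code an untrusted user might supply.

```perl
use strict;
use warnings;

my $match = 'some_(\w+)_thing';
my $repl  = 'no_@{[ 6 * 7 ]}_stuff';   # harmless stand-in for arbitrary code

my $qrmatch = qr/$match/;

# Escape double quotes and backslashes only, then wrap in double quotes.
(my $code = $repl) =~ s/(["\\])/\\$1/g;
$code = qq["$code"];

my $text = 'some_strange_thing';
$text =~ s/$qrmatch/$code/ee;   # the expression inside @{[ ... ]} is executed here
print "Result: $text\n";
# prints "Result: no_42_stuff"
```

Escaping quotes and backslashes keeps the string well-formed, but it does not stop interpolation from running expressions.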

Doing this kind of replacement securely when $match and $repl are supplied by untrusted users should be asked as a different question if your use case includes that.

answered Oct 29 '22 02:10 by Daniel Martin