Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perl: how to use string variables as search pattern and replacement in regex

Tags:

regex

perl

I want to use string variables for both search pattern and replacement in regex. The expected output is like this,

$ perl -e '$a="abcdeabCde"; $a=~s/b(.)d/_$1$1_/g; print "$a\n"'
a_cc_ea_CC_e

But when I moved the pattern and replacement to a variable, $1 was not evaluated.

$ perl -e '$a="abcdeabCde"; $p="b(.)d"; $r="_\$1\$1_"; $a=~s/$p/$r/g; print "$a\n"'
a_$1$1_ea_$1$1_e

When I use "ee" modifier, it gives errors.

$ perl -e '$a="abcdeabCde"; $p="b(.)d"; $r="_\$1\$1_"; $a=~s/$p/$r/gee; print "$a\n"'
Scalar found where operator expected at (eval 1) line 1, near "$1$1"
    (Missing operator before $1?)
Bareword found where operator expected at (eval 1) line 1, near "$1_"
    (Missing operator before _?)
Scalar found where operator expected at (eval 2) line 1, near "$1$1"
    (Missing operator before $1?)
Bareword found where operator expected at (eval 2) line 1, near "$1_"
    (Missing operator before _?)
aeae

What do I miss here?


Edit

Both $p and $r are written by myself. What I need is to do multiple similar regex replacing without touching the perl code, so $p and $r have to be in a separate data file. I hope this file can be used with C++/python code later. Here are some examples of $p and $r.

^(.*\D)?((19|18|20)\d\d)年   $1$2<digits>年
^(.*\D)?(0\d)年  $1$2<digits>年
([TKZGD])(\d+)/(\d+)([^\d/])    $1$2<digits>$3<digits>$4
([^/TKZGD\d])(\d+)/(\d+)([^/\d])    $1$3分之$2$4
like image 312
kangshiyin Avatar asked Dec 22 '16 09:12

kangshiyin


People also ask

How do I match a string in regex in Perl?

Simple word matching In this statement, World is a regex and the // enclosing /World/ tells Perl to search a string for a match. The operator =~ associates the string with the regex match and produces a true value if the regex matched, or false if the regex did not match.

How do I find and replace a string in Perl?

Performing a regex search-and-replace is just as easy: $string =~ s/regex/replacement/g; I added a “g” after the last forward slash. The “g” stands for “global”, which tells Perl to replace all matches, and not just the first one.

How do you match a string to a pattern?

Put brackets ( [ ] ) in the pattern string, and inside the brackets put the lowest and highest characters in the range, separated by a hyphen ( – ). Any single character within the range makes a successful match.

How do I search for a string in Perl?

To search for a substring inside a string, you use index() and rindex() functions. The index() function searches for a substring inside a string from a specified position and returns the position of the first occurrence of the substring in the searched string.


1 Answers

With $p="b(.)d"; you are getting a string with literal characters b(.)d. In general, regex patterns are not preserved in quoted strings and may not have their expected meaning in a regex. However, see Note at the end.

This is what qr operator is for: $p = qr/b(.)d/; forms the string as a regular expression.

As for the replacement part and /ee, the problem is that $r is first evaluated, to yield _$1$1_, which is then evaluated as code. Alas, that is not valid Perl code. The _ are barewords and even $1$1 itself isn't valid (for example, $1 . $1 would be).

The provided examples of $r have $Ns mixed with text in various ways. One way to parse this is to extract all $N and all else into a list that maintains their order from the string. Then, that can be processed into a string that will be valid code. For example, we need

'$1_$2$3other'  -->  $1 . '_' . $2 . $3 . 'other'

which is valid Perl code that can be evaluated.

The part of breaking this up is helped by split's capturing in the separator pattern.

sub repl {
    my ($r) = @_;

    my @terms = grep { $_ } split /(\$\d)/, $r;

    return join '.', map { /^\$/ ? $_ : q(') . $_ . q(') } @terms;
}
    
$var =~ s/$p/repl($r)/gee;

With capturing /(...)/ in split's pattern, the separators are returned as a part of the list. Thus this extracts from $r an array of terms which are either $N or other, in their original order and with everything (other than trailing whitespace) kept. This includes possible (leading) empty strings so those need be filtered out.

Then every term other than $Ns is wrapped in '', so when they are all joined by . we get a valid Perl expression, as in the example above.

Then /ee will have this function return the string (such as above), and evaluate it as valid code.

We are told that safety of using /ee on external input is not a concern here. Still, this is something to keep in mind. See this post, provided by Håkon Hægland in a comment. Along with the discussion it also directs us to String::Substitution. Its use is demonstrated in this post. Another way to approach this is with replace from Data::Munge

For more discussion of /ee see this post, with several useful answers.


Note on using "b(.)d" for a regex pattern

In this case, with parens and dot, their special meaning is maintained. Thanks to kangshiyin for an early mention of this, and to Håkon Hægland for asserting it. However, this is a special case. Double-quoted strings directly deny many patterns since interpolation is done -- for example, "\w" is just an escaped w (what is unrecognized). The single quotes should work, as there is no interpolation. Still, strings intended for use as regex patterns are best formed using qr, as we are getting a true regex. Then all modifiers may be used as well.

like image 142
zdim Avatar answered Oct 03 '22 12:10

zdim