Perl: how to use string variables as search pattern and replacement in regex

Tags: regex, perl

Both $p and $r are written by me. I need to run many similar regex replacements without touching the Perl code, so $p and $r have to live in a separate data file. I hope this file can later be used from C++/Python code as well. Here are some examples of $p and $r:

^(.*\D)?((19|18|20)\d\d)年   $1$2<digits>年
^(.*\D)?(0\d)年  $1$2<digits>年
([TKZGD])(\d+)/(\d+)([^\d/])    $1$2<digits>$3<digits>$4
([^/TKZGD\d])(\d+)/(\d+)([^/\d])    $1$3分之$2$4

312

asked Dec 22 '16 09:12

kangshiyin

1 Answer

With $p="b(.)d"; you are getting a string with literal characters b(.)d. In general, regex patterns are not preserved in quoted strings and may not have their expected meaning in a regex. However, see Note at the end.

This is what the qr operator is for: $p = qr/b(.)d/; compiles the pattern as a regular expression instead of storing it as a plain string.
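As a minimal sketch of what qr gives you (the input strings and the names $text, $captured, $p_ci and $matches_upper are made up for illustration):

```perl
use strict;
use warnings;

# Compile the pattern once; the capture group keeps its regex meaning.
my $p = qr/b(.)d/;

my $text = 'abcde';    # hypothetical input
my $captured;
if ($text =~ $p) {
    $captured = $1;    # the character between 'b' and 'd'
}

# A qr object can carry its own modifiers and be interpolated
# into a larger pattern.
my $p_ci = qr/b(.)d/i;
my $matches_upper = 'ABCDE' =~ /^a${p_ci}e$/i ? 1 : 0;
```

Here $captured ends up holding c, and the case-insensitive variant matches the upper-case string as well.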

As for the replacement part and /ee, the problem is that $r is first evaluated, to yield _$1$1_, which is then evaluated as code. Alas, that is not valid Perl code. The _ are barewords and even $1$1 itself isn't valid (for example, $1 . $1 would be).

The provided examples of $r have $Ns mixed with text in various ways. One way to parse this is to extract all $N and all else into a list that maintains their order from the string. Then, that can be processed into a string that will be valid code. For example, we need

'$1_$2$3other'  -->  $1 . '_' . $2 . $3 . 'other'

which is valid Perl code that can be evaluated.

Breaking the string up is made easy by split, which returns the separators as well when its pattern contains a capturing group.

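sub repl {
    my ($r) = @_;

    # Split on $N, keeping the separators; drop empty strings but keep a literal "0"
    my @terms = grep { length } split /(\$\d)/, $r;

    # Leave $N as is; single-quote all else, escaping any \ or ' (/r needs Perl 5.14+)
    return join ' . ', map { /^\$\d$/ ? $_ : q(') . s/([\\'])/\\$1/gr . q(') } @terms;
}

$var =~ s/$p/repl($r)/gee;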

With a capturing group /(...)/ in split's pattern, the separators are returned as part of the list. This extracts from $r a list of terms, each either a $N or other text, in their original order and with everything kept (split discards only trailing empty fields). The list can include empty strings, such as a leading one when $r starts with a $N, so those need to be filtered out.
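To see what split returns here, a small sketch using the sample template from above (the names @raw and @terms are just for illustration):

```perl
use strict;
use warnings;

my $r = '$1_$2$3other';

# The capturing group makes split return the separators ($N) too.
my @raw = split /(\$\d)/, $r;
# @raw is ('', '$1', '_', '$2', '', '$3', 'other')

# Filter on length rather than truth, so that a literal "0" in the
# template would survive while empty strings are dropped.
my @terms = grep { length } @raw;
# @terms is ('$1', '_', '$2', '$3', 'other')
```

The empty strings come from the start of the string and from the spot between the adjacent $2 and $3.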

Then every term other than $Ns is wrapped in '', so when they are all joined by . we get a valid Perl expression, as in the example above.

With /ee, the first evaluation calls this function, which returns a string such as the one above, and the second evaluation runs that string as Perl code to produce the replacement text.
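Putting it together, here is a sketch that applies one of the rules from the question. It assumes a rule line holds the pattern and the replacement separated by whitespace, which is how the examples look but is not stated anywhere; the input string is made up.

```perl
use strict;
use warnings;

# Turn a template like '$1_$2' into Perl source like "$1 . '_' . $2".
sub repl {
    my ($r) = @_;
    my @terms = grep { length } split /(\$\d)/, $r;
    return join ' . ',
        map { /^\$\d$/ ? $_ : q(') . s/([\\'])/\\$1/gr . q(') } @terms;
}

# One rule line as it might appear in the data file
# (assumption: pattern, whitespace, replacement).
my $line = '([TKZGD])(\d+)/(\d+)([^\d/])    $1$2<digits>$3<digits>$4';
my ($p, $r) = split ' ', $line, 2;

# The code string that the second /e will evaluate.
my $code = repl($r);

my $var = 'K12/34 x';    # hypothetical input
$var =~ s/$p/repl($r)/gee;
```

After this, $code holds the expression $1 . $2 . '<digits>' . $3 . '<digits>' . $4 and $var has become K12<digits>34<digits> x.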

We are told that the safety of using /ee on external input is not a concern here. Still, this is something to keep in mind. See this post, provided by Håkon Hægland in a comment. Along with the discussion it also directs us to String::Substitution. Its use is demonstrated in this post. Another way to approach this is with replace from Data::Munge.
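If evaluating the data file as code ever does become a concern, the $N placeholders can be filled in by plain lookup, with no eval at all. The following is only a rough sketch of that idea in core Perl. It is not how String::Substitution or Data::Munge are implemented, and the names expand and apply_rule are made up here.

```perl
use strict;
use warnings;

# Fill $1..$9 placeholders in a template from a list of captured groups.
sub expand {
    my ($template, @groups) = @_;    # $groups[0] holds $1, and so on
    $template =~ s{\$([1-9])}{ $groups[$1 - 1] // '' }ge;
    return $template;
}

# Apply pattern $p with replacement template $r to every match in $text.
sub apply_rule {
    my ($text, $p, $r) = @_;
    my ($out, $last) = ('', 0);
    while ($text =~ /$p/g) {
        # Copy the groups of this match out of the offset arrays.
        my @groups = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;
        my ($start, $end) = ($-[0], $+[0]);
        # Unmatched text since the previous match, then the filled template.
        $out .= substr($text, $last, $start - $last) . expand($r, @groups);
        $last = $end;
    }
    return $out . substr($text, $last);
}

my $result = apply_rule('K12/34 x',
    '([TKZGD])(\d+)/(\d+)([^\d/])', '$1$2<digits>$3<digits>$4');

# Anything in the template that is not $N stays literal text.
my $inert = apply_rule('abc', 'b', 'system("x")');
```

The only thing taken from the replacement string is the group number, so text that looks like code is inserted verbatim rather than run. Groups that did not take part in the match expand to an empty string.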

For more discussion of /ee see this post, with several useful answers.

Note on using "b(.)d" for a regex pattern

In this case, with parens and a dot, the special meaning is maintained once the string is interpolated into a regex. Thanks to kangshiyin for an early mention of this, and to Håkon Hægland for asserting it. However, this is a special case. Double-quoted strings break many patterns because interpolation and escape processing happen first: for example, "\w" becomes a plain w (an unrecognized escape, which draws a warning). Single quotes should work, as there is no interpolation. Still, strings intended for use as regex patterns are best formed using qr, since we then get a true regex and can use all modifiers as well.
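A short sketch of the three ways of forming a pattern (variable names and test strings are made up for illustration):

```perl
use strict;
use warnings;

my $dq = "b(.)d";    # no backslashes here, so nothing is lost
my $sq = '\d+';      # single quotes keep the backslash
my $re = qr/\d+/;    # a compiled regex

my $dq_ok = 'abcd' =~ /$dq/ ? 1 : 0;
my $sq_ok = 'x42'  =~ /$sq/ ? 1 : 0;
my $re_ok = 'x42'  =~ $re   ? 1 : 0;

# In a double-quoted string the backslash has to be doubled to survive.
my $escaped = "\\d+";
my $same    = $escaped eq $sq ? 1 : 0;
```

All three patterns match their test string, and the doubled backslash gives the same string as the single-quoted form.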

142

answered Oct 03 '22 12:10

zdim
