What characters must I escape in a Perl pre-compiled regex?

I'm having a hard time determining which characters must be escaped when using Perl's qr{} construct.

I'm attempting to create a multi-line precompiled regex for text that contains a myriad of characters that normally need escaping (#*.>:[]) and that also embeds another precompiled regex. Additionally, I need to match as strictly as possible for testing purposes.

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

my $sc = qr{(>|\s)}smx;
my $re = qr{# using defaults found in .config
*
*
Options:
$sc 1. opt1
$sc 2. opt2
choice[1-2?]: }mx;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}

Error:

Quantifier follows nothing in regex; marked by <-- HERE in m/# using defaults found in .config
* <-- HERE 
*
Options:
(?msx-i:(>|\s)) 1. opt1
(?msx-i:(>|\s)) 2. opt2
choice[1-2?]: / at ./so.pl line 14.

Escaping the asterisks makes the regex compile, but the match then fails (the "D'oh!" output). Escaping the other troublesome characters also results in a failed match. I could keep trying different combinations of what to escape, but there are a lot of variations here, and I'm hoping someone can provide some insight.

Erik Johansen asked Nov 14 '08 19:11




2 Answers

You have to escape the delimiter for qr//, and you have to escape any regex metacharacters that you want to use as literals. If you want those to be literal *'s, you need to escape them since the * is a regex quantifier.

Your problem here is the regex flags that you've added. The /m doesn't do anything because you don't use the ^ or $ anchors, which are the only things it affects. The /s doesn't do anything because you don't use the wildcard . metacharacter. The /x makes all of the unescaped whitespace in your regex meaningless, and it turns that first line with the # into a regex comment. With that line gone, the first * has nothing in front of it to quantify, which is the "Quantifier follows nothing" error you're seeing.
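
As a small illustration of the /x behavior described above, here is a minimal sketch. The patterns and test strings are made up for the example and are not taken from the question:

```perl
use strict;
use warnings;

# With /x: unescaped whitespace is ignored, and "#" starts a comment
# that runs to the end of the line.
my $with_x = qr{a b   # this text is a comment, not part of the pattern
                c}x;

# Without /x: the space is a literal character that must be matched.
my $without_x = qr{a b};

for my $text ('abc', 'a b') {
    printf "%-5s with /x: %-8s without /x: %s\n",
        "'$text'",
        ($text =~ $with_x    ? 'match' : 'no match'),
        ($text =~ $without_x ? 'match' : 'no match');
}
```

The /x pattern is effectively just abc, so it matches 'abc' but not 'a b'; the pattern without /x behaves the other way around.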

This is what you want, with regex flags removed and the proper things escaped:

my $sc = qr{(>|\s)};

my $re = qr{# using defaults found in \.config
\*
\*
Options:
$sc 1\. opt1
$sc 2\. opt2
choice\[1-2\?]: };

Although Damian Conway tells people in Perl Best Practices to always put these options on their regexes, you now see why he's wrong. You should only add them when you want what they do, and you should only add things when you know what they do. :)

Here's what you might do if you want to use /x. You have to escape any literal whitespace, you need to denote the line endings somehow, and you have to escape the literal # character. What was readable before is now a mess:

my $sc  = qr{(>|\s)};
my $eol = qr{[\r\n]+};

my $re  = qr{\# \s+ using \s+ defaults \s+ found \s+ in \s+ \.config $eol
\*                    $eol
\*                    $eol
Options:              $eol
$sc \s+ 1\. \s+ opt1   $eol
$sc \s+ 2\. \s+ opt2   $eol
choice\[1-2\?]: \s+
}x;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}

brian d foy answered Nov 08 '22 11:11


Sounds like what you really want is Expect, but the thing you are most immediately looking for is the quotemeta operator, which backslash-escapes every non-word character in a string so that nothing in it is treated as a regex metacharacter.
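
A minimal sketch of how quotemeta might be applied to the text from the question. The variable names $head, $opt1, $opt2 and $tail are hypothetical, and the \A and \z anchors are an assumption about what "as strictly as possible" means:

```perl
use strict;
use warnings;

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

my $sc = qr{(>|\s)};

# quotemeta escapes every literal piece, including the newlines,
# so none of #, *, ., [, ] or ? needs to be escaped by hand.
my $head = quotemeta("# using defaults found in .config\n*\n*\nOptions:\n");
my $opt1 = quotemeta(" 1. opt1\n");
my $opt2 = quotemeta(" 2. opt2\n");
my $tail = quotemeta("choice[1-2?]: ");

# Only $sc is a real sub-pattern; everything else matches literally.
my $re = qr{\A$head$sc$opt1$sc$opt2$tail\z};

print $output =~ $re ? "OK!\n" : "D'oh!\n";
```

The same effect is available inline with \Q...\E inside the pattern, which applies quotemeta to whatever is interpolated between the two markers.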

To answer your question directly, though: in addition to the closing delimiter (in this case }), you need to escape, at a minimum, .[$()|*+?{\^ and, if you use /x, whitespace and # as well.

geocar answered Nov 08 '22 10:11