Is there a built-in way to escape a string that will be used within/as a regular expression? E.g.
www.abc.com
The escaped version would be:
www\.abc\.com
I was going to use:
$string =~ s/[.*+?|()\[\]{}\\]/\\$&/g; # Escapes special regex chars
But I just wanted to make sure that there's not a cleaner built-in operation that I'm missing?
Using character sets For example, the regular expression "[ A-Za-z] " specifies to match any single uppercase or lowercase letter. In the character set, a hyphen indicates a range of characters, for example [A-Z] will match any one capital letter.
Use quotemeta
or \Q...\E
.
Consider the following test program that matches against $str
as-is, with quotemeta
, and with \Q...\E
:
#! /usr/bin/perl
use warnings;
use strict;
my $str = "www.abc.com";
my @test = (
"www.abc.com",
"www/abc!com",
);
sub ismatch($) { $_[0] ? "MATCH" : "NO MATCH" }
my @match = (
[ as_is => sub { ismatch /$str/ } ],
[ qmeta => sub { my $qm = quotemeta $str; ismatch /$qm/ } ],
[ qe => sub { ismatch /\Q$str\E/ } ],
);
for (@test) {
print "\$_ = '$_':\n";
foreach my $method (@match) {
my($name,$match) = @$method;
print " - $name: ", $match->(), "\n";
}
}
Notice in the output that using the string as-is could produce spurious matches:
$ ./try $_ = 'www.abc.com': - as_is: MATCH - qmeta: MATCH - qe: MATCH $_ = 'www/abc!com': - as_is: MATCH - qmeta: NO MATCH - qe: NO MATCH
For programs that accept untrustworthy inputs, be extremely careful about using such potentially nasty bits as regular expressions: doing so could create unexpected runtime errors, denial-of-service vulnerabilities, and security holes.
The best way to do this is to use \Q
to begin a quoted string and \E
to end it.
my $foo = 'www.abc.com';
$bar =~ /blah\Q$foo\Eblah/;
You can also use quotemeta
on the variable first. E.g.
my $quoted_foo = quotemeta($foo);
The \Q
trick is documented in perlre under "Escape Sequences."
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With