Consider this script, which is based on an answer to SO 267399 about parsing Roman numerals, though the parsing of Roman numerals is incidental to this question.
#!/usr/bin/env perl
#
# Based on answer to SO 0026-7399
use warnings;
use strict;
my $qr1 = qr/(?i:M{1,3})/;
my $qr2 = qr/(?i:C[MD]|D?C{1,3})/;
my $qr3 = qr/(?i:X[CL]|L?X{1,3})/;
my $qr4 = qr/(?i:I[XV]|V?I{1,3})/;
print "1000s: $qr1\n";
print " 100s: $qr2\n";
print " 10s: $qr3\n";
print " 1s: $qr4\n";
# This $qr is too simple — it matches the empty string
#my $qr = qr/($qr1?$qr2?$qr3?$qr4?)/;
my $qr = qr/\b((?:$qr1$qr2?$qr3?$qr4?)|(?:$qr2$qr3?$qr4?)|(?:$qr3$qr4?)|(?:$qr4))\b/;
print " Full: $qr\n";
while (<>)
{
    chomp;
    print " Line: [$_]\n";
    while ($_ =~ m/$qr/g)
    {
        print "Match: [$1] found in [$_] using qr//\n";
    }
}
Given the data file below, each of the first three lines contains a Roman numeral.
mix in here
no mix in here
mmmcmlxxxix
minimum
When run with (home-built) Perl 5.22.0 on a Mac now running macOS Sierra 10.12.4, I get output like this (but the version of Perl is not critical):
1000s: (?^:(?i:M{1,3}))
100s: (?^:(?i:C[MD]|D?C{1,3}))
10s: (?^:(?i:X[CL]|L?X{1,3}))
1s: (?^:(?i:I[XV]|V?I{1,3}))
Full: (?^:\b((?:(?^:(?i:M{1,3}))(?^:(?i:C[MD]|D?C{1,3}))?(?^:(?i:X[CL]|L?X{1,3}))?(?^:(?i:I[XV]|V?I{1,3}))?)|(?:(?^:(?i:C[MD]|D?C{1,3}))(?^:(?i:X[CL]|L?X{1,3}))?(?^:(?i:I[XV]|V?I{1,3}))?)|(?:(?^:(?i:X[CL]|L?X{1,3}))(?^:(?i:I[XV]|V?I{1,3}))?)|(?:(?^:(?i:I[XV]|V?I{1,3}))))\b)
Line: [mix in here]
Match: [mix] found in [mix in here] using qr//
Line: [no mix in here]
Match: [mix] found in [no mix in here] using qr//
Line: [mmmcmlxxxix]
Match: [mmmcmlxxxix] found in [mmmcmlxxxix] using qr//
Line: [minimum]
The only part of the output that I don't understand is the caret ^ in the (?^:…) notation.
I've looked at the Perl documentation for perlre and perlref, and even the section of perlop on 'Regex quote-like operators', without seeing this exemplified or explained. (I also checked the resources suggested by SO when you ask a question about regexes. The (?^: string is carefully designed to give search engines conniptions.)
There are two parts to my question: what is the significance of the caret in (?^:…), and what caused it to be added to the qr// regexes?
Basically, it means the default flags apply, even if the qr// pattern gets interpolated into a regex that specifies different flags.
Before it was introduced, qr// would produce something like (?-ismx:, and every new flag added to Perl changed that prefix, which made keeping tests up to date a pain.
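Because the stringified form of a qr// object has changed between Perl releases, tests that compare stringified patterns are brittle. One alternative, offered here as a suggestion and not as part of the answer, is the core re module's regexp_pattern function. In list context it returns a compiled pattern's source and its modifiers separately. The patterns below come from the question, and the variable names are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use re qw(regexp_pattern);    # core 're' pragma; exports regexp_pattern() on request

# Two building blocks in the style of the question: one spells
# case-insensitivity inline with (?i:...), the other puts /i on qr// itself.
my $thousands = qr/(?i:M{1,3})/;
my $ones      = qr/I[XV]|V?I{1,3}/i;

for my $re ($thousands, $ones) {
    # In list context, regexp_pattern() returns the bare pattern source and
    # the modifiers it was compiled with, so a test can check these
    # instead of the version-dependent stringified "(?^...:...)" form.
    my ($pattern, $flags) = regexp_pattern($re);
    print "stringified: $re\n";
    print "  pattern:   $pattern\n";
    print "  flags:     '$flags'\n";
}
```

The inline (?i:...) in $thousands does not count as a compile-time modifier, so its flags string is empty. $ones reports 'i'.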
http://perldoc.perl.org/perlre.html#Extended-Patterns:
Starting in Perl 5.14, a "^" (caret or circumflex accent) immediately after the "?" is a shorthand equivalent to d-imnsx. Flags (except "d") may follow the caret to override it. But a minus sign is not legal with it.
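The following sketch exercises the three rules in that quotation. It builds the patterns from strings at run time so that the invalid one can be trapped with eval. The pattern text "roman" and the variable names are purely illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# (?^i:...) first resets every flag to its default, then turns /i back on.
my $ci = '(?^i:roman)';
# (?^:...) resets everything, so an outer /i does not reach inside it.
my $cs = '(?^:roman)';

my $ci_hit = 'ROMAN' =~ /$ci/  ? 1 : 0;   # /i re-enabled after the caret
my $cs_hit = 'ROMAN' =~ /$cs/i ? 1 : 0;   # caret cancels the outer /i

# A minus sign after the caret is not allowed, so this pattern fails to
# compile. Because it is interpolated, the error happens at run time and
# eval can catch it.
my $p = '(?^-i:roman)';
my $minus_ok = eval { my $compiled = qr/$p/; 1 } ? 1 : 0;

print "ci=$ci_hit cs=$cs_hit minus_ok=$minus_ok\n";
```

On Perl 5.14 or later this prints ci=1 cs=0 minus_ok=0.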
It means "set all flags (such as i, s) to their defaults", so
$ perl -le'my $re = "a"; for (qw( a A )) { print "$_: ", /$re/i ? "match" : "no match"; }'
a: match
A: match
$ perl -le'my $re = "(?^:a)"; for (qw( a A )) { print "$_: ", /$re/i ? "match" : "no match"; }'
a: match
A: no match
It's primarily used to represent patterns created by qr//.
$ perl -le'my $re = qr/a/; print $re; for (qw( a A )) { print "$_: ", /$re/i ? "match" : "no match"; }'
(?^:a)
a: match
A: no match
$ perl -le'my $re = qr/a/i; print $re; for (qw( a A )) { print "$_: ", /$re/i ? "match" : "no match"; }'
(?^i:a)
a: match
A: match