I thought I understood Perl RE to a reasonable extent, but this is puzzling me:
#!/usr/bin/perl
use strict;
use warnings;
my $test = "'some random string'";
if ($test =~ /\'?(.*?)\'?/) {
    print "Captured $1\n";
    print "Matched $&";
}
else {
    print "What?!!";
}
prints
Captured
Matched '
It seems it has matched the ending ' alone, and so captured nothing.
I would have expected it to match the entire thing, or if it's totally non-greedy, nothing at all (as everything there is an optional match).
This in-between behaviour baffles me. Can anyone explain what is happening?
The \'? at the beginning and end means match 0 or 1 apostrophes greedily. (As another poster has pointed out, to make it non-greedy, it would have to be \'??.)
The .*? in the middle means match 0 or more characters non-greedily.
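To see the difference between the greedy and non-greedy optional quantifier in isolation, here is a small sketch. The variable names $greedy and $lazy are made up for this illustration and are not part of the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

# Greedy optional: prefers to consume the apostrophe when one is available
my ($greedy) = $test =~ /('?)/;

# Non-greedy optional: prefers to match nothing, even though an apostrophe is there
my ($lazy) = $test =~ /('??)/;

print "Greedy took ", length($greedy), " character(s)\n";
print "Lazy took ", length($lazy), " character(s)\n";
```

Both patterns succeed at the very start of the string; they differ only in whether the optional apostrophe is consumed.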
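The Perl regular expression engine starts at the leftmost position in the string and accepts the first match it finds there. The leading \'? is greedy, so it picks up the first apostrophe. The (.*?) then matches non-greedily, so it takes as little as it can, which is nothing at all. The trailing \'? finds an s rather than an apostrophe at the next position, so it is matched by the empty string. Every part of the pattern has now succeeded, so the engine stops: the overall match is just the leading apostrophe (not the trailing one), and $1 is empty.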
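One way to check which apostrophe was actually matched is to look at the match offsets. This is a minimal sketch using Perl's built-in @- and @+ arrays; the variables $start, $end and $captured are introduced here only for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

my ($start, $end, $captured);
if ($test =~ /\'?(.*?)\'?/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match
    ($start, $end) = ($-[0], $+[0]);
    $captured = $1;
    print "Match spans offsets $start to $end\n";
    print "Captured length: ", length($captured), "\n";
}
```

The match runs from offset 0 to offset 1, which is the opening apostrophe, and the capture group is empty.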
I think you mean something like:
/'(.*?)'/ # matches everything in single quotes
or
/'[^']*'/ # matches everything in single quotes, and is usually faster
The single quotes don't need to be escaped, AFAIK.
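As a quick check of the two suggested patterns against the string from the question, here is a short sketch. Note that a capture group has been added to the character-class version for this demo so that both can be compared; the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $test = "'some random string'";

# Non-greedy version: both quotes are required, the group captures what lies between
my ($lazy) = $test =~ /'(.*?)'/;

# Negated character class, with parentheses added here to capture the contents
my ($negated) = $test =~ /'([^']*)'/;

print "Lazy:    $lazy\n";
print "Negated: $negated\n";
```

Because the quotes are no longer optional, the engine cannot settle for a match made of the leading apostrophe alone.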