In this regex
$line = 'this is a regular expression';
$line =~ s/^(\w+)\b(.*)\b(\w+)$/$3 $2 $1/;
print $line;
Why is $2 equal to " is a regular "? My thought process is that (.*) should be greedy and match all characters until the end of the line, and therefore $3 would be empty.
That's not happening, though. The regex matcher is somehow stopping right before the last word boundary and populating $3 with what's after the last word boundary and the rest of the string is sent to $2.
Any explanation? Thanks.
$3 can't be empty when using this regex because the corresponding capturing group is (\w+), which must match at least one word character or the whole match will fail.
So what happens is (.*) first matches " is a regular expression" (everything after "this"), \b matches at the end of the string, and (\w+) fails to match because no characters are left. The regex engine then backtracks, making (.*) give up one character at a time, until (.*) matches " is a regular " (note the match includes the surrounding spaces), \b matches the word boundary before e, and (\w+) matches "expression".
If you change the last (\w+) to (\w*) then you will end up with the result you expected, where (.*) consumes the rest of the string and $3 is empty.
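To see both behaviours side by side, here is a minimal sketch that runs the two patterns as plain matches (instead of substitutions) and prints what the second and third groups capture. The array names @plus and @star are only illustrative and are not part of the original code.

```perl
use strict;
use warnings;

my $line = 'this is a regular expression';

# Original pattern: the last group needs at least one word character,
# so (.*) has to backtrack and leave "expression" for it.
my @plus = $line =~ /^(\w+)\b(.*)\b(\w+)$/;

# Variant from the answer: the last group may match the empty string,
# so (.*) can keep everything after the first word.
my @star = $line =~ /^(\w+)\b(.*)\b(\w*)$/;

print "(\\w+): \$2 = '$plus[1]', \$3 = '$plus[2]'\n";
print "(\\w*): \$2 = '$star[1]', \$3 = '$star[2]'\n";
```

With the first pattern $2 should come out as ' is a regular ' and $3 as 'expression'; with the second, $2 keeps ' is a regular expression' and $3 is the empty string.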