
Why is this regex not greedy?

Tags: regex, perl

Consider this substitution:

$line = 'this is a regular expression';
$line =~  s/^(\w+)\b(.*)\b(\w+)$/$3 $2 $1/;

print $line;

Why is $2 equal to " is a regular "? My thought process is that (.*) should be greedy and match all characters until the end of the line and therefore $3 would be empty.

That's not happening, though. The regex matcher somehow stops (.*) right before the last word boundary, so $3 gets the final word and $2 gets everything between the first word and that boundary.

Any explanation? Thanks.

Josh Klein asked Oct 14 '12 02:10



1 Answer

$3 can't be empty when using this regex because the corresponding capturing group is (\w+), which must match at least one word character or the whole match will fail.

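So what happens is this: after the first (\w+) captures "this", (.*) initially matches " is a regular expression" (everything up to the end of the string), \b matches at the end of the string, and (\w+) fails because no characters are left for it. The regex engine then backtracks, making (.*) give up one character at a time, until (.*) matches " is a regular " (note that the match includes the spaces on both sides). At that point \b matches the word boundary before e, and (\w+) matches "expression".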
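The outcome of that backtracking can be checked with a minimal sketch like the one below. It uses the pattern from the question as a plain match instead of a substitution, so the three captures can be printed; the variable names $first, $middle and $last are chosen here for illustration only.

```perl
use strict;
use warnings;

my $line = 'this is a regular expression';

# Same pattern as in the question, used as a match so that the
# captures can be inspected directly.
my ($first, $middle, $last) = $line =~ /^(\w+)\b(.*)\b(\w+)$/
    or die "pattern did not match\n";

# Brackets make the leading and trailing spaces in $2 visible.
print "\$1 = [$first]\n";    # $1 = [this]
print "\$2 = [$middle]\n";   # $2 = [ is a regular ]
print "\$3 = [$last]\n";     # $3 = [expression]
```

The greedy (.*) still takes as much as it can, but only as much as lets the rest of the pattern, including the mandatory (\w+), succeed.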

If you change the last (\w+) to (\w*), you will end up with the result you expected: (.*) consumes the rest of the string and $3 is empty.
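As a rough check of that variant, the sketch below runs the question's substitution with the last group changed to (\w*). The variable name $changed is only illustrative.

```perl
use strict;
use warnings;

my $changed = 'this is a regular expression';

# Last group is now (\w*), which may match the empty string, so (.*)
# never has to give anything back.
$changed =~ s/^(\w+)\b(.*)\b(\w*)$/$3 $2 $1/
    or die "pattern did not match\n";

# $3 is empty and $2 is " is a regular expression", so the result
# starts with two spaces.
print "[$changed]\n";    # [  is a regular expression this]
```

Here \b still succeeds at the end of the string, because the final character "n" is a word character, and (\w*) then matches zero characters right before $.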

verdesmarald answered Sep 30 '22 15:09
