a*
means zero or more instances of: a right?
so why does $_ = "a"; s/a*/e/g
produce: ee
Possible answer: it's replacing the string: "a" with: "e" and it's replacing the empty string: "" with: "e" as well. Or it's replacing the mere absence of a letter: a with a letter: e or it's replacing "zero occurrences" of: a with an: e
Ok then, but:
$_ = "b"; s/a*/e/g
produces: ebe
It seems to be replacing the empty string to the left of: b
and also the empty string to the right of: b
OK. But then why doesn't it do that for: "a
" ? Why doesn't it replace the empty string to the left of: a
and also the empty string to the right of: a and also the letter: a itself to get: eee
?
There are just as many zero occurrences of: a on the left side as the right side!
Your analysis of why the results are "ee"
and "ebe"
is completely accurate.
The "/g" modifier causes the regex to match once, and then try to match again from where the last match stopped.
The reason for the discrepancy (it doesn't replace the empty string to the left of "a"
) is that is because "*"
is greedy - it matches the MOST possible characters. From perldoc perlre
:
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match.
So it matches zero "a"s, and sees if it can match more. Since there are more "a"s in the string, it will match one more. Try to match more. None? Done. So we match the first "a".
Then, "/g" causes us to try to match again (starting from where we stopped after last match completed), which now matches empty (zero "a"s) string.
Using Damian Conway's excellent Regexp::Debugger, I tried this:
perl -MRegexp::Debugger -E '$_ = "a"; s/a*/e/g; say'
And got this output, in case it makes things any clearer, shown in event logging mode. The first pass match running through the replacement yields this set of events:
a | a* | Starting regex match a | a* | Trying a literal character zero-or-more times (as many as possible) | a* | Matched | | Regex matched in 3 steps
This is showing that the "a" is matched the first time, which gets replaced by "e".
After completing the match the first time, the debugger lets me run a second match from the same program:
| <~~ | Back-tracking in regex | a* | Back-tracked and restarting regex match | a* | Trying a literal character zero-or-more times (as many as possible) | a* | Matched | | Regex matched in 3 steps
This is showing that the "" after the original "a" (now "e") is matched the second time and replaced with "e".
Unfortunately, either I don't know how to read the output or Regexp::Debugger gets confused at this point or something, but it repeats again, but doesn't do a replacement.
| <~~ | Back-tracking in regex | a* | Back-tracked and restarting regex match | a* | Trying a literal character zero-or-more times (as many as possible) | a* | Matched | | Regex matched in 3 steps
Anyway, either Perl has matched a third time and decides for some reason not to do a replacement this time or Regexp::Debugger or I am just confused.
Edit: I solved my confusion by reviewing perldoc perlre:
"The higher-level loops preserve an additional state between iterations: whether the last match was zero-length. To break the loop, the following match after a zero-length match is prohibited to have a length of zero. This prohibition interacts with backtracking (see "Backtracking"), and so the second best match is chosen if the best match is of zero length."
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With