In Perl, the * is usually greedy, unless you add a ? after it. When * is used against a group, however, the situation seems different. My question is "why". Consider this example:
my $text = 'f fjfj ff';
my (@matches) = $text =~ m/((?:fj)*)/;
print "@matches\n";
# --> ""
@matches = $text =~ m/((?:fj)+)/;
print "@matches\n";
# --> "fjfj"
In the first match, Perl seems to behave lazily and prints nothing, though it could have matched something, as the second match demonstrates. Oddly, * is greedy as expected when the group contains .. instead of literal characters:
@matches = $text =~ m/((?:..)*)/;
print "@matches\n";
# --> 'f fjfj f'
This isn't a matter of greedy or lazy repetition. (?:fj)* is greedily matching as many repetitions of "fj" as it can, but it will successfully match zero repetitions. When you try to match it against the string "f fjfj ff", it will first attempt to match at position zero (before the first "f"). The maximum number of times you can successfully match "fj" at position zero is zero, so the pattern successfully matches the empty string. Since the pattern successfully matched at position zero, we're done, and the engine has no reason to try a match at a later position.
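One way to see this for yourself is to print where each match starts. The sketch below is only an illustration, not part of the original question: it uses Perl's built-in @- and @+ arrays (match start and end offsets), and the variable names are my own.

```perl
use strict;
use warnings;

my $text = 'f fjfj ff';

# (?:fj)* can match zero repetitions, so it succeeds immediately at offset 0.
my ($star_start, $star_length);
if ($text =~ m/((?:fj)*)/) {
    $star_start  = $-[0];          # offset where the overall match begins
    $star_length = $+[0] - $-[0];  # number of characters matched
}
print "star: start=$star_start length=$star_length\n";

# (?:fj)+ needs at least one "fj", so the engine keeps advancing to offset 2.
my ($plus_start, $plus_match);
if ($text =~ m/((?:fj)+)/) {
    $plus_start = $-[0];
    $plus_match = $1;
}
print "plus: start=$plus_start match='$plus_match'\n";
```

The star version reports a zero-length match at offset 0, while the plus version cannot succeed until offset 2, where it then greedily takes both repetitions.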
The moral of the story is: don't write a pattern that can match nothing, unless you want it to match nothing.
Perl will match as early as possible in the string (left-most). It can do that with your first match by matching zero occurrences of fj at the start of the string.
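The same left-most rule would also account for the (?:..)* case from the question: both star patterns start at offset 0, and greediness only decides how far each one extends from there. A minimal sketch, where first_match is a hypothetical helper I made up for the comparison:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the start offset and text of the first match,
# or an empty list if the pattern does not match at all.
sub first_match {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;
    return ($-[0], substr($string, $-[0], $+[0] - $-[0]));
}

my $text = 'f fjfj ff';

# "fj" cannot match at offset 0, so zero repetitions is the best available.
my @star_fj = first_match($text, qr/(?:fj)*/);

# ".." can match at offset 0, so the greedy star keeps going from there.
my @star_any = first_match($text, qr/(?:..)*/);

printf "%-8s start=%d match='%s'\n", @$_
    for ['(?:fj)*', @star_fj], ['(?:..)*', @star_any];
```

Both patterns report start=0. The only difference is how many repetitions are possible from that position: none for "fj", four for "..".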