In the following examples (via regex101.com, PCRE mode), I can't figure out why the + quantifier finds a sub-string but * doesn't.
In the first illustration, the + quantifier (1 or more) finds all four lower-case a characters (which is what I expected):
In the second illustration, the * quantifier (0 or more) doesn't find any lower-case a characters (which is NOT what I expected):
What REGEX logic explains why "1 or more" (+) finds all four lower-case a characters but "0 or more" (*) doesn't find any?
The regex engine will try to match the entire pattern at each position in the string, from left to right. The pattern /a*/ successfully matches the zero a's at the very beginning of the string. This is what the little dotted caret in your regex101 screenshot signifies: a zero-width match at that position. It would match more a's at that position, but there are none. Nonetheless, the match is successful.
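To see that first zero-width match directly, here is a minimal sketch using Python's re.search, which returns only the first match it finds. Python's re module is not PCRE, but as far as I know it behaves the same way for these two patterns; the variable names are only for illustration.

```python
import re

text = "AAAAaaaaBBBBbbbb"

# re.search scans left to right and returns the first successful match.
star = re.search(r"a*", text)
plus = re.search(r"a+", text)

# a* succeeds immediately at index 0 with an empty (zero-width) match.
print(star.span(), repr(star.group()))  # (0, 0) ''

# a+ needs at least one 'a', so it fails at indexes 0-3 and succeeds at index 4.
print(plus.span(), repr(plus.group()))  # (4, 8) 'aaaa'
```

The span (0, 0) is the programmatic equivalent of the dotted caret: a successful match that consumes no characters.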
If you use a function that returns all regex matches in the string, it resumes the search after each match, stepping ahead one character after every zero-width match so that it does not loop forever at the same position. That way it matches aaaa (as a single result) once it gets to it. Example in Python:
    import re

    regex = r"a*"
    text = "AAAAaaaaBBBBbbbb"  # named text to avoid shadowing the built-in input()
    print(re.findall(regex, text))
Output:
['', '', '', '', 'aaaa', '', '', '', '', '', '', '', '', '']
Whereas, when you use /a+/, it can't do those zero-width matches, so it steps through the input until it finds its first and only match at aaaa.
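The difference between the two quantifiers can also be checked by listing where each match starts and ends. This is a rough sketch using re.finditer; the helper name spans is made up for this example and is not part of the re module.

```python
import re

text = "AAAAaaaaBBBBbbbb"

def spans(pattern, s):
    """Return (start, end, matched_text) for every match of pattern in s."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, s)]

star_matches = spans(r"a*", text)
plus_matches = spans(r"a+", text)

# a+ produces exactly one match, covering the run of lower-case a's.
print(plus_matches)  # [(4, 8, 'aaaa')]

# Dropping the zero-width matches of a* leaves that same single result.
non_empty = [m for m in star_matches if m[2]]
print(non_empty)  # [(4, 8, 'aaaa')]
```

So /a*/ does find aaaa; it is just surrounded by empty matches, and a tool that shows only the first match will show the empty one at position 0.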