In the following examples (via regex101.com, PCRE mode), I can't figure out why the + quantifier finds a sub-string but * doesn't.
In the first illustration, the + quantifier (1 or more) finds all four lower-case a characters (which is what I expected):

In the second illustration, the * quantifier (0 or more) doesn't find any lower-case a characters (which is NOT what I expected):

What REGEX logic explains why "1 or more" (+) finds all four lower-case a characters but "0 or more" (*) doesn't find any?
The regex engine tries to match the entire pattern at each position in the string, from left to right. The pattern /a*/ successfully matches zero a characters at the very beginning of the string. That is what the little dotted caret in your regex101 screenshot signifies: a zero-width match at that position. It would match more a characters at that position if there were any, but the string starts with an upper-case A, so there are none. The match is successful all the same, and the engine stops there.
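As a rough sketch of that first match, assuming Python's re module (which treats these two patterns the same way PCRE does) and a sample string chosen only for illustration:

```python
import re

text = "AAAAaaaaBBBBbbbb"  # illustrative input, not taken from the screenshots

# a* is allowed to match zero characters, so the very first attempt
# (at index 0, in front of the upper-case A) already succeeds.
star = re.search(r"a*", text)
print(repr(star.group()), star.span())

# a+ needs at least one a, so the attempts at indexes 0 to 3 fail
# and the engine keeps advancing until it reaches index 4.
plus = re.search(r"a+", text)
print(repr(plus.group()), plus.span())
```

The first line printed shows an empty string with the span (0, 0); the second shows aaaa with the span (4, 8).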
If you use a function that returns all regex matches in the string, it resumes searching where the previous match ended, and after a zero-width match it moves ahead by one character so that it does not match the same spot forever. That way it eventually reaches aaaa and returns it as a single result. Example in Python:
```python
import re

regex = r"a*"
text = "AAAAaaaaBBBBbbbb"
# findall returns every match, including the zero-width ones
print(re.findall(regex, text))
```
Output:
```
['', '', '', '', 'aaaa', '', '', '', '', '', '', '', '', '']
```
Whereas, when you use /a+/, it can't make those zero-width matches, so it steps through the input until it finds its first and only match at aaaa.
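To see where each of those results comes from, one option is to print the span of every match. This is only a sketch using re.finditer with the same sample string; note that the number of empty matches reported after aaaa differs between Python versions (3.7 and later also report one directly after a non-empty match).

```python
import re

text = "AAAAaaaaBBBBbbbb"

# Collect every match of a* together with its (start, end) span.
spans = [(m.group(), m.span()) for m in re.finditer(r"a*", text)]
for value, span in spans:
    print(repr(value), span)

# Only one of those matches actually consumes characters,
# and it is the same match that a+ finds.
non_empty = [value for value, _ in spans if value]
print(non_empty)
print(re.findall(r"a+", text))
```

Every empty match has a span whose start equals its end, which is the zero-width match regex101 draws as a dotted caret.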