Why doesn't * match when + does?

In the following examples (via regex101.com, PCRE mode), I can't figure out why the + quantifier finds a substring but * doesn't.

In the first illustration, the + quantifier (1 or more) finds all four lower-case a characters (which is what I expected):

[Screenshot: Plus-sign quantifier finds 1 or more as expected]

In the second illustration, the * quantifier (0 or more) doesn't find any lower-case a characters (which is NOT what I expected):

[Screenshot: Asterisk quantifier doesn't find 0 or more]

What REGEX logic explains why "1 or more" (+) finds all four lower-case a characters but "0 or more" (*) doesn't find any?

asked Apr 02 '16 00:04 by RBV

1 Answer

The regex engine will try to match the entire pattern at each position in the string, from left to right. The pattern /a*/ successfully matches zero a's at the very beginning of the string. This is what the little dotted caret in your regex101 screenshot signifies: a zero-width match at that position. It would match more a's at that position, but there are none. Nonetheless, the match is successful.
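
A minimal sketch of this using Python's re module, which is not PCRE but should behave the same way for these two simple patterns. The test string "AAAAaaaaBBBBbbbb" is borrowed from the findall example in this answer; the exact string in the screenshots is assumed to be similar. re.search reports only the first position where the whole pattern succeeds:

```python
import re

# Assumed test string, taken from the findall example in this answer.
text = "AAAAaaaaBBBBbbbb"

# re.search returns the leftmost position where the pattern succeeds.
star = re.search(r"a*", text)
plus = re.search(r"a+", text)

# a* succeeds immediately at index 0 with an empty (zero-width) match.
print(star.span(), repr(star.group()))  # (0, 0) ''

# a+ needs at least one 'a', so its first success is at index 4.
print(plus.span(), repr(plus.group()))  # (4, 8) 'aaaa'
```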

If you use a function that returns all regex matches in the string, then after a zero-width match it advances at least one character before trying again (otherwise it would keep finding the same empty match forever), so it will match aaaa (as a single result) once it gets there. Example in Python:

    import re
    regex = r"a*"
    text = "AAAAaaaaBBBBbbbb"
    # findall returns every non-overlapping match, including zero-width (empty) ones
    print(re.findall(regex, text))

Output:

    ['', '', '', '', 'aaaa', '', '', '', '', '', '', '', '', '']
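
To see where each of those matches starts and ends, re.finditer can be used in place of re.findall. This sketch (same assumed test string) collects the span of every match:

```python
import re

text = "AAAAaaaaBBBBbbbb"

# finditer yields a match object per result; span() gives (start, end) indexes.
spans = [m.span() for m in re.finditer(r"a*", text)]

# Starts with [(0, 0), (1, 1), (2, 2), (3, 3), (4, 8), (8, 8), ...
print(spans)
```

The four empty matches at indexes 0 to 3 happen while the engine is still on the A's; after aaaa ends at index 8, it keeps producing empty matches at every remaining position, including the end of the string (16, 16), which gives the 14 results shown above.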

Whereas, when you use /a+/, it can't make those zero-width matches, so it fails at every position until it reaches aaaa, its first and only match.
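
For comparison, a short sketch of the + version, plus one possible workaround (filtering out the empty strings afterwards) if you need to keep *. It uses the same test string as the findall example above:

```python
import re

text = "AAAAaaaaBBBBbbbb"

# a+ cannot match the empty string, so findall only reports the real run of a's.
plus_matches = re.findall(r"a+", text)
print(plus_matches)  # ['aaaa']

# With a*, dropping the empty matches gives the same result for this input.
star_nonempty = [m for m in re.findall(r"a*", text) if m]
print(star_nonempty)  # ['aaaa']
```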

answered Sep 28 '22 05:09 by Boann