This pattern:
/a+?b+?/
Against the following string:
aaaaaabbbbbb
Matches:
aaaaaab
We see that the non-greedy behaves different in backward/left direction (takes all) and forward/right direction (takes just one).
Is there a way to make the non-greedy at the beginning, that matches all the a
, to match as less as possible too? So that it behaves in the same way like at with the b
part at the end?
Regexes generally match from left-to-right unless you set a right-to-left flag (which very few flavors support). In either case, they do not start in the middle and then work out in both directions, even if you use a lookbehind.
It helps to stop and ask - why does the lazy quantifier exist in the first place? What problem was it meant to solve?
Normal (greedy) quantifiers work by finding a matching pattern of text and then repeatedly matching a sequence of characters until they can match no more. This behavior is usually desired, but you run into problems when you have a very general pattern followed by a very specific pattern where the specific pattern is a subset of the general pattern.
For example, consider the following input:
_abc_END_def_END
And this pattern:
(\w+END)
The intent was to match _abc_
and then END
. The problem is that END
is a subset of \w+
. Using standard "greedy" rules, the \w+
matches as much as possible. So rather than matching _abc_
, it matched _abc_END_def
.
The solution to this scenario is to change the way the quantifier (+
) behaves with the lazy modifier ?
. By changing the expression to \w+?
, the regex engine is forced to match only as much as necessary to satisfy the expression and no more. The expression is satisfied when \w+?
matches _abc_
and END
matches its literal string.
The purpose of the lazy quantifier is not to match a "minimum" number of characters - it is about giving that second pattern, a subset of the first, an opportunity to match.
In your example, b
is not a subset of a
, so there is no need for the lazy quantifier. If you want to match one or more a's, but as few as possible, and one or more b's, but as few as possible, then you'd simply use:
ab
Or, if your a
is a stand-in for some superset which may include b:
[ab]b
For example:
\wb
Both of which would match:
ab
Example:
const input = "aaabbb"
console.log(/ab/.exec(input)[0])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With