I have a very simple regex similar to this:
HOHO.*?_HO_
With this test string...
fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev
I expect it to match _HOHO___HO_ (shortest match, non-greedy), but it actually matches _HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_ (longest match, looks greedy). Why? How can I make it match the shortest match?
Adding and removing the ? gives the same result.
Edit - a better test string that shows why [^HOHO] doesn't work:
fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO_H_O_H_O_HO_fbguye
All I can think of is that maybe it is matching multiple times - but there's only one match for _HO_, so I don't understand why it isn't taking the shortest match that ends at the _HO_, discarding the rest.
I've browsed all the questions I can find with titles like "Non-greedy regex acts greedy", but they all seem to have some other problem.
I figured out a solution with some help from Regex lazy vs greedy confusion.
In regex engines like the one used by JavaScript (backtracking NFA engines, I believe), non-greedy only gives you the match that is shortest going left to right: from the first left-hand match that fits to the nearest right-hand match.
Where there are many left-hand matches for one right-hand match, it will always go from the first it reaches (which will actually give the longest match).
Essentially, it goes through the string one character at a time asking "Are there matches from this character? If so, match the shortest and finish. If no, move to next character, repeat". I expected it to be "Are there matches anywhere in this string? If so, match the shortest of all of them".
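To make the difference concrete, here is a minimal JavaScript sketch using the pattern and test string from the question. The helper shortestMatchAnywhere is a hypothetical name introduced only for illustration: it retries the pattern at every start position with the sticky flag and keeps the shortest result, which is the behaviour I had originally expected from the engine.

```javascript
// Pattern and test string taken from the question.
const text = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev";

// The engine commits to the FIRST start position that can match at all.
const lazy = text.match(/HOHO.*?_HO_/);
const greedy = text.match(/HOHO.*_HO_/);

// Hypothetical helper (not a built-in): anchor the pattern at every index
// with the sticky flag and keep the shortest match found anywhere.
function shortestMatchAnywhere(source, input) {
  const sticky = new RegExp(source, "y");
  let best = null;
  for (let i = 0; i < input.length; i++) {
    sticky.lastIndex = i; // with "y", the match must begin exactly here
    const m = sticky.exec(input);
    if (m !== null && (best === null || m[0].length < best.length)) {
      best = m[0];
    }
  }
  return best;
}

const shortest = shortestMatchAnywhere("HOHO.*?_HO_", text);

console.log(lazy.index, lazy[0]);   // starts at the first HOHO
console.log(greedy[0] === lazy[0]); // only one _HO_, so ? changes nothing
console.log(shortest);              // the match I was expecting
```

Because there is only one _HO_ in the string, the lazy and greedy versions end at the same place, which is why adding or removing the ? made no difference.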
You can approximate a regex that is non-greedy in both directions by replacing the . with a negation meaning "not the left-side match". Negating a string like this requires a negative lookahead and a non-capturing group, but it's as simple as dropping the string into (?:(?!).) right after the (?! part. For example, (?:(?!HOHO).) matches any single character at a position where HOHO does not begin.
So the equivalent of HOHO.*?_HO_ that is non-greedy on both the left and the right would be:
HOHO(?:(?!HOHO).)*?_HO_
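As a quick check, this sketch runs the lookahead version against both test strings from the question, and also tries the [^HOHO] character-class idea on the second string to show where it breaks down. The variable names are my own; the patterns and strings are the ones above.

```javascript
// Both test strings from the question.
const first = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev";
const second = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO_H_O_H_O_HO_fbguye";

// Lookahead version: the middle part may not run across another HOHO.
const tempered = /HOHO(?:(?!HOHO).)*?_HO_/;
const firstMatch = first.match(tempered);
const secondMatch = second.match(tempered);

// Character-class version: [^HOHO] means "any character except H or O",
// so it rejects every single H and O, not just the sequence HOHO.
const charClass = /HOHO[^HOHO]*?_HO_/;
const charClassFirst = first.match(charClass);
const charClassSecond = second.match(charClass);

console.log(firstMatch[0]);     // shortest match in the first string
console.log(secondMatch[0]);    // shortest match in the second string
console.log(charClassFirst[0]); // works only because the gap is all underscores
console.log(charClassSecond);   // null: the gap contains lone H and O characters
```

The character class happens to work on the first string, but fails on the second because the text between the last HOHO and _HO_ contains individual H and O characters.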
So the regex engine is essentially going through each character like this:
- HOHO - Does this match the left side?
- (?:(?!HOHO).)* - If so, can I reach the right-hand side without any repeats of the left side?
- _HO_ - If so, grab everything until the right-hand match
- ? modifier on * or + - If there are multiple right-hand matches, choose the nearest one
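If the left and right delimiters vary, the same construction can be wrapped in a small function. This is only a sketch: escapeRegExp and shortestBetween are hypothetical helper names, not built-ins, and the sketch assumes both delimiters are literal strings rather than sub-patterns.

```javascript
// Hypothetical helper: escape regex metacharacters so that a delimiter
// such as "a.b" is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build LEFT(?:(?!LEFT).)*?RIGHT for literal delimiters.
function shortestBetween(left, right) {
  const l = escapeRegExp(left);
  const r = escapeRegExp(right);
  return new RegExp(`${l}(?:(?!${l}).)*?${r}`);
}

const pattern = shortestBetween("HOHO", "_HO_");
const text = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev";
const result = pattern.exec(text);

console.log(pattern.source); // same pattern as written by hand above
console.log(result[0]);      // the nearest HOHO ... _HO_ pair
```

Note that . does not match line breaks by default in JavaScript, so for multi-line input the s flag would be needed as well.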