I have an issue with lazy quantifiers. Or, more likely, I misunderstand how I am supposed to use them.
Testing on Regex101
Let's say my test string is: 123456789D123456789
.{1,5}
matches 12345
.{1,5}?
matches 1
I am OK with both matches.
.{1,5}?D
matches 56789D
But I would expect it to match 9D!
Thanks for clarifying this.
First and foremost, please do not think of greediness and laziness in regex as a means of getting the longest or shortest match. The terms "greedy" and "lazy" only pertain to the rightmost character a pattern can match. They have no impact on the leftmost one. When you use a lazy quantifier, it guarantees that your matched substring ends at the first position where the rest of the pattern can succeed, not at the last one (which is what a greedy quantifier would return).
The regex engine analyzes a string from left to right. It tries to match the pattern starting at the first position, then at the second, and so on. As soon as an attempt at some starting position succeeds, that match is returned, even if a shorter match exists further to the right.
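To illustrate that greediness only moves the end of a match and never its start, here is a minimal sketch in Python. The `re` module is used as a stand-in for the regex101 engine (both are backtracking engines and agree on these simple patterns), and the string aXbXc is an illustrative input, not one from the question.

```python
import re

text = "aXbXc"

# Both searches succeed at index 0, so both matches START at the same place.
greedy = re.search(r"a.*X", text)   # .* runs to the end, then backtracks to the LAST X
lazy = re.search(r"a.*?X", text)    # .*? grows one char at a time and stops at the FIRST X

print(greedy.span(), greedy.group())  # (0, 4) aXbX
print(lazy.span(), lazy.group())      # (0, 2) aX
```

Only the right edge differs; the left edge is fixed by the left-to-right scan.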
Let's see how it parses the string with .{1,5}?D. The engine starts at 1 and, since the quantifier is lazy, matches just 1 and then tests for D. There is no D after 1, so the engine expands the lazy quantifier, matches 12, and tries to match D again. There is 3 after 2, so the engine expands the lazy dot again, and it can do this up to 5 times. After expanding to the maximum value, it has matched 12345, and the next character (6) is still not D. Since the quantifier has reached its maximum value, the match attempt at this position fails and the engine tests the next position.
The same scenario repeats at every starting position before 5. When the engine reaches 5, it tries to match 5D and fails, then 56D (fails), 567D (fails), 5678D (fails again), and finally 56789D: bingo, the match is found.
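The walk-through can be replayed with a small Python sketch. `lazy_search` is a hypothetical helper written only for illustration: a simplified model of how a backtracking engine runs .{1,5}?D (it ignores the fact that the dot does not match newlines), not how any real engine is coded. The last two lines compare it with Python's `re`, used here as a stand-in for the regex101 engine.

```python
import re

def lazy_search(text, max_len=5):
    """Simplified model of a backtracking engine running .{1,max_len}?D."""
    for start in range(len(text)):             # leftmost starting position first
        for n in range(1, max_len + 1):        # lazy: try 1 char, then 2, ... up to max_len
            end = start + n
            if end >= len(text):               # no character left where D could follow
                break
            print(f"start={start}: try {text[start:end]!r} + 'D'")
            if text[end] == "D":               # the char right after the dot run must be D
                return text[start:end + 1]
    return None                                # every starting position failed

s = "123456789D123456789"
print(lazy_search(s))                          # 56789D
print(re.search(r".{1,5}?D", s).group())       # 56789D, same as the model
print(re.search(r".{1,5}D", s).group())        # 56789D, the greedy version agrees here
```

The trace shows five failed attempts at each of the starting positions 1 to 4, then success at 5. On this input the greedy and lazy patterns return the same match, because both commit to the same leftmost starting position.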
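This makes it clear that a lazily quantified subpattern at the beginning of a pattern can still behave "greedily": the engine has already committed to the leftmost starting position where a match is possible, so the lazy quantifier does not give you the shortest substring overall.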
Here is a visualization from regex101.com:
Now, here is a fun fact: .{1,5}? at the end of a pattern will always match exactly 1 character (if there is one), because the quantifier requires at least 1 and nothing after it needs to match, so 1 character is enough to return a valid match. So, if you write D.{1,5}?, you will get D1 and D6 in 123456789D12345D678904.
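A quick check of this in Python's `re` (standing in for the regex101 engine), with the greedy version added for contrast:

```python
import re

s = "123456789D12345D678904"

# At the end of a pattern, the lazy .{1,5}? stops as soon as its minimum (1) is met,
# because nothing after it needs to match.
print(re.findall(r"D.{1,5}?", s))   # ['D1', 'D6']

# The greedy version takes as many characters as it can (up to 5).
print(re.findall(r"D.{1,5}", s))    # ['D12345', 'D67890']
```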
Fun fact 2: In .NET, you can "ask" the regex engine to analyze the string from right to left with the RightToLeft option (RegexOptions.RightToLeft). Then, with .{1,5}?D, you will get 9D; see this demo.
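Python's `re` has no right-to-left mode, so the sketch below only emulates the idea behind .NET's RightToLeft option; it does not use any .NET API. It assumes that, for a pattern this simple, right-to-left matching is equivalent to reversing the input, reversing the pattern by hand (.{1,5}?D becomes D.{1,5}?), searching, and reversing the match back.

```python
import re

s = "123456789D123456789"

# Emulation only: reverse the input and the pattern, search left to right,
# then reverse the found match back into the original orientation.
reversed_match = re.search(r"D.{1,5}?", s[::-1]).group()
print(reversed_match)          # D9
print(reversed_match[::-1])    # 9D
```

Reversing a pattern by hand is easy here but not in general (think of lookarounds or backreferences), which is why a real engine option is the better tool when it is available.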
Fun fact 3: In .NET, (?<=(.{1,5}?))D will capture 9 into Group 1 if 123456789D is passed as input. This happens because of the way lookbehind is implemented in .NET regex: .NET effectively reverses the string as well as the pattern inside the lookbehind, then attempts to match that single pattern on the reversed string. And in Java, (?<=(.{1,5}))D (the greedy version) will also capture 9, because Java tries all the possible fixed widths in the range, from the shortest to the longest, until one succeeds.
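Python's built-in `re` only accepts fixed-width lookbehind, so it rejects (?<=(.{1,5}))D outright. The sketch below emulates the Java strategy described above with an alternation of fixed-width lookbehinds, one per width, listed shortest first so that at each position width 1 is tried before width 2, and so on. The helper name `shortest_first_lookbehind` is hypothetical, written only for this illustration.

```python
import re

s = "123456789D"

# Variable-width lookbehind is not allowed in Python's re module.
try:
    re.compile(r"(?<=(.{1,5}))D")
except re.error as err:
    print("re rejects it:", err)

def shortest_first_lookbehind(max_len=5):
    # Builds (?:(?<=(.{1}))|(?<=(.{2}))|...)D : one fixed-width lookbehind per width,
    # tried in order 1, 2, ..., max_len at every candidate position.
    alternatives = "|".join(f"(?<=(.{{{n}}}))" for n in range(1, max_len + 1))
    return re.compile(f"(?:{alternatives})D")

m = shortest_first_lookbehind().search(s)
captured = next(g for g in m.groups() if g is not None)  # the one group that took part
print(m.group(), captured)   # D 9
```

As in the Java case, the shortest width succeeds first, so only 9 is captured.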
And a solution: if you know you need exactly 1 character followed by D, just use
/.D/
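The same idea in Python's `re` (standing in for the regex101 engine): with no quantifier before D, there is nothing to be greedy or lazy about.

```python
import re

s = "123456789D123456789"

print(re.search(r".D", s).group())                   # 9D
print(re.findall(r".D", "123456789D12345D678904"))   # ['9D', '5D']
```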