I have a lot of confusion about regular expressions and I am trying to clear it up. I have the following string:
{start}do or die{end}extended string
Here are my two regexes, which differ only in the position of the dot:
(.(?!{end}))*   // returns: {start}do or di   (dot before the lookahead)
((?!{end}).)*   // returns: {start}do or die  (dot after the lookahead)
Why does the first regex eat the last "e"?
Also, how does this negative lookahead make the * quantifier non-greedy? I mean, why can't it consume characters beyond {end}?
In a negative lookahead, the regex engine checks whether a particular element (a character, a sequence of characters, or a group) follows the current position. If that element is not present, the assertion succeeds and matching continues; otherwise the match attempt at that position is rejected.
Lookbehind has the same effect, but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there.
Positive lookahead: (?= «pattern») matches if pattern matches what comes after the current location in the input string. Negative lookahead: (?! «pattern») matches if pattern does not match what comes after the current location in the input string.
A negative lookbehind assertion asserts true if the pattern inside the lookbehind is not matched.
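As a small illustration of the four assertions described above, here is a sketch using Python's re module. The sample text and the foo/bar patterns are made up for the example and are not part of the original question.

```python
import re

# Hypothetical sample text, chosen only to illustrate the four assertions.
text = "foobar foobaz quxbaz"

# Positive lookahead: "foo" only where "bar" follows.
pos_ahead = [m.start() for m in re.finditer(r"foo(?=bar)", text)]

# Negative lookahead: "foo" only where "bar" does NOT follow.
neg_ahead = [m.start() for m in re.finditer(r"foo(?!bar)", text)]

# Positive lookbehind: "ba" plus one character, only where "foo" precedes.
pos_behind = re.findall(r"(?<=foo)ba.", text)

# Negative lookbehind: "ba" plus one character, only where "foo" does NOT precede.
neg_behind = re.findall(r"(?<!foo)ba.", text)

# Lookarounds are zero-width: the matched text is just "foo", without "bar".
matched = re.search(r"foo(?=bar)", text).group(0)

print(pos_ahead, neg_ahead, pos_behind, neg_behind, matched)
```

In every case the lookaround only tests the text around the current position; it never adds characters to the match.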
With your negative lookahead you are saying that the pattern inside it, which in your case is {end}, must not match at that position. And . matches any character except a newline.
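So with your first regex:
(.(?!{end}))*
the e is left out: in e{end} the dot can match the e, but the negative lookahead that follows then sees {end} and fails, so that last step is rejected. In your second regex the dot is on the other side of the lookahead, so the check happens before each character is consumed. It keeps matching until the lookahead is standing directly in front of {end}, by which point the e has already been consumed, so the e is included.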
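The two results can be reproduced with a short sketch in Python's re module. The braces are escaped here to keep the patterns unambiguous across regex flavors; that escaping is my addition and not part of the original patterns.

```python
import re

text = "{start}do or die{end}extended string"

# Dot first: consume a character, then require that {end} does not follow it.
dot_first = re.match(r"(.(?!\{end\}))*", text)

# Lookahead first: require that {end} does not start here, then consume a character.
lookahead_first = re.match(r"((?!\{end\}).)*", text)

print(dot_first.group(0))        # {start}do or di
print(lookahead_first.group(0))  # {start}do or die
```

Note that the * is still greedy in both patterns. It repeats as many times as it can, but it cannot start another iteration once the lookahead fails, which is why neither match runs past {end}.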
I have worked out how the regex engine processes each of the two regexes.
First, for (.(?!{end}))* the engine proceeds as follows:
"{start}do or die{end}extended string"
 ^ The dot matches "{". The lookahead then checks for {end} at the next position and does not find it, so "{" is included.
"{start}do or die{end}extended string"
  ^ The dot matches "s". The lookahead checks for {end} at the next position and does not find it, so "s" is included.
....
....and so on...
"{start}do or die{end}extended string"
                ^ The dot matches "e", but now the lookahead finds "{end}" immediately after it, so this iteration fails and "e" is excluded.
So the match we get is "{start}do or di"
For the second regex, ((?!{end}).)*, the engine proceeds as follows:
"{start}do or die{end}extended string"
 ^ The lookahead checks for {end} here and does not find it, so the dot consumes "{".
"{start}do or die{end}extended string"
  ^ The lookahead checks for {end} here and again does not find it, so the dot consumes "s".
....
..and so on..
"{start}do or die{end}extended string"
                ^ The lookahead checks for {end} here and does not find it, so the dot consumes "e".
"{start}do or die{end}extended string"
                 ^ The lookahead checks for {end} here and finds it, so the negative lookahead fails and the * stops repeating.
So we end up with the match "{start}do or die"
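The two walkthroughs can also be written as a hand-rolled simulation. This is only a sketch of the repetition loop for these two specific patterns, not how a real regex engine is implemented; the function name trace and its parameters are hypothetical.

```python
def trace(text, stop, dot_first):
    """Simulate the * loop of (.(?!stop))* or ((?!stop).)* starting at position 0.

    Returns the matched prefix and a log of the steps taken.
    """
    pos = 0
    steps = []
    while pos < len(text):
        if text[pos] == "\n":
            # By default the dot does not match a newline.
            break
        if dot_first:
            # (.(?!stop)): the dot consumes text[pos], then the lookahead
            # is tested at the NEXT position.
            blocked = text.startswith(stop, pos + 1)
        else:
            # ((?!stop).): the lookahead is tested at the CURRENT position,
            # before the dot consumes anything.
            blocked = text.startswith(stop, pos)
        if blocked:
            steps.append(f"pos {pos}: lookahead sees {stop!r}, iteration rejected")
            break
        steps.append(f"pos {pos}: {text[pos]!r} consumed")
        pos += 1
    return text[:pos], steps


text = "{start}do or die{end}extended string"

first_match, first_steps = trace(text, "{end}", dot_first=True)
second_match, second_steps = trace(text, "{end}", dot_first=False)

print(first_match)       # {start}do or di
print(first_steps[-1])   # pos 15: lookahead sees '{end}', iteration rejected
print(second_match)      # {start}do or die
print(second_steps[-1])  # pos 16: lookahead sees '{end}', iteration rejected
```

The only difference between the two branches is where the lookahead is tested relative to the character being consumed, and that one-position shift is what keeps or drops the final "e".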