Why does the match differ depending on the position of the negative lookahead?

I am confused about several things in regular expressions and I am trying to clear them up. Here I have the following string:

{start}do or die{end}extended string

Here are my two regexes, which differ only in the position of the dot:

(.(?!{end}))* //returns: {start}do or di
                                      //^ See here
((?!{end}).)* //returns: {start}do or die
                                      //^ See here

Why does the first regex eat the last "e"?

Also, how does this negative lookahead make the * quantifier non-greedy? I mean, why can't it consume characters beyond {end}?

AL-zami asked Jul 17 '15 18:07



2 Answers

A negative lookahead asserts that the pattern inside it, which in your case is {end}, must not match at the current position. It consumes no characters itself. And . matches any character except a newline.

So with your first regex:

(.(?!{end}))*

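It leaves out the e because the lookahead is checked after the dot has consumed a character. Once the dot consumes the e, the text that follows is {end}, so the negative lookahead fails and that e cannot be part of the match. In your second regex the lookahead is checked before the dot consumes a character, so the dot keeps matching up to the position directly in front of {end}, and the e is included.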
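To check this yourself, here is a minimal sketch in Python using the standard re module. The variable names are only illustrative, and I have escaped the braces and used non-capturing groups, which are my own changes and do not affect what gets matched:

```python
import re

text = "{start}do or die{end}extended string"

# Braces are escaped so they are always read as literal characters.
# Dot first: consume a character, then require that {end} does not follow it.
dot_first = re.compile(r"(?:.(?!\{end\}))*")
# Lookahead first: require that {end} does not start here, then consume a character.
lookahead_first = re.compile(r"(?:(?!\{end\}).)*")

first = dot_first.match(text).group(0)
second = lookahead_first.match(text).group(0)

print(repr(first))   # '{start}do or di'
print(repr(second))  # '{start}do or die'
```

In both cases the * is still greedy. It repeats as often as it can and only stops because the next repetition cannot succeed.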

Rizier123 answered Sep 30 '22 19:09



I have worked out how the regex engine proceeds for each of the two regexes.

First, for (.(?!{end}))* the regex engine proceeds as follows:

"{start}do or die{end}extended string"
 ^ The dot matches "{". The lookahead then tries {end} just after it and does not find it, so "{" is included.
"{start}do or die{end}extended string"
  ^ The dot matches "s". The lookahead tries {end} just after it and fails again, so "s" is included.

....
....and so on...
"{start}do or die{end}extended string"
                ^ The dot matches "e", but {end} does match just after it, so the negative lookahead fails and "e" is excluded.
So the match we get is "{start}do or di".

For the second regex, ((?!{end}).)*, the engine proceeds as follows:

"{start}do or die{end}extended string"
 ^ {end} is tried here and does not match, so the dot consumes "{".

"{start}do or die{end}extended string"
  ^ {end} is tried here and fails again, so the dot consumes "s".

....
..and so on..
"{start}do or die{end}extended string"
                ^ {end} is tried here and does not match, so the dot consumes "e".
"{start}do or die{end}extended string"
                 ^ {end} is tried here and matches, so the negative lookahead fails and the repetition stops.

So we end up with the match "{start}do or die".
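The same walk-through can be written as a small loop. This is only a hand-rolled imitation of the two patterns for this one string, not how a real regex engine is implemented, and the function name trace and its parameters are made up for illustration:

```python
def trace(text, stop, dot_first):
    """Imitate (.(?!stop))* or ((?!stop).)* and return the matched prefix."""
    pos = 0
    while pos < len(text):
        if text[pos] == "\n":
            # The dot does not match a newline, so the repetition ends.
            break
        if dot_first:
            # (.(?!stop))*: the dot takes text[pos], then the lookahead
            # checks the text that starts right after it.
            if text.startswith(stop, pos + 1):
                break
        else:
            # ((?!stop).)*: the lookahead checks the text starting at pos,
            # and only then does the dot take text[pos].
            if text.startswith(stop, pos):
                break
        pos += 1
    return text[:pos]


text = "{start}do or die{end}extended string"
dot_first_match = trace(text, "{end}", dot_first=True)
lookahead_first_match = trace(text, "{end}", dot_first=False)

print(dot_first_match)        # {start}do or di
print(lookahead_first_match)  # {start}do or die
```

The loop always tries to go one more step, which is the greedy behaviour of *. It stops at the first position where a step is not allowed, and that is why nothing beyond {end} is consumed.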
AL-zami answered Sep 30 '22 20:09
