
How does negative lookahead with asterisks work?

I'm trying to understand why I'm not getting the expected results from a regex.

I already know what a negative lookahead is (apparently not :-)), and also that an asterisk means zero or more repetitions.

Looking at this regex :

a(?![^3])

(image: regular expression visualization)

This matches an a that is not followed by a non-3 character.

So in this test string, the leading a is matched:

a3333335

Ok

Also, if I change the regex to:

a(?![^3]+)  //notice "+"

(image: regular expression visualization)

It will still match :

a3333335

This matches an a that is not followed by one or more non-3 characters.

Question

My problem is with * :

Let's change the regex to :

a(?![^3]*)

(image: regular expression visualization)

This will not match

a3333335

But my question is - why ?

According to the drawing :

a should not be followed by either nothing or non-3's.

But this IS what happens: a is not followed by nothing AND is not followed by non-3's.

So why doesn't it match?

And to make my life more difficult :

Looking at this regex :

a(?![^3]*7)

This will match :

a3333335

What is going on here?

asked Sep 18 '15 21:09 by Royi Namir



1 Answer

The problem is that an asterisk can match the empty string (""), and you can say that between every character and the next one, there is an empty string.

Given the regex:

a(?![^3]*)

and you query with a33333, you more or less say: reject if there are zero or more repetitions of non-3's after a. But there is such a repetition: the empty string. So the lookahead rejects without even looking at a single 3. The matching thus looks like:

a    (?![^3]*)
"a"     ""   "33333"

(the quotation marks delimit strings; they are not characters in the input)
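The empty match described above can be observed directly. The following is a minimal sketch using Python's re module (an assumption, since the question does not name a regex engine); the variable names are illustrative only.

```python
import re

subject = "a33333"

# Try [^3]* exactly at index 1, the position right after the "a".
# The next character is a 3, so the star settles for zero repetitions.
inner = re.compile(r"[^3]*").match(subject, 1)
print(repr(inner.group()), inner.span())  # '' (1, 1)

# Since the inner pattern succeeded (with an empty match),
# the negative lookahead fails and the overall regex finds nothing.
whole = re.search(r"a(?![^3]*)", subject)
print(whole)  # None
```

The point to notice is that the inner match is not a failure: it is a successful match of length zero, and that is enough for the negative lookahead to reject.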

You can thus say that a negative lookahead whose entire body is placed under a Kleene star will always reject. (Be careful: I mean that the Kleene star applies to the whole expression inside the lookahead. This does not imply that every negative lookahead that merely contains a Kleene star will always reject.)
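As a rough check of that claim, the sketch below wraps a few arbitrary sub-patterns in a star, puts the result inside a negative lookahead, and tries them against a handful of sample strings. It assumes Python's re module; the sample strings, sub-patterns, and the helper ever_matches are made up for illustration and are not from the question.

```python
import re

# Illustrative inputs only; any strings containing "a" would do.
subjects = ["a", "a3", "ab", "a3333335", "banana"]
# Sub-patterns that get wrapped as (?:body)* inside the lookahead.
starred_bodies = [r"[^3]", r"\d", r"ab", r"."]

def ever_matches(pattern, strings):
    """Return True if the pattern is found in at least one of the strings."""
    return any(re.search(pattern, s) is not None for s in strings)

# The star covers the whole lookahead body, so the body can always
# match the empty string and the negative lookahead always rejects.
results = {
    body: ever_matches(rf"a(?!(?:{body})*)", subjects)
    for body in starred_bodies
}
print(results)

# A lookahead that only contains a star can still let a match through.
contains_star = ever_matches(r"a(?![^3]*7)", subjects)
print(contains_star)
```

Every entry in results comes out False, while contains_star is True, which matches the distinction drawn in the parenthetical remark.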

Your image shows this as well:

(image: the regular expression visualization from the question)

It says "if not followed by", which means that what is inside the box must not match. The problem is that the regex doesn't have to consume a single character to reach the end of the box.


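This does not hold for a(?![^3]*7): here you say "reject if you encounter zero or more non-3's followed by a seven". The regex [^3]*7 cannot match at the start of 3333335: the first character is a 3, so [^3]* matches nothing, and the 7 then fails against that 3. The lookahead therefore will not reject the match.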
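The same kind of check works for the pattern with the trailing 7. This is again a small sketch assuming Python's re module; the extra test string ab7 is invented here to show a case where the lookahead does reject.

```python
import re

subject = "a3333335"

# [^3]*7 anchored right after the "a": the star matches nothing
# (next char is a 3), and the literal 7 cannot match that 3.
inner = re.compile(r"[^3]*7").match(subject, 1)
print(inner)  # None

# The lookahead body failed, so the negative lookahead succeeds.
whole = re.search(r"a(?![^3]*7)", subject)
print(whole.group(), whole.span())  # a (0, 1)

# Hypothetical counterexample: here non-3's followed by a 7 do
# appear right after the "a", so the match is rejected.
rejected = re.search(r"a(?![^3]*7)", "ab7")
print(rejected)  # None
```

Only the a is part of the match, because a lookahead is zero-width and consumes no characters.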

answered Sep 17 '22 15:09 by Willem Van Onsem