Why is "abcdef" not matched by (?=abc)def but matched by abc(?=def)? [duplicate]

In JavaScript, I have the string abcdef and cannot figure out this strange behavior:

  • (?=abc)def doesn't match the string
  • abc(?=def) does match the string

Why?

Zaffy asked Jun 10 '13 01:06


2 Answers

In (?=abc)def, the (?=abc) lookahead is zero-width: it doesn't move the cursor forward in the input string after it succeeds. The construct simply says: look ahead at the next three characters to see whether they are abc; if they are, check whether those same characters are def. At that point the match fails.
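As a minimal sketch of the behavior described above, the following runs both patterns from the question against the same string. The variable names (input, lookaheadFirst, lookaheadLast, zeroWidth) are only illustrative and do not come from the question.

```javascript
const input = "abcdef";

// Lookahead first: at index 0 the assertion succeeds, but the cursor stays
// at index 0, so "def" is compared against "abc" and fails. At every other
// start position the lookahead itself fails.
const lookaheadFirst = /(?=abc)def/.test(input);

// Lookahead last: "abc" is consumed first, then the assertion checks that
// "def" follows at index 3.
const lookaheadLast = /abc(?=def)/.test(input);

// The lookahead on its own matches an empty string: it consumes nothing.
const zeroWidth = /(?=abc)/.exec(input);

console.log(lookaheadFirst);                        // false
console.log(lookaheadLast);                         // true
console.log(zeroWidth[0].length, zeroWidth.index);  // 0 0
```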

To see why the second pattern matches, you need to understand how the regex engine works. Consider your input string abcdef and your regex abc(?=def). The engine starts by matching the a, then moves the cursor in the input string to the next character and attempts to match the b. Because the cursor is on a b, the match succeeds. The engine then moves the cursor again and attempts to match the c; because the cursor is on a c, the match succeeds and the cursor moves to the next character. Now the engine encounters the (?=def). At this point it just looks ahead, without moving the cursor, to see whether the next three characters from the cursor position are in fact def. They are, so the match completes successfully.
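One way to observe where the cursor ends up is to run the pattern with the g flag, which makes exec() record the end of the match in lastIndex. This is only an illustrative sketch; the names text, pattern and result are assumptions made for the example.

```javascript
const text = "abcdef";
const pattern = /abc(?=def)/g; // the g flag makes exec() update lastIndex

const result = pattern.exec(text);

console.log(result[0]);                      // "abc" (the lookahead text is not part of the match)
console.log(result.index);                   // 0
console.log(pattern.lastIndex);              // 3 (the cursor stopped before "def")
console.log(text.slice(pattern.lastIndex));  // "def" (still unconsumed)
```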

Now consider the input string xyz and the regex x(?=y)z. The regex engine puts the cursor on the first letter of the input string, checks whether it is an x, and finds that it is, so it moves the cursor to the next character. Now it looks ahead to see whether the next character is a y, which it is, but the engine doesn't move the cursor forward, so the cursor stays on the y. Next the engine checks whether the cursor is on the letter z, but because the cursor is still on the letter y, the match fails.
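The xyz case can be checked the same way. The sketch below also shows two variants that do match, to illustrate that the y has to be consumed by the pattern itself; the variable names are hypothetical.

```javascript
const sample = "xyz";

// After the lookahead the cursor is still on "y", so "z" cannot match there.
const fails = /x(?=y)z/.test(sample);

// Consuming the "y" explicitly after the lookahead lets the "z" match.
const consumeY = /x(?=y)yz/.exec(sample);

// With nothing after the lookahead, only the "x" is part of the match.
const onlyX = /x(?=y)/.exec(sample);

console.log(fails);        // false
console.log(consumeY[0]);  // "xyz"
console.log(onlyX[0]);     // "x"
```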

You can read a lot more about both positive and negative lookaheads at http://www.regular-expressions.info/lookaround.html

Ro Yo Mi answered Nov 11 '22 20:11


(?=...) is a lookahead; in other words, it tests the string to its right. Note too that a lookahead is a zero-width assertion: it doesn't consume any characters. In your first example, (?=abc) means "must be followed by abc", but the pattern then requires def at that same position. This is why the pattern fails.

In your second example, the engine finds def after abc, so the string is matched.

Casimir et Hippolyte answered Nov 11 '22 19:11