
What matches this regex: qr/(?!)/;

Tags: regex, perl

In some source code I found this regex:

qr/(?!)/;

I simply can't figure out what this matches.

Honestly, I don't understand what "a zero-width negative look-ahead assertion" means, which is all I found in perlre. :(

Can someone explain it in plain human language, please? :)

asked Jun 01 '13 00:06 by novacik



3 Answers

The empty regex pattern matches a zero-length string, which is to say it always matches. It's an obvious progression:

'bbbbb' =~ /^(?:aaa|bbb)/   # Matches (Matches 3 "b"s, from pos 0 to 3)
'bbbbb' =~ /^(?:aaa|bb)/    # Matches (Matches 2 "b"s, from pos 0 to 2)
'bbbbb' =~ /^(?:aaa|b)/     # Matches (Matches 1 "b",  from pos 0 to 1)
'bbbbb' =~ /^(?:aaa|)/      # Matches (Matches 0 "b"s, from pos 0 to 0)

This means that (?=) ("Is this position followed by a zero-length string?") always matches and (?!) ("Is this position not followed by a zero-length string?") never matches. In fact, (?!) is optimised to (*FAIL) since the latter's introduction in 5.10.
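A quick way to check this claim is to run all three constructs over a handful of strings. This is only a sketch: the sample inputs and variable names are made up for illustration, and (*FAIL) needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Illustrative inputs, including the empty string.
my @inputs = ('', 'a', 'hello world', "line\n");

my ($pos_hits, $neg_hits, $fail_hits) = (0, 0, 0);
for my $s (@inputs) {
    $pos_hits++  if $s =~ /(?=)/;     # empty positive look-ahead
    $neg_hits++  if $s =~ /(?!)/;     # empty negative look-ahead
    $fail_hits++ if $s =~ /(*FAIL)/;  # explicit verb, Perl 5.10+
}

printf "(?=)    matched %d of %d strings\n", $pos_hits,  scalar @inputs;
printf "(?!)    matched %d of %d strings\n", $neg_hits,  scalar @inputs;
printf "(*FAIL) matched %d of %d strings\n", $fail_hits, scalar @inputs;
```

(?=) should match every input and the other two should match none, whatever strings you put in the list.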

(?!) aka (*FAIL) is useful to force backtracking when the pattern has side-effects.

'abcd' =~ /(.+?)(?{ print "$1\n" })(?!)/;

Output:

a
ab
abc
abcd
b
bc
bcd
c
cd
d

Explanation of example:

(?!) doesn't match, so the regex engine keeps trying to find a match by having .+? match more and more characters. When that fails, the regex engine tries to match at a later starting position.

This is called "backtracking". It's how the regex engine can match 'aaaab' =~ /a*ab/. The first time through, a* matches all four "a"s, so the ab that follows can't match, and the engine backtracks. The second time through, a* matches only three of the "a"s, which lets ab, and thus the whole pattern, match.
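To see the effect of that backtracking, you can capture what a* ends up holding. The sketch below also adds a contrast that is not part of the answer above: a possessive quantifier (a*+, Perl 5.10+) refuses to give characters back, so the same string no longer matches. The variable names are illustrative.

```perl
use strict;
use warnings;

# Greedy a* first grabs all four "a"s, then gives one back so "ab" can match.
my $greedy_ok  = 'aaaab' =~ /(a*)ab/ ? 1 : 0;
my $kept_by_a  = $greedy_ok ? $1 : undef;

# Possessive a*+ never gives anything back, so "ab" has no "a" left to match.
my $possessive_ok = 'aaaab' =~ /a*+ab/ ? 1 : 0;

print "greedy match: $greedy_ok, a* kept '$kept_by_a'\n";
print "possessive match: $possessive_ok\n";
```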

The step by step flow for the example I originally gave follows:

  1. Start matching at pos 0.
  2. (.+?) matches a at pos 0
  3. (?{ print "$1\n" }) prints a and matches zero chars
  4. (?!) doesn't match. ⇒ Backtrack!
  5. (.+?) matches ab at pos 0
  6. (?{ print "$1\n" }) prints ab and matches zero chars
  7. (?!) doesn't match. ⇒ Backtrack!
  8. (.+?) matches abc at pos 0
  9. (?{ print "$1\n" }) prints abc and matches zero chars
  10. (?!) doesn't match. ⇒ Backtrack!
  11. (.+?) matches abcd at pos 0
  12. (?{ print "$1\n" }) prints abcd and matches zero chars
  13. (?!) doesn't match. ⇒ Backtrack!
  14. (.+?) can't match anything else here. ⇒ Backtrack!
  15. Start matching at pos 1.
  16. (.+?) matches b at pos 1
  17. (?{ print "$1\n" }) prints b and matches zero chars
  18. (?!) doesn't match. ⇒ Backtrack!
  19. ...
  20. (.+?) matches d at pos 3
  21. (?{ print "$1\n" }) prints d and matches zero chars
  22. (?!) doesn't match. ⇒ Backtrack!
  23. (.+?) can't match anything else here. ⇒ Backtrack!
  24. Start matching at pos 4.
  25. (.+?) doesn't match. ⇒ Backtrack!
  26. Pattern doesn't match.
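The same trick can collect the substrings instead of printing them. This is a minimal sketch, assuming a package variable named @seen (a made-up name) so the code block can reach it on any Perl version:

```perl
use strict;
use warnings;

our @seen;  # filled in by the code block each time the engine passes through it

# (?!) always fails, so the engine tries every start position and every length.
my $matched = 'abcd' =~ /(.+?)(?{ push @seen, $1 })(?!)/ ? 1 : 0;

print join(' ', @seen), "\n";
print "overall match: ", ($matched ? "yes" : "no"), "\n";
```

The overall match fails, as in step 26, but @seen holds every substring visited along the way, in the same order as the printed output above.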
answered Oct 02 '22 08:10 by ikegami


It is legal, but matches nothing at all.

The (?!...) construct is a negative lookahead assertion. In detail, it means: "match at a position where the regex inside the parentheses (...) does not match the input that follows".

But in this case, the regex inside the parentheses is the empty regex, which matches at every position.

So, this regex essentially says "match a position where what follows cannot be matched by the empty regex"... And there can be no such position, whatever the input string. This is a regex construct which always fails!
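One place where an always-failing regex can come in handy is as a fallback when a pattern is built from a list that may be empty. The sketch below is an assumption about how it might be used, not something taken from the code in the question; any_of is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex that matches any of the given words.
sub any_of {
    my @words = @_;
    # With no words, an empty alternation would match everywhere,
    # so return a pattern that can never match instead.
    return qr/(?!)/ unless @words;
    my $alternation = join '|', map { quotemeta } @words;
    return qr/\b(?:$alternation)\b/;
}

my $none = any_of();
my $pets = any_of('cat', 'dog');

print "no words vs 'cat':  ", ('cat' =~ $none ? "match" : "no match"), "\n";
print "pets vs 'my dog':   ", ('my dog' =~ $pets ? "match" : "no match"), "\n";
print "pets vs 'my bird':  ", ('my bird' =~ $pets ? "match" : "no match"), "\n";
```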

answered Oct 02 '22 07:10 by fge


(?=), an empty positive lookahead, will always match. It’s a hackish way to set the value of the last successful match. (?!) is its inverse, and will never match.

answered Oct 02 '22 07:10 by Jon Purdy