In some source code I found this regex:
qr/(?!)/;
I simply can't figure out what this matches.
Honestly, I don't understand what "a zero-width negative look-ahead assertion" means, which is the description I found in perlre. :(
Can someone explain it in plain human language, please? :)
qr// is one of the quote-like operators that apply to pattern matching and related activities. From perldoc: This operator quotes (and possibly compiles) its STRING as a regular expression. STRING is interpolated the same way as PATTERN in m/PATTERN/. If ' is used as the delimiter, no interpolation is done.
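As a rough illustration of what qr// gives you, the sketch below compiles the pattern from the question once and then uses it both directly and inside a larger pattern. The helper name count_matches and the sample strings are made up for this example.

```perl
use strict;
use warnings;

# Compile the never-matching pattern once, as in the question.
my $never = qr/(?!)/;

# Hypothetical helper: how many of the inputs does the pattern match?
sub count_matches {
    my ($re, @inputs) = @_;
    return scalar grep { $_ =~ $re } @inputs;
}

# A compiled pattern can be used directly with =~ ...
print count_matches($never, '', 'abc', "two\nlines"), "\n";   # prints 0

# ... or interpolated into a larger pattern. The (?!) branch never
# contributes a match, so only the \w+ branch can succeed.
my $word_or_never = qr/\A(?:\w+|$never)\z/;
print count_matches($word_or_never, 'abc', '!!!'), "\n";      # prints 1
```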
Basically (0+1)* matches any sequence of ones and zeroes. So, in your example (0+1)*1(0+1)* should match any sequence that contains a 1. It would not match 000, but it would match 010, 1, 111, etc. (0+1) means 0 OR 1.
The empty regex pattern matches a zero-length string, which is to say it always matches. It's an obvious progression:
'bbbbb' =~ /^(?:aaa|bbb)/ # Matches (Matches 3 "b"s, from pos 0 to 3)
'bbbbb' =~ /^(?:aaa|bb)/ # Matches (Matches 2 "b"s, from pos 0 to 2)
'bbbbb' =~ /^(?:aaa|b)/ # Matches (Matches 1 "b", from pos 0 to 1)
'bbbbb' =~ /^(?:aaa|)/ # Matches (Matches 0 "b"s, from pos 0 to 0)
This means that (?=) ("Is this position followed by a zero-length string?") always matches and (?!) ("Is this position not followed by a zero-length string?") never matches. In fact, (?!) is optimised to (*FAIL) since the latter's introduction in 5.10.
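A quick way to convince yourself of this is to try all three constructs against a few strings. This is only a sketch: the helper name matches and the test strings are invented here, and (*FAIL) assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the pattern matches the string, 0 otherwise.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

# The (?=) column is 1 for every string; (?!) and (*FAIL) are always 0.
for my $str ('', 'a', 'bbbbb') {
    printf "%-9s (?=): %d   (?!): %d   (*FAIL): %d\n",
        "'$str'",
        matches($str, qr/(?=)/),
        matches($str, qr/(?!)/),
        matches($str, qr/(*FAIL)/);
}
```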
(?!), aka (*FAIL), is useful to force backtracking when the pattern has side effects.
'abcd' =~ /(.+?)(?{ print "$1\n" })(?!)/;
Output:
a
ab
abc
abcd
b
bc
bcd
c
cd
d
Explanation of example:
(?!) doesn't match, so the regex engine keeps trying to find a match by having .+? match more and more characters. When that fails, the regex engine tries to match at a later starting position.
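The same trick can collect results instead of printing them. The following is a minimal sketch rather than a recommended idiom: the sub name all_substrings and the package variable @found are made up for this example, and a package variable is used only to stay clear of closure problems with lexicals inside (?{ }) on older perls.

```perl
use strict;
use warnings;

# Collect every non-empty substring of $str. (?!) makes each attempt
# fail, so the engine backtracks through all lengths and all starting
# positions, and the code block records $1 on every attempt.
sub all_substrings {
    my ($str) = @_;
    local our @found = ();
    $str =~ /(.+?)(?{ push @found, $1 })(?!)/;
    return @found;
}

print join(',', all_substrings('abcd')), "\n";
# prints a,ab,abc,abcd,b,bc,bcd,c,cd,d
```

Note that the match itself still fails; only the side effect of the code block is of interest.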
This is called "backtracking". It's how the regex engine can match 'aaaab' =~ /a*ab/. The first time through, a* matches all 4 a's, so the ab doesn't match, so the engine backtracks. The second time through, a* only matches 3 of the a's, allowing ab and thus the whole pattern to match.
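To watch that backtracking happen, one option is to put a code block right after a* and record how much it currently holds each time the engine moves on to ab. This is a sketch under a few assumptions: the capture group and the code block are additions to the pattern above, and trace_a_star and @lengths are hypothetical names.

```perl
use strict;
use warnings;

# Record the length of what a* holds every time the engine proceeds
# to try "ab", then return the final capture plus the recorded lengths.
sub trace_a_star {
    my ($str) = @_;
    local our @lengths = ();
    my $ok = $str =~ /(a*)(?{ push @lengths, length $1 })ab/;
    return ($ok ? $1 : undef, @lengths);
}

my ($captured, @lengths) = trace_a_star('aaaab');
print "a* ended up matching '$captured'\n";   # 'aaa'
print "lengths tried: @lengths\n";
```

Going by the description above, the recorded lengths should start at 4 and end at 3, the last one being the attempt that finally lets ab match.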
The step-by-step flow for the example I originally gave follows:
(.+?) matches a at pos 0
(?{ print "$1\n" }) prints a and matches zero chars
(?!) doesn't match. ⇒ Backtrack!
(.+?) matches ab at pos 0
(?{ print "$1\n" }) prints ab and matches zero chars
(?!) doesn't match. ⇒ Backtrack!
(.+?) matches abc at pos 0
(?{ print "$1\n" }) prints abc and matches zero chars
(?!) doesn't match. ⇒ Backtrack!
(.+?) matches abcd at pos 0
(?{ print "$1\n" }) prints abcd and matches zero chars
(?!) doesn't match. ⇒ Backtrack!
(.+?) can't match anything else here. ⇒ Backtrack!
(.+?) matches b at pos 1
(?{ print "$1\n" }) prints b and matches zero chars
(?!) doesn't match. ⇒ Backtrack!
...
(.+?) matches d at pos 3
(?{ print "$1\n" }) prints d and matches zero chars
(?!) doesn't match. ⇒ Backtrack!
(.+?) can't match anything else here. ⇒ Backtrack!
(.+?) doesn't match. ⇒ Backtrack!

It is legal, but matches nothing at all.
The (?!...) construct is a negative lookahead assertion. In detail, it means: "match a position where the regex that follows (...) should not match the input string".
But in this case, the "regex that follows" is the empty regex, which matches at every position of every string.
So, this regex essentially says "match a position where what follows cannot be matched by the empty regex"... And there can be no such position, whatever the input string. This is a regex construct which always fails!
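For comparison, here is a small sketch of a negative lookahead that is not empty, next to the empty one. The sub name foo_not_before_bar and the sample strings are invented for this illustration.

```perl
use strict;
use warnings;

# Match "foo" only where it is NOT followed by "bar". The look-ahead
# inspects what follows but consumes nothing ("zero-width"), so the
# capture is just "foo".
sub foo_not_before_bar {
    my ($str) = @_;
    return $str =~ /(foo)(?!bar)/ ? $1 : undef;
}

for my $str ('foobar', 'foobaz', 'foo') {
    my $m = foo_not_before_bar($str);
    print "$str: ", (defined $m ? "matched '$m'" : 'no match'), "\n";
}
# foobar: no match
# foobaz: matched 'foo'
# foo: matched 'foo'

# With an empty look-ahead there is nothing that could "not follow",
# so the pattern fails on every input.
print 'foobaz' =~ /foo(?!)/ ? "matched\n" : "no match\n";   # no match
```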
(?=), an empty positive lookahead, will always match. It’s a hackish way to set the value of the last successful match. (?!) is its inverse, and will never match.
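The "last successful match" remark presumably refers to the rule in perlop that a truly empty pattern (//) reuses the last successfully matched regex. Under that reading, the sketch below shows how a successful (?=) match changes what // does afterwards; the sub name demo and the sample strings are made up.

```perl
use strict;
use warnings;

sub demo {
    'abc' =~ /b/;                          # last successful pattern: /b/
    my $before = ('xyz' =~ //) ? 1 : 0;    # empty pattern reuses /b/
    'anything' =~ /(?=)/;                  # always succeeds, becomes "last"
    my $after  = ('xyz' =~ //) ? 1 : 0;    # empty pattern reuses (?=)
    return ($before, $after);
}

my ($before, $after) = demo();
print "before reset: $before, after reset: $after\n";
```

If that reading is right, // fails against 'xyz' before the reset (it behaves like /b/) and succeeds after it.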