Negative lookahead assertion with the * modifier in Perl

I have what I believe to be a negative lookahead assertion, <@> *(?!QQQ), which I expect to match when the tested string contains <@> followed by any number of spaces (including zero) and then not followed by QQQ.

Yet if the tested string is <@> QQQ, the regular expression still matches.

I fail to see why this is the case and would appreciate any help on this matter.

Here's a test script:

use warnings;
use strict;

my @strings = ('something <@> QQQ',
               'something <@> RRR',
               'something <@>QQQ' ,
               'something <@>RRR' );


print "$_\n" for map {$_ . " --> " . rep($_) } (@strings);



sub rep {

  my $string = shift;

  $string  =~ s,<@> *(?!QQQ),at w/o ,;
  $string  =~ s,<@> *QQQ,at w/  QQQ,;

  return $string;
}

This prints:

something <@> QQQ --> something at w/o  QQQ
something <@> RRR --> something at w/o RRR
something <@>QQQ --> something at w/  QQQ
something <@>RRR --> something at w/o RRR

I'd have expected the first line to be something <@> QQQ --> something at w/  QQQ.

René Nyffenegger asked Apr 27 '12 11:04

1 Answer

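It matches because "any number" includes zero. After <@>, the engine first lets " *" take the one space, but the lookahead then fails because QQQ follows. So it backtracks and lets " *" match zero spaces. At that position the next characters are a space followed by QQ, not QQQ, so the lookahead succeeds. In other words, no spaces followed by a space still counts as "any number of spaces not followed by QQQ".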
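To see this for yourself, you can capture what the original pattern actually consumed. This is only a small sketch; the helper name matched_text is made up for illustration and is not part of the question's script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the exact text the original pattern
# matched, or undef if it did not match at all.
sub matched_text {
    my ($string) = @_;
    return $string =~ /(<@> *(?!QQQ))/ ? $1 : undef;
}

# Brackets make a trailing space in the match visible.
for my $s ('something <@> QQQ', 'something <@>QQQ', 'something <@> RRR') {
    my $m = matched_text($s);
    printf "%-20s matched: %s\n", $s, defined $m ? "[$m]" : '(no match)';
}
```

For 'something <@> QQQ' the captured text should be just <@> with no space after it, which is the zero-space match described above.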

You should add another lookahead assertion that the first thing after your spaces is not itself a space. Try this (untested):

 <@> *(?!QQQ)(?! )
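Since the pattern above is marked untested, here is a quick check of it. It reuses the two substitutions from the question's rep(), renamed rep_fixed here for illustration, and adds a two-space input that was not in the original test data.

```perl
use strict;
use warnings;

# Same two substitutions as the question's rep(), but the first pattern
# carries the extra (?! ) lookahead suggested above.
sub rep_fixed {
    my ($string) = @_;
    $string =~ s,<@> *(?!QQQ)(?! ),at w/o ,;
    $string =~ s,<@> *QQQ,at w/  QQQ,;
    return $string;
}

# The four strings from the question, plus a two-space case (an addition).
print "$_ --> ", rep_fixed($_), "\n"
    for 'something <@> QQQ', 'something <@> RRR',
        'something <@>QQQ',  'something <@>RRR',
        'something <@>  QQQ';
```

With the extra lookahead, the first pattern can no longer stop in front of a remaining space, so every QQQ input falls through to the second substitution. Another way to get the same effect, offered only as an alternative, is to move the spaces inside the lookahead: <@>(?! *QQQ) *.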

ETA side note: changing the quantifier to + would have helped only when there is exactly one space. In the general case, the regex can always grab one space fewer and therefore succeed. Regexes want to match and will bend over backwards to do so in any way possible. All other considerations (leftmost, longest, etc.) take a back seat: they only determine which match is chosen when more than one is possible. Matching always wins over not matching.
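The + case can be checked the same way. This is a sketch with a made-up helper name, matched_plus, and test strings chosen here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the original pattern with * swapped for +.
# Returns the matched text, or undef if there is no match.
sub matched_plus {
    my ($string) = @_;
    return $string =~ /(<@> +(?!QQQ))/ ? $1 : undef;
}

# One space: + cannot give the space back, so there is no match.
# Two spaces: + gives back one space and the lookahead then succeeds.
# No space: + requires at least one space, so there is no match either.
for my $s ('something <@> QQQ', 'something <@>  QQQ', 'something <@>RRR') {
    my $m = matched_plus($s);
    printf "%-21s matched: %s\n", $s, defined $m ? "[$m]" : '(no match)';
}
```

Note that + also stops matching inputs with no space at all, such as something <@>RRR, so it changes the behaviour for the cases that already worked.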

Mark Reed answered Oct 12 '22 04:10