
Negative lookahead Regular Expression

I want to match all strings ending in ".htm" unless they end in "foo.htm". I'm generally decent with regular expressions, but negative lookaheads have me stumped. Why doesn't this work?

/(?!foo)\.htm$/i.test("/foo.htm");  // returns true. I want false. 

What should I be using instead? I think I need a "negative lookbehind" expression (if JavaScript supported such a thing, which I know it doesn't).

gilly3 asked Jul 27 '11 22:07



2 Answers

The problem is pretty simple really. This will do it:

/^(?!.*foo\.htm$).*\.htm$/i
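As a quick check of the pattern above, here is a minimal sketch that runs it against a few sample paths. The helper name endsInHtmButNotFoo and the sample strings are made up for illustration; only the regular expression comes from the answer.

```javascript
// The pattern from the answer: anchor at the start, reject the whole
// string if it ends in "foo.htm", otherwise require it to end in ".htm".
const htmButNotFoo = /^(?!.*foo\.htm$).*\.htm$/i;

// Hypothetical helper, named here only for the example.
function endsInHtmButNotFoo(path) {
  return htmButNotFoo.test(path);
}

console.log(endsInHtmButNotFoo("/foo.htm"));  // false
console.log(endsInHtmButNotFoo("/bar.htm"));  // true
console.log(endsInHtmButNotFoo("/a.htm"));    // true
console.log(endsInHtmButNotFoo("/foo.html")); // false (does not end in ".htm")
console.log(endsInHtmButNotFoo("/FOO.HTM"));  // false (the i flag ignores case)
```

Because the lookahead sits at the start of the string and scans ahead with .*, it can inspect the text before ".htm", which a lookahead placed directly at the dot cannot do.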

ridgerunner answered Sep 20 '22 16:09


What you are describing (your intention) is a negative look-behind, and JavaScript had no support for look-behinds when this was written (they were added to the language in ES2018).

Look-aheads look forward from the position at which they are placed, and you've placed yours immediately before the "\.". So what you've got actually says "anything ending in .htm, as long as the three characters starting at that position (.ht) are not foo", which is always true.
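To make that concrete, the following sketch inspects where the original pattern actually matches. The variable names are chosen for the example; the pattern and the test string are the ones from the question.

```javascript
// The pattern from the question, unchanged.
const original = /(?!foo)\.htm$/i;
const text = "/foo.htm";

// exec() reports the position where the match starts.
const match = original.exec(text);
console.log(match.index); // 4, the position of the dot

// These are the three characters the lookahead compared against "foo".
const seenByLookahead = text.slice(match.index, match.index + 3);
console.log(seenByLookahead); // ".ht"

// Since ".ht" can never equal "foo", the lookahead never rejects anything,
// and the pattern behaves like the plain suffix test.
console.log(/\.htm$/i.test(text)); // true
```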

Usually, the substitute for negative look-behinds is to match more than you need, and extract only the part you actually do need. This is hacky, and depending on your precise situation you can probably come up with something else, but something like this works:

// Checks that the last 3 characters before the dot are not foo:
/(?!foo).{3}\.htm$/i.test("/foo.htm"); // returns false
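One caveat worth checking: because this workaround consumes exactly three characters before the dot, it also rejects names with fewer than three characters in front of ".htm". The sketch below shows that, alongside a negative look-behind version, assuming an engine with ES2018 look-behind support (current browsers and Node.js). The variable names and sample strings are illustrative only.

```javascript
// The workaround from the answer above.
const workaround = /(?!foo).{3}\.htm$/i;
console.log(workaround.test("/foo.htm")); // false
console.log(workaround.test("/bar.htm")); // true
console.log(workaround.test("a.htm"));    // false: fewer than 3 chars before ".htm"

// Assumption: the engine supports ES2018 look-behind assertions.
// (?<!foo) asserts that the text just before the dot is not "foo".
const lookbehind = /(?<!foo)\.htm$/i;
console.log(lookbehind.test("/foo.htm")); // false
console.log(lookbehind.test("/bar.htm")); // true
console.log(lookbehind.test("a.htm"));    // true
```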
Nicole answered Sep 20 '22 16:09