Negative lookahead regex to ignore list of words

Tags:

regex

I am trying to write a regular expression that will find any word that is followed by a space, as long as that word is not AND, OR, or NOT.

I tried a negative lookahead after searching for similar problems. This is my current regex: (?!AND|OR|NOT).*?\\s

If I try this with "AND " I get a match on "ND". If I try with "OR " I get "R" and if I try with "NOT " I get "OT".
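
The behaviour above can be reproduced with a short sketch. This assumes Python's re module and a raw string, so the doubled backslash from the string literal becomes a single one; first_match is only a helper name made up for the illustration. The lookahead does reject position 0, but the engine then retries from the next character, where the lookahead no longer sees a forbidden word:

```python
import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r"(?!AND|OR|NOT).*?\s")

def first_match(text):
    """Return the first match of the pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

# The lookahead fails at index 0, so the search restarts at index 1.
print(repr(first_match("AND ")))  # 'ND ' (the match includes the space)
print(repr(first_match("OR ")))   # 'R '
print(repr(first_match("NOT ")))  # 'OT '
```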

Can anyone help?

GPW asked May 11 '13 15:05



2 Answers

Try with this pattern:

\\b(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+\\s

I have added word boundaries (\b) and used the character class [a-zA-Z] (you can replace it with [a-z] in a case-insensitive context) to avoid the lazy quantifier.
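
As a rough check of the first pattern, here is a minimal sketch assuming Python's re module (raw string, so single backslashes). The function name words_before_space and the sample sentences are invented for the example:

```python
import re

# \b anchors the lookahead to the start of a word, and the \b inside the
# lookahead means only the whole words AND, OR, NOT are rejected.
pattern = re.compile(r"\b(?!(?:AND|OR|NOT)\b)[a-zA-Z]+\s")

def words_before_space(text):
    """Return every allowed word together with the space that follows it."""
    return pattern.findall(text)

print(words_before_space("cats AND dogs OR NOT birds now"))
# ['cats ', 'dogs ', 'birds ']

# A word that merely starts with AND is still accepted.
print(words_before_space("ANDROID phone "))
# ['ANDROID ', 'phone ']
```

Note that "now" is not returned in the first call because nothing follows it; the pattern requires a trailing whitespace character.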

Or, for better performance (in case-insensitive mode):

\\b(?>(?>[b-mp-z])|(?!(?>and|or|not)\\b)[aon])(?>[a-z]*)\\s

If you want to match:

  • words between double-quotes without the double quotes or spaces:

(?<=(\"?)\\b)(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+(?=\\1(?:\\s|$))

  • words between double-quotes with the double quotes:

(\"?)(?<=\\b)(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+\\1(?=\\s|$)

  • words between parentheses, without the parentheses:

(?<=(\\()\\b)(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+(?=(?(1)\\)|(?:\\s|$)))

  • words between parentheses and double-quotes, without either:

(?<=(\\()?(\"?)\\b)(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+(?=(?(1)\\)|\\2(?:\\s|$)))

  • words that are not AND, OR, or NOT, without any of the extras above:

\\b(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+\\b
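
The last pattern in the list is the simplest to try out. The sketch below assumes Python's re module; search_terms and the sample query are hypothetical. Because the pattern ends with \b instead of \s, it also picks up words next to quotes, parentheses, or the end of the string:

```python
import re

# Same lookahead as before, but the word only has to end at a boundary.
pattern = re.compile(r"\b(?!(?:AND|OR|NOT)\b)[a-zA-Z]+\b")

def search_terms(query):
    """Return every word in query except the operators AND, OR, NOT."""
    return pattern.findall(query)

print(search_terms('foo AND (bar OR "baz") NOT qux'))
# ['foo', 'bar', 'baz', 'qux']
```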

Casimir et Hippolyte answered Nov 15 '22 04:11


Hmm, I'm not 100% sure if I understood correctly, but could you try this and see if it's what you were looking for?

(?<=\bAND|\bOR|\bNOT)\s.*

This will match XYZ in your comment (though with the preceding whitespace character). I tested it after adding a word in between.

EDIT: If there are no more characters to the right and you need the last three characters, you could use either:

\w+$

or:

[^\s]+$
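
To show how the two end-anchored patterns differ, here is a small sketch assuming Python's re module; last_token and the sample strings are made up for the example. \w+$ only accepts word characters at the end of the string, while [^\s]+$ accepts any run of non-whitespace:

```python
import re

def last_token(pattern, text):
    """Return the match of an end-anchored pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(last_token(r"\w+$", "AND XYZ"))       # XYZ
print(last_token(r"[^\s]+$", "AND XYZ"))    # XYZ

# With punctuation at the end, only the non-whitespace version matches.
print(last_token(r"\w+$", "AND X-Z!"))      # None
print(last_token(r"[^\s]+$", "AND X-Z!"))   # X-Z!
```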

Jerry answered Nov 15 '22 03:11