I am trying to write a regular expression that will find any word that is followed by a space, so long as that word is not AND, OR, NOT.
After searching for similar problems, I tried a negative lookahead. This is my current regex: (?!AND|OR|NOT).*?\\s
If I try this with "AND " I get a match on "ND". If I try with "OR " I get "R" and if I try with "NOT " I get "OT".
Can anyone help?
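For reference, the behaviour described above can be reproduced with a short sketch. It assumes Python's `re` module and that the doubled backslashes in the regex are string-literal escaping (so `\\s` is written `\s` in a raw string). The helper name `first_match` is made up for this illustration.

```python
import re

# The regex from the question, written as a Python raw string.
QUESTION_PATTERN = re.compile(r"(?!AND|OR|NOT).*?\s")

def first_match(text):
    """Return the first match of the question's regex, or None."""
    m = QUESTION_PATTERN.search(text)
    return m.group() if m else None

# The lookahead only rejects the position where the keyword starts.
# The engine then retries one character later, where the lookahead passes.
for sample in ("AND ", "OR ", "NOT "):
    print(repr(sample), "->", repr(first_match(sample)))
```

The matches include the trailing space because `\s` is part of the pattern rather than a lookahead.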
Try with this pattern:
\\b(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+\\s
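I have added word boundaries (\b) and used the character class [a-zA-Z] (you can replace it with [a-z] in a case-insensitive context) to avoid the lazy quantifier. The leading \b stops a match from starting in the middle of a word (which is why you were getting "ND"), and the \b inside the lookahead means that only the whole words AND, OR and NOT are rejected, so a word such as ANDROID still matches.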
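As a rough check of the pattern above, here is a minimal sketch assuming Python's `re` module, with the doubled backslashes reduced to single ones in a raw string. The sample strings and the name `WORD_THEN_SPACE` are invented for the example.

```python
import re

# Word boundary, then reject the whole words AND / OR / NOT,
# then one or more letters followed by a whitespace character.
WORD_THEN_SPACE = re.compile(r"\b(?!(?:AND|OR|NOT)\b)[a-zA-Z]+\s")

# The three keywords no longer produce partial matches.
keyword_results = [WORD_THEN_SPACE.search(s) for s in ("AND ", "OR ", "NOT ")]

# Ordinary words, and words that merely start with a keyword, still match.
android = WORD_THEN_SPACE.search("ANDROID ")
words = WORD_THEN_SPACE.findall("cats AND dogs ")

print(keyword_results)
print(android.group() if android else None)
print(words)
```

Note that each match still includes the trailing whitespace character, since `\s` is consumed by the pattern.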
Or, for better performance (in case-insensitive mode):
\\b(?>(?>[b-mp-z])|(?!(?>and|or|not)\\b)[aon])(?>[a-z]*)\\s
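Other variants, if you want to match words optionally wrapped in quotes or parentheses, or words without the trailing space: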
(?<=(\"?)\\b)(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+(?=\\1(?:\\s|$))
(\"?)(?<=\\b)(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+\\1(?=\\s|$)
(?<=(\\()\\b)(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+(?=(?(1)\\)|(?:\\s|$)))
(?<=(\\()?(\"?)\\b)(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+(?=(?(1)\\)|\\2(?:\\s|$)))
\\b(?!(?:AND|OR|NOT)\\b)[a-zA-Z]+\\b
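The last pattern in this list, which ends in a word boundary instead of a whitespace character, is the only one of these variants sketched here. The variants that use conditionals or capture groups inside lookbehinds are not expected to work with Python's `re` module. The input string is a made-up example.

```python
import re

# Same keyword filter, but the word only has to end at a word boundary,
# so the match does not include any trailing whitespace.
NON_KEYWORD_WORD = re.compile(r"\b(?!(?:AND|OR|NOT)\b)[a-zA-Z]+\b")

sample = "cats AND dogs OR NOT birds ANDROID"
found = NON_KEYWORD_WORD.findall(sample)
print(found)
```

Because the match is case-sensitive, lowercase "and", "or" and "not" would be kept unless the pattern is compiled with `re.IGNORECASE`.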
Hmm, I'm not 100% sure if I understood correctly, but could you try this and see if it's what you were looking for?
(?<=\bAND|\bOR|\bNOT)\s.*
This will match XYZ in your comment (though with the preceding whitespace character). I tested it here after adding a word in between.
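One caveat if you try this in Python: its `re` module only accepts fixed-width lookbehinds, and the alternatives here have different lengths (AND and NOT are three characters, OR is two), so the pattern above fails to compile there. A possible workaround, shown as a sketch with an invented input string, is to split it into one lookbehind per keyword.

```python
import re

# One fixed-width lookbehind per keyword, combined with alternation.
AFTER_KEYWORD = re.compile(r"(?:(?<=\bAND)|(?<=\bOR)|(?<=\bNOT))\s.*")

# Hypothetical input; the comment referred to above is not shown here.
m = AFTER_KEYWORD.search("foo AND XYZ")
rest = m.group() if m else None
print(repr(rest))  # includes the leading whitespace character

# The single-lookbehind form is rejected by Python's re module.
try:
    re.compile(r"(?<=\bAND|\bOR|\bNOT)\s.*")
    single_lookbehind_ok = True
except re.error:
    single_lookbehind_ok = False
print(single_lookbehind_ok)
```

Engines that allow variable-width lookbehind should accept the original single-lookbehind form as written.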
EDIT: If there are no more characters to the right and you need the last three characters, you could use either:
\w+$
or:
[^\s]+$
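The two patterns differ once the last token contains punctuation: `\w+$` stops at the first non-word character, while `[^\s]+$` takes everything back to the last whitespace. A small sketch, assuming Python's `re` module and a made-up input:

```python
import re

text = "foo bar-baz"  # hypothetical input

# \w+$ : only word characters (letters, digits, underscore) at the end.
last_word_chars = re.search(r"\w+$", text).group()

# [^\s]+$ : any run of non-whitespace characters at the end.
last_token = re.search(r"[^\s]+$", text).group()

print(last_word_chars)
print(last_token)
```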