
Regex to match all words except a given list

Tags: c#, .net, regex

I am trying to write a replacement regular expression to surround all words in quotes except the words AND, OR and NOT.

I have tried the following for the match part of the expression:

(?i)(?<word>[a-z0-9]+)(?<!and|not|or)

and

(?i)(?<word>[a-z0-9]+)(?!and|not|or)

but neither works. The replacement expression is simple and currently surrounds all words.

"${word}"

So

This and This not That

becomes

"This" and "This" not "That"

John asked Oct 28 '08 09:10



2 Answers

This is a little dirty, but it works:

(?<!\b(?:and| or|not))\b(?!(?:and|or|not)\b)

In plain English, this matches any word boundary that is neither preceded nor followed by "and", "or", or "not". It considers whole words only: the position after the word "sand", for example, still matches, even though "sand" ends in "and".

The space in front of the "or" in the zero-width look-behind assertion pads that alternative to three characters, so every alternative has the same length. Engines that only support fixed-length look-behind require this. See whether that already solves your problem.

EDIT: Applied to the string "except the words AND, OR and NOT." as a global replace with single quotes, this returns:

'except' 'the' 'words' AND, OR and NOT.
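
For the C# case in the question, the boundary pattern could be applied roughly as follows. This is only a sketch: the class name `BoundaryQuoter`, the method name `Quote`, and the use of `RegexOptions.IgnoreCase` are assumptions made for illustration, and the replacement is a double quote to match the question rather than the single quote used above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryQuoter
{
    // Pattern from the answer above. It matches zero-width positions
    // (word boundaries), not the words themselves.
    private const string Pattern = @"(?<!\b(?:and| or|not))\b(?!(?:and|or|not)\b)";

    public static string Quote(string input)
    {
        // Assumption: IgnoreCase, so AND/OR/NOT are skipped in any casing.
        // Every matched boundary is replaced by a single double-quote character.
        return Regex.Replace(input, Pattern, "\"", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // prints: "This" and "This" not "That"
        Console.WriteLine(Quote("This and This not That"));
    }
}
```

Because of the literal space, the look-behind only recognizes an "or" whose preceding space directly follows a word character. .NET allows variable-length look-behind, so there the space can probably be dropped to lift that restriction.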
Tomalak answered Sep 25 '22 02:09



John,

The regex in your question is almost correct. The only problem is that you put the lookahead at the end of the regex instead of at the start. You also need to add word boundaries to force the regex to match whole words. Otherwise, it will match "nd" in "and", "r" in "or", etc., because "nd" and "r" are not in your negative lookahead.

(?i)\b(?!and|not|or)(?<word>[a-z0-9]+)\b
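
A minimal C# sketch of this lookahead approach, paired with the `"${word}"` replacement from the question, might look like the following. The class name `WordQuoter` and the method name `Quote` are hypothetical. The pattern also carries one assumed tweak that is not part of the regex above: a `\b` inside the lookahead, so that words which merely start with "and", "not" or "or" (such as "nothing" or "order") are still quoted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordQuoter
{
    // Lookahead placed at the start of the word, as described above.
    // Assumed tweak: the inner \b limits the exclusion to the whole words
    // "and", "not" and "or" instead of any word beginning with them.
    private const string Pattern = @"(?i)\b(?!(?:and|not|or)\b)(?<word>[a-z0-9]+)\b";

    public static string Quote(string input)
    {
        // ${word} refers to the named group captured by the pattern.
        return Regex.Replace(input, Pattern, "\"${word}\"");
    }

    public static void Main()
    {
        // prints: "This" and "This" not "That"
        Console.WriteLine(Quote("This and This not That"));
        // prints: "nothing" or "order"
        Console.WriteLine(Quote("nothing or order"));
    }
}
```

Without the inner `\b`, the lookahead rejects "nothing" and "order" at their first letter, and no later position inside the word sits on a word boundary, so those words would stay unquoted.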

Jan Goyvaerts answered Sep 24 '22 02:09
