
Regex to match on a word but only if it isn't preceded or followed by another specific word

Need a Regex string to work with custom Exchange DLP "Sensitive Information" type.

i.e. match on Smith, but not when it appears as John Smith or Smith John.

(?i)(?<!John\s)Smith appears to work for "John Smith" though I'm not convinced it is 100% efficient.

(?i)(Smith.*\s(?!John)) appears to work for "Smith John" but not if followed by a space or new line.

Have tried the following to combine them into one string but it doesn't seem to work at all.

(?i)(?<!John\s)Smith |(?i)(Smith.*\s(?!John))

(?i)(?<!John\s)Smith.*\s(?!John)

What schoolboy error am I making?

Glen Liddell asked Nov 19 '25 07:11


1 Answer

The (?i)(?<!John\s)Smith |(?i)(Smith.*\s(?!John)) pattern matches either Smith plus a space that is not immediately preceded by John and one whitespace character, OR Smith followed by any number of characters and then a whitespace character that is not immediately followed by John. Either alternative is enough for a match, so it matches Smith in many positions it should not.

The (?i)(?<!John\s)Smith.*\s(?!John) pattern matches a Smith that is not immediately preceded by John and a whitespace character, and then, because .* is greedy, all text up to the last whitespace character that is not immediately followed by John. The lookahead is therefore checked after that later whitespace, not after the one right next to Smith.
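To see both failure modes concretely, here is a minimal sketch using Python's re module. This is an assumption for illustration only: the Exchange DLP engine may differ in details, and the variable names and sample strings are made up.

```python
import re

# The two attempts from the question. The second (?i) in the alternation is
# dropped here: it is redundant, and Python 3.11+ rejects a global flag that
# is not at the start of the pattern.
attempt_alternation = re.compile(r"(?i)(?<!John\s)Smith |(Smith.*\s(?!John))")
attempt_combined = re.compile(r"(?i)(?<!John\s)Smith.*\s(?!John)")

# The second branch of the alternation has no lookbehind, so it still
# finds Smith right after John.
alt_match = attempt_alternation.search("John Smith called")
print(repr(alt_match.group(0)))  # 'Smith '

# Greedy .* runs past John, so the lookahead tests the position after the
# whitespace before "called", not the one right after Smith.
combined_match = attempt_combined.search("Smith John called")
print(repr(combined_match.group(0)))  # 'Smith John '
```

In both cases the unwanted name still produces a match, which is why neither attempt behaves as intended.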

Make sure the \s pattern is inside the lookahead:

(?i)(?<!John\s)Smith(?!\s+John)

Details

  • (?i) - case insensitive inline modifier
  • (?<!John\s) - a location that is not immediately preceded by John and a whitespace char
  • Smith - a literal substring
  • (?!\s+John) - Smith must not be immediately followed by 1+ whitespace chars and then John (use \s* instead to also reject 0 whitespace chars, as in SmithJohn).
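
As a quick check of the suggested pattern, the following sketch runs it against a few sample strings with Python's re module. Python and the sample strings are assumptions for illustration; behaviour in the Exchange DLP engine should be verified separately.

```python
import re

# Suggested pattern: lookbehind rejects "John Smith",
# lookahead (with \s+ inside it) rejects "Smith John".
pattern = re.compile(r"(?i)(?<!John\s)Smith(?!\s+John)")

# Hypothetical sample inputs.
samples = ["Smith", "Mr Smith called", "John Smith", "Smith John", "smith   JOHN"]

matched = {text: bool(pattern.search(text)) for text in samples}
for text, hit in matched.items():
    print(f"{text!r}: {'match' if hit else 'no match'}")
```

Only the first two samples match. The last one is rejected because (?i) makes both names case insensitive and \s+ covers the run of spaces.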

Wiktor Stribiżew answered Nov 21 '25 21:11


