
How can a regex negative lookbehind span an unknown number of words between the excluded term and the match?

I am trying to exclude records that have the word "owner" anywhere before the word "dog":

  • the owner has a dog (exclude)
  • the owner has a black and brown dog (exclude)
  • John has a dog (include)
  • John has a black and brown dog (include)

Here is my current regex:

\b(?<!owner\s)\w+\sdog\b

This works when a single unknown word sits between them ('owner has dog' is excluded), but 'owner has a dog' is still included. I am unable to make the negative lookbehind apply across all of the words between "owner" and "dog".
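
The behavior described above can be reproduced with a short Python sketch. It assumes Python's built-in re module, and the variable names are only illustrative:

```python
import re

# The pattern from the question. The lookbehind is checked only once,
# immediately before the single word that precedes "dog".
pattern = re.compile(r"\b(?<!owner\s)\w+\sdog\b")

for text in ["owner has dog", "owner has a dog"]:
    # search() returns None when the record is excluded
    print(text, "->", "included" if pattern.search(text) else "excluded")
```

In the second string the match starts at "a", and the text just before "a" is "has ", so the lookbehind never sees "owner".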

Many Thanks

Sean Farrell asked Sep 02 '25 09:09


2 Answers

You can use the following regular expression to verify that the string contains the word "dog" that is not preceded by the word "owner".

^(?:(?!\bowner\b).)*\bdog\b

Python's regex engine performs the following operations.

^                : anchor match to beginning of string
(?:              : begin a non-capture group
  (?!\bowner\b)  : use a negative lookahead to assert that the current
                   position in the string is not followed by "owner"
  .              : match a character
)                : end non-capture group
*                : execute non-capture group 0+ times
\bdog\b          : match 'dog' surrounded by word boundaries

The technique of matching a sequence of individual characters, none of which begins an outlawed word, is called the tempered greedy token technique.
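
As a minimal sketch, the pattern can be checked against the four sample records from the question. The helper name dog_without_owner is hypothetical and not part of the original answer:

```python
import re

# Tempered greedy token: each "." is consumed only if the word "owner"
# does not start at that position.
TEMPERED = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")

def dog_without_owner(text: str) -> bool:
    """Return True if "dog" occurs with no "owner" before it."""
    return TEMPERED.search(text) is not None

samples = [
    "the owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
    "John has a black and brown dog",
]
for s in samples:
    print(s, "->", "include" if dog_without_owner(s) else "exclude")
```

Because "." does not match a newline by default, this sketch assumes each record is a single line.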

Cary Swoveland answered Sep 04 '25 21:09


Another option is to start by matching any character except "o" or a newline.

Then, whenever an "o" is encountered, assert that it does not begin the word "owner", match that "o" followed by any characters except "o" or a newline, and optionally repeat that process until the word "dog" is matched.

 ^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b

Explanation

  • ^ Start of string
  • [^o\r\n]* Match 0+ times any char except o or a newline
  • (?: Non capture group
    • (?!\bowner\b) Negative lookahead, assert not the word owner directly to the right
    • o[^o\r\n]* Match o followed by 0+ times any char except o or newline
  • )* Close non capturing group and repeat 0+ times
  • \bdog\b Match the word dog
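
The steps above can be tried out with a small Python sketch. It assumes the re module, uses illustrative names, and compares this pattern with the tempered-token pattern from the other answer on the sample records from the question:

```python
import re

# Pattern from this answer: consume runs of non-"o" characters, and stop
# at each "o" to check that it does not begin the word "owner".
RUNS = re.compile(r"^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b")

# Character-by-character variant, included only for comparison.
TEMPERED = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")

samples = [
    "the owner has a dog",
    "the owner has a black and brown dog",
    "John has a dog",
    "John has a black and brown dog",
]
results = {s: RUNS.search(s) is not None for s in samples}
for s, included in results.items():
    print(s, "->", "include" if included else "exclude")

# Both patterns should give the same verdict on these samples.
agree = all((TEMPERED.search(s) is not None) == results[s] for s in samples)
print("patterns agree:", agree)
```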


The fourth bird answered Sep 04 '25 22:09