I am trying to exclude records which have the word "owner" anywhere before the word "dog".
Here is my current regex:
\b(?<!owner\s)\w+\sdog\b
This works when a single unknown word sits between them ('owner has dog' is excluded, but 'owner has a dog' is still included). However, I have been unable to make the negative lookbehind hold across any number of words between "owner" and "dog".
Many Thanks
You can use the following regular expression to verify that the string contains the word "dog" that is not preceded by the word "owner".
^(?:(?!\bowner\b).)*\bdog\b
Python's regex engine performs the following operations.
^ : anchor match to beginning of string
(?: : begin a non-capture group
(?!\bowner\b) : use a negative lookahead to assert that the current position in the string is not followed by "owner"
. : match a character
) : end non-capture group
* : execute non-capture group 0+ times
\bdog\b : match 'dog' surrounded by word boundaries
The technique of matching a sequence of individual characters, none of which begins an outlawed word, is called the tempered greedy token solution.
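To see how this pattern behaves when filtering records, here is a minimal sketch in Python. The sample records and the names PATTERN, records and kept are made up for illustration and are not part of the question. It assumes each record is a single line and that matching should be case-sensitive; add re.DOTALL or re.IGNORECASE if that does not hold for your data.

```python
import re

# Pattern from the answer above: "dog" must appear with no whole word
# "owner" anywhere before it in the string.
PATTERN = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")

# Hypothetical sample records, made up for illustration.
records = [
    "owner has dog",
    "owner has a dog",
    "a stray dog",
    "the dog has an owner",   # "owner" comes after "dog", so it is kept
    "homeowner has a dog",    # \bowner\b does not match inside "homeowner"
]

# Keep only the records the pattern matches.
kept = [r for r in records if PATTERN.search(r)]
print(kept)
# ['a stray dog', 'the dog has an owner', 'homeowner has a dog']
```

Note that the word boundaries mean only the standalone word "owner" blocks a match; drop the \b around owner if substrings such as "homeowner" should also be excluded.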
Another option is to start by matching any character except o or a newline.
Then, whenever you encounter an o, assert that it does not start the word owner, match the o followed by any characters except o or a newline, and repeat that process until you reach the word dog.
^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b
Explanation
^ : Start of string
[^o\r\n]* : Match 0+ times any char except o or a newline
(?: : Non capture group
(?!\bowner\b) : Negative lookahead, assert not the word owner directly to the right
o[^o\r\n]* : Match o followed by 0+ times any char except o or newline
)* : Close non capturing group and repeat 0+ times
\bdog\b : Match the word dog
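As a quick sanity check, the sketch below runs this pattern next to the tempered greedy token pattern on a few strings. The sample strings and the names TEMPERED, UNROLLED, samples and results are assumptions made for illustration. It only shows that the two patterns agree on these inputs, not that they are equivalent in general (for example, they treat a carriage return differently).

```python
import re

# Both patterns are copied from the answers in this thread.
TEMPERED = re.compile(r"^(?:(?!\bowner\b).)*\bdog\b")
UNROLLED = re.compile(r"^[^o\r\n]*(?:(?!\bowner\b)o[^o\r\n]*)*\bdog\b")

# Hypothetical single-line test strings, made up for illustration.
samples = [
    "owner has a dog",
    "a brown dog",
    "no dog owner here",
    "old owner, one dog",
]

# Map each sample to a (tempered, unrolled) pair of booleans.
results = {}
for text in samples:
    results[text] = (bool(TEMPERED.search(text)), bool(UNROLLED.search(text)))
    print(f"{text!r}: tempered={results[text][0]}, unrolled={results[text][1]}")
```

On these samples both patterns reject the strings where "owner" precedes "dog" and accept the others.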