I want to find all occurrences of "not", except those that are part of "not good" or "not bad".
For example, in "not not good, not bad, not mine", only the first and the last "not" should match.
How can I do that with the re module in Python?
Use a negative look-ahead assertion:
\bnot\b(?!\s+(?:good|bad))
This matches not, except where it is followed by whitespace and then good or bad. I added the word boundary \b on both sides to make sure we match the word not on its own, rather than the not in nothing or knot.
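A quick way to check the pattern against the example from the question is a small script like the one below. It is a sketch: the names PATTERN and STRICT are only for illustration. It also shows one edge case. Nothing follows (?:good|bad) inside the look-ahead, so "not goodbye" is skipped as well. Adding a \b inside the look-ahead avoids that, but that variant is an addition, not part of the answer above.

```python
import re

# The pattern from the answer: "not" as a whole word, unless it is followed
# by whitespace and then "good" or "bad".
PATTERN = re.compile(r"\bnot\b(?!\s+(?:good|bad))")

text = "not not good, not bad, not mine"
starts = [m.start() for m in PATTERN.finditer(text)]
print(starts)  # [0, 23]: the first and the last "not"

# \b keeps the "not" inside other words from matching.
print(PATTERN.findall("knot nothing not"))  # ['not']

# Edge case: "goodbye" starts with "good", so this "not" is skipped too.
print(PATTERN.findall("not goodbye"))  # []

# Hypothetical stricter variant: also require a word boundary after good/bad.
STRICT = re.compile(r"\bnot\b(?!\s+(?:good|bad)\b)")
print(STRICT.findall("not goodbye"))  # ['not']
```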
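\b is a word boundary. It asserts that the character before the current position is a word character and the character after it is not, or vice versa (the start and end of the string count as non-word positions). A word character is normally an English letter (a-z, A-Z), a digit (0-9), or the underscore (_), but the set can be larger depending on the regex flavor. In Python 3, str patterns also treat Unicode letters and digits as word characters unless the re.ASCII flag is given.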
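The effect of \b, and of the flavor-dependent definition of a word character, can be seen directly. The sketch below uses "\u00e9" (a precomposed é) as an example of a non-ASCII letter.

```python
import re

# Without \b, "not" is also found inside "knot" and "nothing".
plain = re.findall(r"not", "knot nothing not")
bounded = re.findall(r"\bnot\b", "knot nothing not")
print(plain)    # ['not', 'not', 'not']
print(bounded)  # ['not']

# Python 3 str patterns treat "é" as a word character, so there is no
# boundary between "é" and "n" and nothing matches ...
unicode_result = re.findall(r"\bnot\b", "\u00e9not")
print(unicode_result)  # []

# ... while re.ASCII limits word characters to [a-zA-Z0-9_].
ascii_result = re.findall(r"\bnot\b", "\u00e9not", re.ASCII)
print(ascii_result)  # ['not']
```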
(?!pattern) is the syntax for a zero-width negative look-ahead: it checks that pattern cannot be matched starting at the current position, without consuming any characters of the input string.
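"Zero-width" means the look-ahead only tests what comes next and does not become part of the match. In the short sketch below, the match ends right after "not", and the look-ahead rejects the position when " good" follows.

```python
import re

m = re.search(r"\bnot\b(?!\s+good)", "not bad")
print(m.group(), m.end())  # not 3  (the look-ahead consumed nothing)

rejected = re.search(r"\bnot\b(?!\s+good)", "not good")
print(rejected)  # None
```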
\s denotes a whitespace character (space (ASCII 32), newline \n, tab \t, etc.; check the documentation for the full list). If you don't want to match that broadly, just replace \s with a literal space character.
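The choice matters when "not" and "good" are separated by something other than a space, for example a line break. The sketch below compares \s+ with " +" (one or more literal spaces).

```python
import re

text = "not\ngood"  # "not" and "good" separated by a newline

# \s+ treats the newline as a separator, so this "not" is excluded.
with_whitespace = re.findall(r"\bnot\b(?!\s+(?:good|bad))", text)
print(with_whitespace)  # []

# A literal space only skips real spaces, so here the "not" is matched.
with_space = re.findall(r"\bnot\b(?! +(?:good|bad))", text)
print(with_space)  # ['not']
```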
The + in \s+ matches one or more repetitions of the preceding token, in this case a whitespace character.
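Dropping the + makes the look-ahead allow exactly one whitespace character. The sketch below shows the difference on input with several spaces between the words.

```python
import re

text = "not   good"  # three spaces between the words

# With \s+, any run of whitespace is skipped before checking for good/bad.
one_or_more = re.findall(r"\bnot\b(?!\s+(?:good|bad))", text)
print(one_or_more)  # []

# With a single \s, only one whitespace character is allowed, so this "not" matches.
exactly_one = re.findall(r"\bnot\b(?!\s(?:good|bad))", text)
print(exactly_one)  # ['not']
```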
(?:pattern) is a non-capturing group. There is no need to capture good or bad, so I use a non-capturing group for performance.
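With re.findall there is also a practical reason to avoid capturing here. When a pattern contains a capturing group, findall returns that group's text instead of the whole match. A group inside a negative look-ahead never takes part in a successful match, so it comes back as an empty string:

```python
import re

text = "not not good, not bad, not mine"

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r"\bnot\b(?!\s+(?:good|bad))", text)
print(non_capturing)  # ['not', 'not']

# Capturing group: findall returns group 1, which is empty for every match.
capturing = re.findall(r"\bnot\b(?!\s+(good|bad))", text)
print(capturing)  # ['', '']
```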