
Python regular expression: how to exclude superstrings?

Tags:

python

regex

I want to find all occurrences of "not", but exclude those that are part of the phrases "not good" or "not bad".

For example, in "not not good, not bad, not mine", only the first and last "not" should match.

How do I achieve that using the re package in Python?

CuriousMind asked Dec 28 '12 04:12


1 Answer

Use a negative look-ahead assertion:

\bnot\b(?!\s+(?:good|bad))

This will match not, except where good or bad comes right after not in the string (separated by whitespace). I have added the word boundary \b on both sides to make sure we are matching the word not, rather than the not in nothing or knot.
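As a quick check, here is a minimal sketch that runs the pattern against the example string from the question. The names NOT_PATTERN, text and positions are only illustrative.

```python
import re

# Pattern from the answer: the word "not", unless it is followed by
# whitespace and then "good" or "bad".
NOT_PATTERN = re.compile(r"\bnot\b(?!\s+(?:good|bad))")

text = "not not good, not bad, not mine"

# finditer yields match objects, so we can see where each match starts.
positions = [m.start() for m in NOT_PATTERN.finditer(text)]
print(positions)  # [0, 23] -> the first and the last "not"

# "nothing" and "knot" contain "not" but are not matched.
print(NOT_PATTERN.findall("nothing knot"))  # []
```

The same compiled pattern also works with re.sub or re.search if you need to replace or test rather than list the matches.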


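\b is a word boundary. It matches at a position where the character on one side is a word character and the character on the other side is not (the start and end of the string count as non-word). Word characters are normally the English letters (a-z, A-Z), digits (0-9), and underscore (_), but there can be more depending on the regex flavor; in Python 3, for example, str patterns also treat Unicode letters and digits as word characters unless the re.ASCII flag is set.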
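To see what \b buys you, the sketch below compares a bare not with \bnot\b on a made-up sentence. It also shows a possible tightening that is not part of the answer: since the look-ahead only checks how the next word begins, "not goodness" is excluded too, and a trailing \b inside the look-ahead avoids that. Whether you want this depends on your data, so treat TIGHTER as a hypothetical variant.

```python
import re

sample = "nothing ties a knot, not even rope"

# Without \b, "not" is also found inside "nothing" and "knot".
loose = [m.start() for m in re.finditer(r"not", sample)]
# With \b on both sides, only the standalone word matches.
strict = [m.start() for m in re.finditer(r"\bnot\b", sample)]
print(loose)   # [0, 16, 21]
print(strict)  # [21]

# Pattern from the answer, and a hypothetical variant with \b after the group.
ORIGINAL = re.compile(r"\bnot\b(?!\s+(?:good|bad))")
TIGHTER = re.compile(r"\bnot\b(?!\s+(?:good|bad)\b)")

print(ORIGINAL.findall("not goodness"))  # [] -> "goodness" starts with "good"
print(TIGHTER.findall("not goodness"))   # ['not']
print(TIGHTER.findall("not good, ok"))   # [] -> still excluded
```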

(?!pattern) is the syntax for a zero-width negative look-ahead - it checks that, starting from the current position, the specified pattern cannot be matched in the input string. It consumes no characters.

\s denotes a whitespace character (space (ASCII 32), new line \n, tab \t, etc. - check the documentation for the full list). If you don't want to match so broadly, just replace \s with a literal space.

The + in \s+ matches one or more instances of the preceding token, which in this case is a whitespace character.
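The difference between \s+ and a single literal space is easiest to see on a few inputs. This is only a sketch; ANY_WHITESPACE, SINGLE_SPACE and the sample strings are names and data made up for illustration.

```python
import re

# Pattern from the answer: any run of whitespace between the words.
ANY_WHITESPACE = re.compile(r"\bnot\b(?!\s+(?:good|bad))")
# Variant with \s+ replaced by exactly one literal space.
SINGLE_SPACE = re.compile(r"\bnot\b(?! (?:good|bad))")

samples = ["not good", "not  good", "not\tbad", "not\nbad"]

# For each sample: does "not" still match under each pattern?
results = {
    s: (bool(ANY_WHITESPACE.search(s)), bool(SINGLE_SPACE.search(s)))
    for s in samples
}

for s, (any_ws, single) in results.items():
    print(repr(s), any_ws, single)
# 'not good'   False False
# 'not  good'  False True   (two spaces slip past the single-space variant)
# 'not\tbad'   False True
# 'not\nbad'   False True
```

So the single-space variant only excludes the exact spelling with one space; anything else between the words lets not through.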

(?:pattern) is a non-capturing group. There is no need to capture good or bad, so I use a non-capturing group for performance.
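In Python there is a second, practical reason to keep the group non-capturing: when a pattern contains a capturing group, re.findall returns the group contents instead of the whole match. A small sketch on the string from the question (the variable names are illustrative):

```python
import re

text = "not not good, not bad, not mine"

# Non-capturing group: findall returns the full matches.
non_capturing = re.findall(r"\bnot\b(?!\s+(?:good|bad))", text)
# Capturing group: findall returns group 1, which never takes part in a
# successful match here because it sits inside a negative look-ahead.
capturing = re.findall(r"\bnot\b(?!\s+(good|bad))", text)

print(non_capturing)  # ['not', 'not']
print(capturing)      # ['', '']
```

If you do need a capturing group somewhere in the pattern, re.finditer with m.group(0) still gives you the whole match.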

nhahtdh answered Oct 06 '22 08:10