I have something like this in my code:
df2 = df[df['A'].str.contains("Hello|World")]
However, I want all the rows that contain neither Hello nor World. How do I most efficiently reverse this?
You can use the tilde operator, ~, to invert the boolean values:
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0 True
1 False
2 True
3 False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0 False
1 True
2 False
3 True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
A
1 this
3 apple
[2 rows x 1 columns]
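One caveat worth sketching: if column A can hold missing values, the mask returned by str.contains may itself contain missing entries, and inverting it with ~ can then fail or give surprising results depending on the pandas version. The example below assumes a made-up frame with one None in it and uses the na parameter of str.contains to treat missing entries as non-matches, so those rows are kept after inversion. Whether you want to keep or drop such rows is a judgment call for your data.

```python
import pandas as pd

# Hypothetical data with one missing value in column A.
df = pd.DataFrame({"A": ["Hello", "this", None, "World peace", "apple"]})

# na=False makes missing entries count as "does not contain",
# so the mask holds only True/False and is safe to invert with ~.
mask = df["A"].str.contains("Hello|World", na=False)
result = df[~mask]
print(result)
```

Passing na=True instead would treat missing entries as matches, which drops those rows once the mask is inverted.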
Whether inverting with ~ is the most efficient way, I don't know; you'd have to time it against your other options. Sometimes a regular expression is slower than something like df[~(df.A.str.contains("Hello") | df.A.str.contains("World"))], but I'm bad at guessing where the crossovers are.
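To find out where the crossover is for your own data, you can time the candidates directly. This is a minimal sketch, not a benchmark: the sample frame, the function names with_regex and with_literals, and the repeat count are all made up for illustration. The second variant also passes regex=False, which is a change from the expression above that makes each check a plain substring search.

```python
import timeit

import pandas as pd

# Hypothetical sample data, repeated so the timings are measurable.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"] * 10_000})


def with_regex(frame):
    # One regex pass using alternation, then invert the mask.
    return frame[~frame["A"].str.contains("Hello|World")]


def with_literals(frame):
    # Two literal substring passes (regex=False), OR-ed together, then inverted.
    has_hello = frame["A"].str.contains("Hello", regex=False)
    has_world = frame["A"].str.contains("World", regex=False)
    return frame[~(has_hello | has_world)]


regex_result = with_regex(df)
literal_result = with_literals(df)

# Time each variant over a few runs; absolute numbers depend on your machine.
regex_time = timeit.timeit(lambda: with_regex(df), number=5)
literal_time = timeit.timeit(lambda: with_literals(df), number=5)
print(f"regex alternation:  {regex_time:.3f}s")
print(f"two literal checks: {literal_time:.3f}s")
```

The results will vary with the pandas version, the string lengths, and the number of alternatives, so it is worth running this on a sample that resembles your real column.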
The .contains() method uses regular expressions by default, so you can use a negative lookahead to test that a word is not contained:
df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
This expression matches any string in which neither Hello nor World appears anywhere.
Demo:
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
0 False
1 True
2 False
3 True
Name: A, dtype: bool
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')]
A
1 this
3 apple
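As a sanity check, the lookahead pattern and the inverted mask can be compared side by side. The sketch below uses made-up sample strings. It also probes one edge case: because . does not match a newline by default, the lookahead pattern can reject a multi-line string even when it contains neither word, while the inverted mask keeps it.

```python
import pandas as pd

pattern = r'^(?:(?!Hello|World).)*$'

# Hypothetical single-line strings: both approaches should agree here.
single = pd.Series(["Hello", "this", "say World", "apple"])
via_tilde = ~single.str.contains("Hello|World")
via_lookahead = single.str.contains(pattern)
print(via_tilde.equals(via_lookahead))

# A string with an embedded newline and neither word in it.
multi = pd.Series(["first line\nsecond line"])
lookahead_multi = multi.str.contains(pattern).tolist()
tilde_multi = (~multi.str.contains("Hello|World")).tolist()
print(lookahead_multi, tilde_multi)
```

If your column may contain embedded newlines, inverting the plain match with ~ is the less surprising option.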