Reversal of string.contains in Python, pandas

Question

I have something like this in my code:

df2 = df[df['A'].str.contains("Hello|World")]

However, I want all the rows that contain neither Hello nor World. What is the most efficient way to invert this filter?

DSM · Accepted Answer

You can use the tilde ~ to flip the bool values:

>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0     True
1    False
2     True
3    False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
       A
1   this
3  apple

[2 rows x 1 columns]
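One thing the demo above does not cover is missing values. As a minimal sketch, assuming the column may contain NaN (the sample data here is made up for illustration), passing na=False to str.contains treats missing entries as "no match", so the result is a plain boolean mask that ~ can invert:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the column from the demo with one missing value.
df = pd.DataFrame({"A": ["Hello", "this", np.nan, "apple"]})

# na=False makes missing values count as "does not contain",
# which keeps the mask strictly boolean.
mask = df["A"].str.contains("Hello|World", na=False)

# Invert the mask to keep rows that match neither word.
# The NaN row is kept, because it matched nothing.
result = df[~mask]
print(result)
```

If rows with missing values should be dropped instead, passing na=True before inverting is one option.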

Whether this is the most efficient way, I don't know; you'd have to time it against your other options. Sometimes using a regular expression is slower than things like df[~(df.A.str.contains("Hello") | df.A.str.contains("World"))], but I'm bad at guessing where the crossovers are.
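To time the two options against each other, something like the following could serve as a starting point. It is only a sketch: the data size and run count are arbitrary, and regex=False is an addition not used in the answer above (it makes each pattern a literal substring search). Actual timings will depend on your data and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: the four demo values repeated many times.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"] * 10_000})


def with_regex():
    # One regular expression with alternation, then invert.
    return df[~df["A"].str.contains("Hello|World")]


def with_two_literals():
    # Two plain substring checks combined with |, then invert.
    has_hello = df["A"].str.contains("Hello", regex=False)
    has_world = df["A"].str.contains("World", regex=False)
    return df[~(has_hello | has_world)]


# Sanity check: both approaches should select the same rows.
assert with_regex().equals(with_two_literals())

for fn in (with_regex, with_two_literals):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```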

Martijn Pieters · Answer

The .contains() method uses regular expressions, so you can use a negative lookahead test to determine that a word is not contained:

df['A'].str.contains(r'^(?:(?!Hello|World).)*$')

This expression matches any string in which neither Hello nor World appears anywhere.

Demo:

>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')]
       A
1   this
3  apple
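As a quick check, the lookahead pattern and the inverted mask should agree on single-line strings. The sketch below also tries a shorter lookahead, ^(?!.*(?:Hello|World)), which is not from the answer above but appears to behave the same way here. The extra "say Hello" row is added for illustration. Since . does not match a newline by default, values containing newlines may be treated differently by the lookahead patterns.

```python
import pandas as pd

# Demo data plus one hypothetical row where the word is not at the start.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple", "say Hello"]})

# Pattern from the answer: every position must not start Hello or World.
tempered = df["A"].str.contains(r"^(?:(?!Hello|World).)*$")

# Shorter alternative (an assumption, not from the answer):
# a single lookahead at the start of the string.
anchored = df["A"].str.contains(r"^(?!.*(?:Hello|World))")

# Plain match, inverted with the tilde.
inverted = ~df["A"].str.contains("Hello|World")

# All three masks select the same rows for this data.
assert tempered.equals(inverted)
assert anchored.equals(inverted)

print(df[tempered])
```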

Tags:

python

string

pandas

csv

python-2.7

Xodarap777
