
pandas dataframe str.contains() AND operation

I have a df (Pandas Dataframe) with three rows:

some_col_name
"apple is delicious"
"banana is delicious"
"apple and banana both are delicious"

The call df.col_name.str.contains("apple|banana") matches all of the rows:

"apple is delicious",
"banana is delicious",
"apple and banana both are delicious".

How do I apply an AND operator to the str.contains() method so that it only matches strings that contain BOTH "apple" and "banana"?

"apple and banana both are delicious"

I'd like to match strings that contain 10-20 different words (grape, watermelon, berry, orange, etc.).

aerin asked May 03 '16 18:05



4 Answers

You can do that as follows:

df[(df['col_name'].str.contains('apple')) & (df['col_name'].str.contains('banana'))]
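
For the 10-20 words mentioned in the question, chaining `&` by hand gets unwieldy. A minimal sketch of the same idea driven by a list, assuming a column named `col_name` and a hypothetical `words` list (neither the list nor the `reduce` approach comes from the answer above):

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({'col_name': ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious"]})

words = ['apple', 'banana']  # hypothetical list; extend to as many words as needed

# One boolean mask per word. regex=False treats each word as a literal substring.
masks = [df['col_name'].str.contains(w, regex=False) for w in words]

# Combine the masks element-wise with AND.
all_present = reduce(operator.and_, masks)

result = df[all_present]
```

Here `result` keeps only the rows whose text contains every word in `words`.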
flyingmeatball answered Oct 13 '22 04:10


You can also do it with a regex, using lookaheads:

df[df['col_name'].str.contains(r'^(?=.*apple)(?=.*banana)')]

You can then build your list of words into a regex string like so:

base = r'^{}'
expr = '(?=.*{})'
words = ['apple', 'banana', 'cat']  # example
base.format(''.join(expr.format(w) for w in words))

will render:

'^(?=.*apple)(?=.*banana)(?=.*cat)'

You can then pass the generated pattern to str.contains() to filter on any list of words.
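
A minimal end-to-end sketch of the lookahead approach, assuming a column named `col_name`. The `re.escape` call is an addition not in the answer above; it guards against words that contain regex metacharacters such as `.` or `+`:

```python
import re

import pandas as pd

df = pd.DataFrame({'col_name': ["apple is delicious",
                                "banana is delicious",
                                "apple and banana both are delicious"]})

words = ['apple', 'banana']  # example word list

# Build '^(?=.*word1)(?=.*word2)...' with each word escaped so it matches literally.
pattern = '^' + ''.join('(?=.*{})'.format(re.escape(w)) for w in words)

# Rows match only if every lookahead succeeds, i.e. every word is present.
mask = df['col_name'].str.contains(pattern, regex=True)
result = df[mask]
```

Note that `.` does not match a newline by default, so this sketch assumes single-line strings.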

Anzel answered Oct 13 '22 04:10


import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

targets = ['apple', 'banana']

# Any word from `targets` is present in the sentence.
>>> df.col.apply(lambda sentence: any(word in sentence for word in targets))
0    True
1    True
2    True
Name: col, dtype: bool

# All words from `targets` are present in sentence.
>>> df.col.apply(lambda sentence: all(word in sentence for word in targets))
0    False
1    False
2     True
Name: col, dtype: bool
Alexander answered Oct 13 '22 04:10


This works:

df.col.str.contains(r'(?=.*apple)(?=.*banana)',regex=True)
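
A quick check, on the sample data from the question, that this unanchored pattern selects the same rows as the `^`-anchored variant. This is a sketch on three example rows, not a general proof; the anchor only stops the regex engine from retrying the lookaheads at later positions in the string:

```python
import pandas as pd

df = pd.DataFrame({'col': ["apple is delicious",
                           "banana is delicious",
                           "apple and banana both are delicious"]})

# Same lookaheads, with and without the leading '^' anchor.
anchored = df['col'].str.contains(r'^(?=.*apple)(?=.*banana)', regex=True)
unanchored = df['col'].str.contains(r'(?=.*apple)(?=.*banana)', regex=True)

# True when both masks agree on every row.
same_result = anchored.equals(unanchored)
```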
Charan Reddy answered Oct 13 '22 03:10