
Check if a string in a Pandas DataFrame column is in a list of strings

If I have a frame like this

frame = pd.DataFrame({
    "a": ["the cat is blue", "the sky is green", "the dog is black"]
})

and I want to check whether each of those rows contains a certain word, I just have to do this:

frame["b"] = (
   frame.a.str.contains("dog") |
   frame.a.str.contains("cat") |
   frame.a.str.contains("fish")
)

frame["b"] outputs:

0     True
1    False
2     True
Name: b, dtype: bool

If I decide to make a list:

mylist = ["dog", "cat", "fish"]

How would I check whether each row contains any of the words in the list?

user2333196 asked Jul 31 '13 14:07


People also ask

How do you check if a pandas column value is in a list?

The pandas.Series.isin() method checks whether each value in a column is one of a given sequence of values. It returns a boolean Series showing whether each element in the Series exactly matches an element in the passed sequence.

How do I check if a Series contains a string?

The Series.str.contains() method tests whether a pattern or regex is contained within each string of a Series or Index. It returns a boolean Series or Index.

How do you test if a string contains one of the substrings in a list in pandas?

To test if a string contains one of the substrings in a list in pandas, we can use the str.contains method with a regex pattern that joins the substrings with "|".


3 Answers

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

The str.contains method accepts a regular expression pattern:

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

pattern
'dog|cat|fish'

frame.a.str.contains(pattern)
0     True
1    False
2     True
Name: a, dtype: bool
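One caveat with joining the list directly: entries that contain regex metacharacters (for example `$` or `.`) would be read as pattern syntax rather than literal text. A minimal sketch, assuming the list entries should match literally, escapes each one with `re.escape` first. The sample data and the names `terms` and `safe_pattern` are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical data: one search term contains regex metacharacters.
frame = pd.DataFrame({"a": ["price is $5.00", "price is 5000", "the dog is black"]})
terms = ["$5.00", "dog"]

# Escape each term so that '$' and '.' are matched literally.
safe_pattern = "|".join(re.escape(term) for term in terms)

# Boolean Series: True where a row contains any of the literal terms.
mask = frame["a"].str.contains(safe_pattern)
```

Without the escaping step, `$` would act as an end-of-string anchor and `.` would match any character.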

Because regex patterns are supported, you can also embed flags:

frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})

frame
                         a
0  Cat Mr. Nibbles is blue
1         the sky is green
2         the dog is black

pattern = '(?i)' + '|'.join(mylist)  # one global flag at the start; Python 3.11+ rejects flags placed mid-pattern

pattern
'(?i)dog|cat|fish'

frame.a.str.contains(pattern)
0     True  # Because of the (?i) flag, 'Cat' is also matched to 'cat'
1    False
2     True
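As an alternative to an inline flag, `Series.str.contains` also takes a `case` argument. A short sketch, reusing the example data from this answer:

```python
import pandas as pd

frame = pd.DataFrame({"a": ["Cat Mr. Nibbles is blue", "the sky is green", "the dog is black"]})
mylist = ["dog", "cat", "fish"]

# case=False makes the whole pattern case-insensitive, so no (?i) flag is needed.
mask = frame["a"].str.contains("|".join(mylist), case=False)
```

This keeps the pattern itself free of flags, which may be easier to read when the list is built elsewhere.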
Andy Hayden answered Sep 18 '22 02:09


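For a list, this should work when the column values are exactly equal to entries in the list (isin does not match substrings):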

print(frame[frame["a"].isin(mylist)])

See pandas.Series.isin().
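A quick check of how `isin` behaves on the question's data may help here. As far as I can tell, `isin` compares whole cell values against the list, so it finds nothing in full sentences and only matches when a cell equals a list entry exactly. The `words` Series below is a made-up example.

```python
import pandas as pd

frame = pd.DataFrame({"a": ["the cat is blue", "the sky is green", "the dog is black"]})
mylist = ["dog", "cat", "fish"]

# isin tests whole-cell equality, and no sentence equals "dog", "cat" or "fish".
sentence_mask = frame["a"].isin(mylist)

# Hypothetical column whose cells are single words: here isin does match.
words = pd.Series(["dog", "sky", "cat"])
word_mask = words.isin(mylist)
```

So for the substring case in the question, `str.contains` with a joined pattern looks like the better fit, while `isin` suits columns holding the bare words.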

Meloun answered Sep 17 '22 02:09


After going through the comments on the accepted answer about extracting the matched string, this approach can also be tried.

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

Let us create the list of strings to be matched and extracted, and join it into a pattern.

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

Now let us create a function that finds and extracts the matching substring.

import re

def pattern_searcher(search_str: str, search_list: str) -> str:
    # search_list is a regex pattern such as 'dog|cat|fish', not a Python list
    search_obj = re.search(search_list, search_str)
    if search_obj:
        # group(0) is the substring that matched
        return_str = search_obj.group(0)
    else:
        return_str = 'NA'
    return return_str

We will use this function with pandas.Series.apply:

frame['matched_str'] = frame['a'].apply(lambda x: pattern_searcher(search_str=x, search_list=pattern))

Result:

                  a matched_str
0   the cat is blue         cat
1  the sky is green          NA
2  the dog is black         dog
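The same result can probably be had without a Python-level function. A sketch, assuming pandas' `Series.str.extract` with a single capture group, which yields NaN where nothing matches; `fillna` then substitutes the 'NA' placeholder used above:

```python
import pandas as pd

frame = pd.DataFrame({"a": ["the cat is blue", "the sky is green", "the dog is black"]})
mylist = ["dog", "cat", "fish"]
pattern = "|".join(mylist)

# The parentheses form the one capture group that str.extract requires;
# expand=False returns a Series instead of a one-column DataFrame.
frame["matched_str"] = frame["a"].str.extract(f"({pattern})", expand=False).fillna("NA")
```

Like `re.search`, this returns only the first match in each row.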
Aman Raparia answered Sep 17 '22 02:09