String contains across two pandas series

I have a pandas DataFrame with a column of strings. For each row, I would like to check whether that string appears in an adjacent column.

In the example below, I would like to check whether the string in the 'choice' column is contained in the 'fruit' column of the same row, returning true (1) or false (0) in a new column 'choice_match'.

Example DataFrame:

import pandas as pd
d = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
               'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
     'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple', 'orange',
                'orange', 'orange', 'banana', 'banana']}
df = pd.DataFrame(data=d)

Desired DataFrame:

import pandas as pd
d = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
               'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
     'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple', 'orange',
                'orange', 'orange', 'banana', 'banana'],
     'choice_match': [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]}
df = pd.DataFrame(data=d)

shbfy asked Dec 02 '22 11:12


1 Answer

Here is one way:

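df['choice_match'] = df.apply(lambda row: row['choice'] in row['fruit'].split(', '),
                              axis=1).astype(int)

Explanation

  • df.apply with axis=1 calls the given function once per row; here the function is an anonymous lambda.
  • row['fruit'].split(', ') turns the fruit string into a list of individual fruits. The separator must be ', ' (comma plus space), as in the data: splitting on ',' alone would leave ' orange' with a leading space, so 'orange' would not match in row 7. Testing membership in the list rather than in the raw string ensures that, for example, apple does not match pineapple.
  • astype(int) converts the Boolean results to the integers 1 and 0 requested in the question.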
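
The points above can be put together in a small self-contained script. This is only a sketch using the data from the question: the helper name choice_in_fruit is hypothetical, and stripping whitespace after splitting on ',' is an assumption added here so that the check tolerates both 'apple, orange' and 'apple,orange'.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
              'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
    'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple',
               'orange', 'orange', 'orange', 'banana', 'banana'],
})


def choice_in_fruit(row):
    """Hypothetical helper: True if row['choice'] is one of the listed fruits."""
    # Split on the comma, then strip whitespace so ' orange' becomes 'orange'.
    fruits = [item.strip() for item in row['fruit'].split(',')]
    return row['choice'] in fruits


# Apply the helper once per row and convert True/False to 1/0.
df['choice_match'] = df.apply(choice_in_fruit, axis=1).astype(int)

# Why a list is used: a plain substring test matches 'apple' inside
# 'pineapple', whereas list membership compares whole items only.
substring_hit = 'apple' in 'pineapple'
list_hit = 'apple' in ['pineapple']

print(df)
```

With the question's data, the resulting choice_match column should agree with the desired DataFrame shown in the question.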

jpp answered Jan 01 '23 16:01