String contains across two pandas series

Question

I have a series of strings in a pandas DataFrame. For each row, I would like to check whether that string appears in an adjacent column.

In the example below, I would like to check whether the string in the 'choice' series is contained in the 'fruit' series of the same row, returning either true (1) or false (0) in a new column 'choice_match'.

Example DataFrame:

import pandas as pd

d = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
               'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
     'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple',
                'orange', 'orange', 'orange', 'banana', 'banana']}
df = pd.DataFrame(data=d)

Desired DataFrame:

import pandas as pd

d = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
               'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
     'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple',
                'orange', 'orange', 'orange', 'banana', 'banana'],
     'choice_match': [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]}
df = pd.DataFrame(data=d)

jpp · Accepted Answer

Here is one way:

df['choice_match'] = df.apply(lambda row: row['choice'] in row['fruit'].split(', '),
                              axis=1).astype(int)

Explanation

df.apply with axis=1 cycles through each row and applies logic; it accepts anonymous lambda functions.
row['fruit'].split(', ') creates a list from the fruit column, splitting on the comma-plus-space separator used in the data. Splitting on ',' alone would leave a leading space on every item after the first, so 'orange' would not match ' orange'. The list is necessary so that, for example, apple is not matched inside pineapple.
astype(int) converts the Boolean results to the integers 1 and 0 requested in the question.
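
As a rough illustration of the same idea without a row-wise apply, the sketch below uses a list comprehension over the two columns. The helper name choice_in_fruit is hypothetical and not part of the original answer; it strips whitespace after splitting, on the assumption that the separator may or may not be followed by a space.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
              'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
    'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple',
               'orange', 'orange', 'orange', 'banana', 'banana'],
})


def choice_in_fruit(fruit, choice):
    """Hypothetical helper: 1 if choice is one of the comma-separated items, else 0."""
    # Strip each item so 'apple, orange' yields ['apple', 'orange'].
    items = [item.strip() for item in fruit.split(',')]
    return int(choice in items)


# Pair the two columns row by row instead of calling df.apply(axis=1).
df['choice_match'] = [choice_in_fruit(f, c)
                      for f, c in zip(df['fruit'], df['choice'])]

# Why the list matters: a plain substring test finds 'apple' inside 'pineapple',
# while a membership test against the split list does not.
substring_hit = 'apple' in 'pineapple'
list_hit = choice_in_fruit('pineapple', 'apple')

print(df)
```

On the example data this should reproduce the 'choice_match' column from the question. Whether it is faster than apply depends on the size of the DataFrame, so it is worth timing on real data.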

Tags: python, string, pandas, dataframe

shbfy
