I have a pandas DataFrame with a column of strings. For each row, I would like to check whether that string appears in an adjacent column.
In the example below, I would like to check whether the string in the 'choice' column is contained in the 'fruit' column, returning either true (1) or false (0) in a new column 'choice_match'.
Example DataFrame:
import pandas as pd
d = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'fruit': [
'apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple', 'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple', 'orange', 'orange', 'orange', 'banana', 'banana']}
df = pd.DataFrame(data=d)
Desired DataFrame:
import pandas as pd
d = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'fruit': [
'apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple', 'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple', 'orange', 'orange', 'orange', 'banana', 'banana'],
'choice_match': [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]}
df = pd.DataFrame(data=d)
Here is one way:
df['choice_match'] = df.apply(lambda row: row['choice'] in row['fruit'].split(', '),
                              axis=1).astype(int)
Explanation
df.apply with axis=1 cycles through each row and applies the logic; it accepts anonymous lambda functions.
row['fruit'].split(', ') creates a list from the fruit column. This is necessary so that, for example, apple is not matched inside pineapple. The separator includes the space after the comma: splitting on ',' alone would leave ' orange' with a leading space, so a row such as 'apple, orange' would not match 'orange'.
.astype(int) converts the Boolean values to integers (1 and 0) for display purposes.
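The membership test described above can also be written without df.apply. The following is a minimal sketch, not a definitive implementation: it assumes the fruit values are comma-separated with optional whitespace around each item, and the helper name choice_in_fruit is hypothetical, introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
              'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
    'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple',
               'orange', 'orange', 'orange', 'banana', 'banana'],
})


def choice_in_fruit(fruit, choice):
    """Hypothetical helper: 1 if choice is one of the comma-separated items, else 0."""
    # Strip each item so that ' orange' (leading space after the comma) matches 'orange'.
    items = [item.strip() for item in fruit.split(',')]
    return int(choice in items)


# Iterate over the two columns in parallel instead of calling df.apply row by row.
df['choice_match'] = [choice_in_fruit(f, c) for f, c in zip(df['fruit'], df['choice'])]

# A plain substring test would wrongly find 'apple' inside 'pineapple';
# the list membership test does not.
print('apple' in 'pineapple')                   # True
print(choice_in_fruit('pineapple', 'apple'))    # 0
print(df['choice_match'].tolist())              # [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
```

Stripping each item makes the check tolerant of inconsistent spacing after the commas, and the list comprehension avoids the per-row overhead of df.apply on larger frames.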