I have a fruits dataframe with columns (Name, Color) and a sentence dataframe with a single column (Sentence).
fruits dataframe
Name Color
0 Apple Red
1 Mango Yellow
2 Grapes Green
3 Strawberry Pink
sentence dataframe
Sentence
0 I like Apple, Mango, Grapes
1 I like ripe Mango
2 Grapes are juicy
3 Oranges are citric
I need to compare each row of the fruits dataframe with every row of the sentence dataframe and, if the fruit name appears as a whole word in the sentence, insert its color before the fruit name in the sentence.
This is what I have done using DataFrame.apply():
import pandas as pd
import regex as re

# create fruit dataframe
fruit_data = [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green']]
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color'])
print(fruit_df)

# create sentence dataframe
sentence = ['I like Apple, Mango, Grapes', 'I like ripe Mango', 'Grapes are juicy']
sentence_df = pd.DataFrame(sentence, columns = ['Sentence'])
print(sentence_df)

def search(desc, name, color):
    if re.findall(r"\b" + name + r"\b", desc):
        # for loop is used because fruit can appear more than once in sentence
        all_indexes = []
        for match in re.finditer(r"\b" + name + r"\b", desc):
            all_indexes.append(match.start())
        arr = list(desc)
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")
        new_desc = ''.join(arr)
        return new_desc

def compare(name, color):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))

fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
print ("The final result is: ")
print(sentence_df['Result'])
The result I am getting is:
Sentence Result
0 I like Apple, Mango, Grapes None
1 I like ripe Mango None
2 Grapes are juicy None
3 Oranges are citric None
The expected result:
Sentence Result
0 I like Apple, Mango, Grapes I like Red Apple, Yellow Mango, Green Grapes
1 I like ripe Mango I like ripe Yellow Mango
2 Grapes are juicy Green Grapes are juicy
3 Oranges are citric
I also tried iterating through fruit_df using itertuples(), but the result is still the same:
for row in fruit_df.itertuples():
    result = sentence_df['Sentence'].apply(lambda x: search(x, getattr(row, 'Name'), getattr(row, 'Color')))
    print(result)
I can't understand why the value returned by the search function is not stored in the new column. Is this the right way to do it, or am I missing something?
Here fruits and sentence are the two dataframes from the question. We can build a mapping series from the fruits dataframe, then use it with Series.replace (regex=True) to substitute each whole-word occurrence of a fruit name in the Sentence column with the corresponding replacement (Color + Fruit name) from the mapping series:
fruit = r'\b' + fruits['Name'] + r'\b'
fruit_replacement = list(fruits['Color'] + ' ' + fruits['Name'])
mapping = pd.Series(fruit_replacement, index=fruit)
sentence['Result'] = sentence['Sentence'].replace(mapping, regex=True)
>>> sentence
Sentence Result
0 I like Apple, Mango, Grapes I like Red Apple, Yellow Mango, Green Grapes
1 I like ripe Mango I like ripe Yellow Mango
2 Grapes are juicy Green Grapes are juicy
3 Oranges are citric Oranges are citric
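For a self-contained run on the question's own data (including the Oranges row), the same idea can be sketched with a plain dict of pattern to replacement in place of the mapping Series. The names fruit_df and sentence_df follow the question; the re.escape call is an added assumption, guarding against fruit names that contain regex metacharacters.

```python
import re

import pandas as pd

fruit_df = pd.DataFrame(
    [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green']],
    columns=['Name', 'Color'],
)
sentence_df = pd.DataFrame(
    ['I like Apple, Mango, Grapes', 'I like ripe Mango',
     'Grapes are juicy', 'Oranges are citric'],
    columns=['Sentence'],
)

# Whole-word pattern for each fruit -> "<Color> <Name>" replacement.
# re.escape is an assumption: it protects names containing regex metacharacters.
mapping = {
    r'\b' + re.escape(name) + r'\b': f'{color} {name}'
    for name, color in zip(fruit_df['Name'], fruit_df['Color'])
}

# With regex=True every key is treated as a pattern; rows with no match are unchanged.
sentence_df['Result'] = sentence_df['Sentence'].replace(mapping, regex=True)
print(sentence_df['Result'].tolist())
```

Sentences with no fruit (the Oranges row) pass through untouched, which matches the answer's output above.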
The problem is that you call compare for each row of fruit_df but use the same input (the original Sentence column) on each pass.
I added some debugging prints to the compare function to see what happens:
def compare(name, color):
    print(name, color)
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    print(sentence_df['Result'])
and got:
Apple Red
0 I like Red Apple, Mango, Grapes
1 None
2 None
Name: Result, dtype: object
Mango Yellow
0 I like Apple, Yellow Mango, Grapes
1 I like ripe Yellow Mango
2 None
Name: Result, dtype: object
Grapes Green
0 I like Apple, Mango, Green Grapes
1 None
2 Green Grapes are juicy
Name: Result, dtype: object
So you successfully add the color when the fruit is present, but search returns None when it is not, and each pass starts again from the original Sentence column and overwrites Result, so only the last pass is kept.
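A minimal illustration of the overwrite effect, independent of search: the first loop reads from the untouched column on every pass, the second feeds each pass's output into the next. The one-row dataframe and the (old, new) pairs are illustrative data, not from the question, and str.replace stands in for search.

```python
import pandas as pd

df = pd.DataFrame({'Sentence': ['Apple and Mango']})
# Illustrative (old, new) pairs, one per "fruit row".
replacements = [('Apple', 'Red Apple'), ('Mango', 'Yellow Mango')]

# Buggy pattern: every pass reads the original 'Sentence' column,
# so each assignment throws away the previous pass's work.
for old, new in replacements:
    df['Result'] = df['Sentence'].str.replace(old, new, regex=False)
buggy = df.loc[0, 'Result']
print(buggy)  # Apple and Yellow Mango

# Accumulating pattern: seed 'Result' once, then each pass builds on the last.
df['Result'] = df['Sentence']
for old, new in replacements:
    df['Result'] = df['Result'].str.replace(old, new, regex=False)
fixed = df.loc[0, 'Result']
print(fixed)  # Red Apple and Yellow Mango
```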
How to fix:
First, add the missing return desc in search to avoid the None results:
def search(desc, name, color):
    if re.findall(r"\b" + name + r"\b", desc):
        ...
        new_desc = ''.join(arr)
        return new_desc
    return desc
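As an aside, the index bookkeeping in search could be replaced by a single re.sub call that prefixes each whole-word match with the color. This is an alternative sketch, not the answer's code: search_sub is a hypothetical name, and re.escape is an added assumption in case a fruit name contains regex metacharacters. Like the fixed search, it returns the sentence unchanged when nothing matches.

```python
import re


def search_sub(desc, name, color):
    """Prefix every whole-word occurrence of `name` in `desc` with `color`."""
    pattern = r'\b' + re.escape(name) + r'\b'
    # A function replacement avoids any backslash handling in `color`.
    return re.sub(pattern, lambda m: f'{color} {m.group(0)}', desc)


print(search_sub('I like Apple, Mango, Grapes', 'Mango', 'Yellow'))
print(search_sub('Oranges are citric', 'Mango', 'Yellow'))
```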
Then initialize sentence_df['Result'] from the Sentence column before applying compare, and have compare use Result as its input:
def compare(name, color):
    sentence_df['Result'] = sentence_df['Result'].apply(lambda x: search(x, name, color))

sentence_df['Result'] = sentence_df['Sentence']
fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
This finally gives the expected result:
The final result is:
0 I like Red Apple, Yellow Mango, Green Grapes
1 I like ripe Yellow Mango
2 Green Grapes are juicy
Name: Result, dtype: object
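A complete runnable version of the fix, for reference. Two choices here are assumptions rather than part of the answer: the standard-library re module is used instead of the third-party regex package (these patterns need nothing beyond re), and a plain loop over fruit_df.itertuples() (which the question also tried) replaces fruit_df.apply, which was only being used for its side effect on sentence_df. The search body is the question's logic plus the missing return desc.

```python
import re

import pandas as pd

fruit_df = pd.DataFrame(
    [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green']],
    columns=['Name', 'Color'],
)
sentence_df = pd.DataFrame(
    ['I like Apple, Mango, Grapes', 'I like ripe Mango',
     'Grapes are juicy', 'Oranges are citric'],
    columns=['Sentence'],
)


def search(desc, name, color):
    if re.findall(r"\b" + name + r"\b", desc):
        # The fruit can appear more than once, so collect every start index.
        all_indexes = [m.start() for m in re.finditer(r"\b" + name + r"\b", desc)]
        arr = list(desc)
        # Insert from the right so earlier indexes are not shifted.
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")
        return ''.join(arr)
    return desc  # the fix: unmatched sentences come back unchanged


# Seed Result once, then let each fruit's pass build on the previous one.
sentence_df['Result'] = sentence_df['Sentence']
for row in fruit_df.itertuples(index=False):
    sentence_df['Result'] = sentence_df['Result'].apply(search, args=(row.Name, row.Color))

print(sentence_df['Result'].tolist())
```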