Python: Value returned by function not getting updated in pandas dataframe

I have a fruits dataframe with columns: (Name, Color) and a sentence dataframe with columns: (Sentence).

fruits dataframe

          Name   Color
0        Apple     Red
1        Mango  Yellow
2       Grapes   Green
3   Strawberry    Pink

sentence dataframe

                      Sentence
0  I like Apple, Mango, Grapes
1            I like ripe Mango
2             Grapes are juicy
3           Oranges are citric

I need to compare each row of the fruits dataframe with every row of the sentence dataframe. If a fruit name appears in a sentence as an exact whole word, its color should be inserted before the fruit name in that sentence.

This is what I have done using dataframe.apply():

import pandas as pd
import regex as re

# create fruit dataframe 
fruit_data = [['Apple', 'Red'], ['Mango', 'Yellow'], ['Grapes', 'Green']] 
fruit_df = pd.DataFrame(fruit_data, columns = ['Name', 'Color']) 
print(fruit_df)

# create sentence dataframe 
sentence = ['I like Apple, Mango, Grapes', 'I like ripe Mango', 'Grapes are juicy'] 
sentence_df = pd.DataFrame(sentence, columns = ['Sentence']) 
print(sentence_df)


def search(desc, name, color):
    if re.findall(r"\b" + name + r"\b", desc):
        # for loop is used because fruit can appear more than once in sentence
        all_indexes = []
        for match in re.finditer(r"\b" + name + r"\b", desc):
            all_indexes.append(match.start())

        arr = list(desc)
        for idx in sorted(all_indexes, reverse=True):
            arr.insert(idx, color + " ")

        new_desc = ''.join(arr)
        return new_desc


def compare(name, color):
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))


fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
print("The final result is: ")
print(sentence_df['Result'])

The result I am getting is:

                      Sentence     Result
0  I like Apple, Mango, Grapes       None
1            I like ripe Mango       None
2             Grapes are juicy       None
3           Oranges are citric       None

The expected result:

                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric       

I also tried iterating through fruit_df using itertuples(), but the result is still the same:

for row in fruit_df.itertuples():
    result = sentence_df['Sentence'].apply(lambda x: search(x, row.Name, row.Color))
    print(result)

I can't understand why the value returned by the search function is not stored in the new column. Is this the right way to do it, or am I missing something?

Animeartist asked Dec 31 '22 16:12


2 Answers

We can build a mapping Series from the fruits dataframe: its index holds a word-bounded regex for each fruit name, and its values hold the replacement text (Color + Fruit name). Passing this mapping to Series.replace with regex=True substitutes every occurrence of a fruit name in the Sentence column with the corresponding replacement:

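# fruits and sentence are the question's two dataframes (fruit_df and sentence_df there)
# index: whole-word regex for each fruit name; values: "<Color> <Name>"
fruit = r'\b' + fruits['Name'] + r'\b'
fruit_replacement = list(fruits['Color'] + ' ' + fruits['Name'])

mapping = pd.Series(fruit_replacement, index=fruit)
# replace every pattern found in Sentence with its mapped replacement
sentence['Result'] = sentence['Sentence'].replace(mapping, regex=True)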

>>> sentence
                      Sentence                                        Result
0  I like Apple, Mango, Grapes  I like Red Apple, Yellow Mango, Green Grapes
1            I like ripe Mango                      I like ripe Yellow Mango
2             Grapes are juicy                        Green Grapes are juicy
3           Oranges are citric                            Oranges are citric
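
The same mapping idea can also be sketched with only the standard-library re module, which may help show what the regex replacement does. This is an illustration under assumptions, not the code above: the colors dict, the add_colors helper, and the sentences list are hypothetical stand-ins for the two dataframes. It differs in two ways: re.escape keeps any regex metacharacters in a fruit name literal, and all names are combined into one alternation so each sentence is scanned once.

```python
import re

# Hypothetical stand-ins for the Name and Color columns of the fruits dataframe.
colors = {"Apple": "Red", "Mango": "Yellow", "Grapes": "Green"}

# One alternation of all fruit names, matched as whole words.
# re.escape keeps any regex metacharacters in a name literal.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in colors) + r")\b"
)


def add_colors(sentence):
    """Prefix every whole-word fruit name in `sentence` with its color."""
    return pattern.sub(lambda m: colors[m.group(0)] + " " + m.group(0), sentence)


# Hypothetical stand-in for the Sentence column.
sentences = [
    "I like Apple, Mango, Grapes",
    "I like ripe Mango",
    "Grapes are juicy",
    "Oranges are citric",
]

results = [add_colors(s) for s in sentences]
for line in results:
    print(line)
```

Because the text is scanned in a single pass, an inserted color is never re-examined as a possible fruit name, which could matter if a color and a fruit shared a name. On a dataframe, a helper like this could be applied with sentence['Sentence'].apply(add_colors).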

Shubham Sharma answered Jan 14 '23 02:01


The problem is that you call compare once for each row of fruit_df, but every call reads the same input (the original Sentence column) and overwrites the Result column.

I have just added some debugging prints to the compare function to understand what happens:

def compare(name, color):
    print(name, color)
    sentence_df['Result'] = sentence_df['Sentence'].apply(lambda x: search(x, name, color))
    print(sentence_df['Result'])

and got:

Apple Red
0    I like Red Apple, Mango, Grapes
1                               None
2                               None
Name: Result, dtype: object
Mango Yellow
0    I like Apple, Yellow Mango, Grapes
1              I like ripe Yellow Mango
2                                  None
Name: Result, dtype: object
Grapes Green
0    I like Apple, Mango, Green Grapes
1                                 None
2               Green Grapes are juicy
Name: Result, dtype: object

So you successfully add the color when the fruit is present, but search returns None when it is not. In addition, each pass starts again from the original Sentence column and overwrites Result, so only the output of the last pass (Grapes) is kept.
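
The overwrite behaviour can be reproduced without pandas. The following is a minimal sketch under assumptions: plain lists (sentences, fruits) stand in for the two dataframes, and search is condensed to re.sub rather than the original index-insertion loop, with re.escape added as a precaution. It keeps the two properties that matter here: None when nothing matches, and every pass reading the untouched sentences.

```python
import re


def search(desc, name, color):
    # Condensed stand-in for the original search(): like the original,
    # it implicitly returns None when the fruit name is not found.
    pattern = r"\b" + re.escape(name) + r"\b"
    if re.search(pattern, desc):
        return re.sub(pattern, lambda m: color + " " + m.group(0), desc)


# Hypothetical stand-ins for sentence_df['Sentence'] and the rows of fruit_df.
sentences = ["I like Apple, Mango, Grapes", "I like ripe Mango", "Grapes are juicy"]
fruits = [("Apple", "Red"), ("Mango", "Yellow"), ("Grapes", "Green")]

result = None
for name, color in fruits:
    # Every pass reads the original sentences and overwrites the previous result.
    result = [search(s, name, color) for s in sentences]

print(result)
```

After the loop, result only reflects the Grapes pass, which matches the last block of the debugging output.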

How to fix:

  1. First, add the missing return desc at the end of search, so that a sentence without a match is returned unchanged instead of None:

     def search(desc, name, color):
    
         if re.findall(r"\b" + name + r"\b", desc):
                 ...                 
                 new_desc = ''.join(arr)
                 return new_desc
         return desc
    
  2. Initialize sentence_df['Result'] from the Sentence column before applying compare, and have compare read from Result so that each pass builds on the previous one:

     def compare(name, color):
         sentence_df['Result'] = sentence_df['Result'].apply(lambda x: search(x, name, color))
    
     sentence_df['Result'] = sentence_df['Sentence']
     fruit_df.apply(lambda x: compare(x['Name'], x['Color']), axis=1)
    

This finally gives the expected result:

The final result is: 
0    I like Red Apple, Yellow Mango, Green Grapes
1                        I like ripe Yellow Mango
2                          Green Grapes are juicy
Name: Result, dtype: object
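
The two fixes can be seen working together in a pandas-free sketch. It rests on assumptions: lists (sentences, fruits) stand in for the dataframes, and search uses re.sub in place of the manual index insertion, with re.escape added as a precaution. re.sub returns its input unchanged when nothing matches, which plays the role of the added return desc.

```python
import re


def search(desc, name, color):
    pattern = r"\b" + re.escape(name) + r"\b"
    # re.sub returns desc unchanged when nothing matches, so None never appears.
    return re.sub(pattern, lambda m: color + " " + m.group(0), desc)


# Hypothetical stand-ins for sentence_df['Sentence'] and the rows of fruit_df.
sentences = ["I like Apple, Mango, Grapes", "I like ripe Mango", "Grapes are juicy"]
fruits = [("Apple", "Red"), ("Mango", "Yellow"), ("Grapes", "Green")]

# Fix 2: initialise the result from the original text once...
results = list(sentences)
for name, color in fruits:
    # ...and feed the previous pass's output back in, so the changes accumulate.
    results = [search(s, name, color) for s in results]

for line in results:
    print(line)
```

The original sentences list is left untouched, just as the Sentence column is when Result is a separate column.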

Serge Ballesta answered Jan 14 '23 02:01