Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Updating a pandas dataframe with a new dataframe

Ye with Pandas Perspicacity,

I'm attempting to update a simple dataframe with another dataframe, and I'm running into trouble. I have a master dataframe that I would like to update:

Master_df:

          color     tastey
name                      
Apples      Red     Always
Avocados  Black  Sometimes
Anise     Brown        NaN

I have some new data that I would like to update this dataframe with. It might append new columns, add new rows, or update old values:

New_df:

          color   tastey   price
name                            
Bananas  Yellow      NaN  Medium
Apples      Red  Usually     Low
Berries     Red      NaN    High

I would like to merge these two dataframes such that an updated dataframe looks like:

Desired_df:

           color     tastey   price
name                               
Apples       Red     Always     Low
Avocados   Black  Sometimes     NaN
Anise      Brown        NaN     NaN
Bananas   Yellow        NaN  Medium
Berries      Red        NaN    High

I've played with many different commands, but I'm still struggling to:

  • Not lose the index values I join on.
  • Have common columns form one tastey column, rather than tastey_x and tastey_y.
  • Have new data from new rows.
  • Not have to hardcode the names of the new columns or new rows.

Finally, (while not shown in this example) I need to join on multiple columns. i.e. I need to use 3 columns to form my unique key. (Though I'm sure a solution to the above example would expand to that case.)

I sincerely appreciate any help or pointers! I hope the example above is clear.

Cheers,

A Pandas Pinhead.

edit1: I believe this question is different than previously asked questions because when I use combine_first I get this:

>>> Master_df.combine_first(New_df)

          color     tastey
name                      
Apples      Red     Always
Avocados  Black  Sometimes
Anise     Brown        NaN

Edit2: OK, I'm getting closer, but not there yet! I don't want to generate _x and _y columns. I want them to be one column, taking data from New_df when there's a conflict.

>>> updated = pd.merge(Master_df, New_df, how="outer", on=["name"])
       name color_x   tastey_x color_y tastey_y   price
0    Apples     Red     Always     Red  Usually     Low
1  Avocados   Black  Sometimes     NaN      NaN     NaN
2     Anise   Brown        NaN     NaN      NaN     NaN
3   Bananas     NaN        NaN  Yellow      NaN  Medium
4   Berries     NaN        NaN     Red      NaN    High

Edit3: Here's an image of what I'm trying to do. It's important that I don't have to hard code the column names ('A', 'B', etc.) other than the keys.

P.S. Code below.

import pandas as pd
import numpy as np

Master_data = {
    'name' : ['Apples', 'Avocados', 'Anise'],
    'color' : ['Red', 'Black', 'Brown'],
    'tastey' : ['Always', 'Sometimes', np.NaN]
}

Master_df = pd.DataFrame(Master_data, columns = ['name', 'color', 'tastey'])
Master_df = Master_df.set_index('name')

print(Master_df)

newData = {
    'name' : ['Bananas', 'Apples', 'Berries'],
    'color' : ['Yellow', 'Red', 'Red'],
    'tastey' : [np.NaN, 'Usually', np.NaN],
    'price' : ['Medium', 'Low', 'High']
}

New_df = pd.DataFrame(newData, columns = ['name', 'color', 'tastey', 'price'])
New_df = New_df.set_index('name')

print(New_df)

Desired_data = {
    'name' : ['Apples', 'Avocados', 'Anise', 'Bananas', 'Berries'],
    'color' : ['Red', 'Black', 'Brown', 'Yellow', 'Red'],
    'tastey' : ['Always', 'Sometimes', np.NaN, np.NaN, np.NaN],
    'price' : ['Low', np.NaN, np.NaN, 'Medium', 'High']
}

Desired_df = pd.DataFrame(Desired_data, columns = ['name', 'color', 'tastey', 'price'])
Desired_df = Desired_df.set_index('name')

print(Desired_df)
like image 281
Fraser Parlane Avatar asked Oct 17 '22 20:10

Fraser Parlane


1 Answers

You can use pd.DataFrame.update (an in-place operation) before pd.DataFrame.combine_first:

New_df.update(Master_df)

res = New_df.combine_first(Master_df)

#            color   price     tastey
# name                               
# Anise      Brown     NaN        NaN
# Apples       Red     Low     Always
# Avocados   Black     NaN  Sometimes
# Bananas   Yellow  Medium        NaN
# Berries      Red    High        NaN
like image 174
jpp Avatar answered Nov 15 '22 05:11

jpp