Ye with Pandas Perspicacity,
I'm attempting to update a simple dataframe with another dataframe, and I'm running into trouble. I have a master dataframe that I would like to update:
Master_df:
color tastey
name
Apples Red Always
Avocados Black Sometimes
Anise Brown NaN
I have some new data that I would like to update this dataframe with. It might append new columns, add new rows, or update old values:
New_df:
color tastey price
name
Bananas Yellow NaN Medium
Apples Red Usually Low
Berries Red NaN High
I would like to merge these two dataframes such that an updated dataframe looks like:
Desired_df:
color tastey price
name
Apples Red Always Low
Avocados Black Sometimes NaN
Anise Brown NaN NaN
Bananas Yellow NaN Medium
Berries Red NaN High
I've played with many different commands, but I'm still struggling to:
Finally, (while not shown in this example) I need to join on multiple columns. i.e. I need to use 3 columns to form my unique key. (Though I'm sure a solution to the above example would expand to that case.)
I sincerely appreciate any help or pointers! I hope the example above is clear.
Cheers,
A Pandas Pinhead.
edit1: I believe this question is different than previously asked questions because when I use combine_first
I get this:
>>> Master_df.combine_first(New_df)
color tastey
name
Apples Red Always
Avocados Black Sometimes
Anise Brown NaN
Edit2: OK, I'm getting closer, but not there yet! I don't want to generate _x
and _y
columns. I want them to be one column, taking data from New_df
when there's a conflict.
>>> updated = pd.merge(Master_df, New_df, how="outer", on=["name"])
name color_x tastey_x color_y tastey_y price
0 Apples Red Always Red Usually Low
1 Avocados Black Sometimes NaN NaN NaN
2 Anise Brown NaN NaN NaN NaN
3 Bananas NaN NaN Yellow NaN Medium
4 Berries NaN NaN Red NaN High
Edit3: Here's an image of what I'm trying to do. It's important that I don't have to hard code the column names ('A', 'B', etc.) other than the keys.
P.S. Code below.
import pandas as pd
import numpy as np
Master_data = {
'name' : ['Apples', 'Avocados', 'Anise'],
'color' : ['Red', 'Black', 'Brown'],
'tastey' : ['Always', 'Sometimes', np.NaN]
}
Master_df = pd.DataFrame(Master_data, columns = ['name', 'color', 'tastey'])
Master_df = Master_df.set_index('name')
print(Master_df)
newData = {
'name' : ['Bananas', 'Apples', 'Berries'],
'color' : ['Yellow', 'Red', 'Red'],
'tastey' : [np.NaN, 'Usually', np.NaN],
'price' : ['Medium', 'Low', 'High']
}
New_df = pd.DataFrame(newData, columns = ['name', 'color', 'tastey', 'price'])
New_df = New_df.set_index('name')
print(New_df)
Desired_data = {
'name' : ['Apples', 'Avocados', 'Anise', 'Bananas', 'Berries'],
'color' : ['Red', 'Black', 'Brown', 'Yellow', 'Red'],
'tastey' : ['Always', 'Sometimes', np.NaN, np.NaN, np.NaN],
'price' : ['Low', np.NaN, np.NaN, 'Medium', 'High']
}
Desired_df = pd.DataFrame(Desired_data, columns = ['name', 'color', 'tastey', 'price'])
Desired_df = Desired_df.set_index('name')
print(Desired_df)
You can use pd.DataFrame.update
(an in-place operation) before pd.DataFrame.combine_first
:
New_df.update(Master_df)
res = New_df.combine_first(Master_df)
# color price tastey
# name
# Anise Brown NaN NaN
# Apples Red Low Always
# Avocados Black NaN Sometimes
# Bananas Yellow Medium NaN
# Berries Red High NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With