Pandas: update and add rows in one dataframe using a key column from another dataframe

Tags: python, pandas

I have two data frames with identical columns. The 'key' column holds unique values within each frame.

Data frame 1:

   A  B key  C
0  0  1  k1  2
1  1  2  k2  3
2  2  3  k3  5

Data frame 2:

   A  B key  C
0  4  5  k1  2
1  1  2  k2  3
2  2  3  k4  5

I would like to update rows in data frame 1 with the values from data frame 2 wherever the key matches. If a key exists only in data frame 2, the entire row should be added to data frame 1.

The final output data frame should look like this, with the same columns:

A B key C
4 5 k1  2   --> update
1 2 k2  3   --> no changes
2 3 k3  5   --> no changes
2 3 k4  5   --> new row

I tried the code below. I need only the four columns 'A', 'B', 'key' and 'C', without any suffixes after the merge.

>>> df3 = df1.merge(df2, on='key', how='outer')
>>> df3
   A_x  B_x key  C_x  A_y  B_y  C_y
0  0.0  1.0  k1  2.0  4.0  5.0  2.0
1  1.0  2.0  k2  3.0  1.0  2.0  3.0
2  2.0  3.0  k3  5.0  NaN  NaN  NaN
3  NaN  NaN  k4  NaN  2.0  3.0  5.0


asked Dec 16 '17 10:12

Chinmay Hegde

3 Answers

It seems like you're looking for combine_first.

a = df2.set_index('key')
b = df1.set_index('key')

(a.combine_first(b)
  .reset_index()
  .reindex(columns=df1.columns))

     A    B key    C
0  4.0  5.0  k1  2.0
1  1.0  2.0  k2  3.0
2  2.0  3.0  k3  5.0
3  2.0  3.0  k4  5.0
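For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The sample frames are rebuilt from the values visible in the question's merge output, and `upsert_by_key` is a hypothetical helper name used only for illustration, not a pandas function.

```python
import pandas as pd

# Sample data reconstructed from the question's merge output (an assumption).
df1 = pd.DataFrame({"A": [0, 1, 2], "B": [1, 2, 3],
                    "key": ["k1", "k2", "k3"], "C": [2, 3, 5]})
df2 = pd.DataFrame({"A": [4, 1, 2], "B": [5, 2, 3],
                    "key": ["k1", "k2", "k4"], "C": [2, 3, 5]})


def upsert_by_key(base, updates, key="key"):
    """Hypothetical helper: rows of `updates` win on matching keys,
    and rows found in only one of the frames are kept as they are."""
    # Calling combine_first on `updates` gives its values priority;
    # `base` only fills keys (or NaN cells) that `updates` lacks.
    merged = updates.set_index(key).combine_first(base.set_index(key))
    # Restore the key column and the original column order.
    return merged.reset_index().reindex(columns=base.columns)


result = upsert_by_key(df1, df2)
print(result)
```

Two things to keep in mind: the numeric columns may come back as floats because the alignment step introduces NaN internally, and a NaN in df2 will not overwrite an existing df1 value, since combine_first only fills gaps.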

answered Oct 08 '22 00:10

cs95

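Try this: stack both frames, then keep only the last row for each key so that the df2 values win.

import pandas as pd

# Sample frames matching the question
df1 = pd.DataFrame({'key': ['k1', 'k2', 'k3'], 'A': [0, 1, 2], 'B': [1, 2, 3], 'C': [2, 3, 5]})
print(df1)
df2 = pd.DataFrame({'key': ['k1', 'k2', 'k4'], 'A': [4, 1, 2], 'B': [5, 2, 3], 'C': [2, 3, 5]})
print(df2)

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
df3 = pd.concat([df1, df2], ignore_index=True)
# For duplicate keys keep the last occurrence, i.e. the row that came from df2
df3 = df3.drop_duplicates(subset=['key'], keep='last')
df3 = df3.sort_values(by=['key'], ascending=True)
print(df3)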

answered Oct 07 '22 23:10

Joe

First, set the key column as the index of both dataframes:

df1.set_index('key', inplace=True)
df2.set_index('key', inplace=True)

Then combine the dataframes so that every key is present. This does not overwrite existing df1 values, because combine_first only fills in missing ones (see the combine_first documentation):

df1 = df1.combine_first(df2)

The last step is to update the values in df1 with those from df2 and reset the index:

df1.update(df2)
df1.reset_index(inplace=True)
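The three steps above can also be written without modifying df1 and df2 in place. This is only a sketch: the sample frames are rebuilt from the values in the question's merge output, and `combine_then_update` is a made-up name for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question's merge output (an assumption).
df1 = pd.DataFrame({"A": [0, 1, 2], "B": [1, 2, 3],
                    "key": ["k1", "k2", "k3"], "C": [2, 3, 5]})
df2 = pd.DataFrame({"A": [4, 1, 2], "B": [5, 2, 3],
                    "key": ["k1", "k2", "k4"], "C": [2, 3, 5]})


def combine_then_update(base, updates, key="key"):
    """Hypothetical helper following the steps above on copies."""
    left = base.set_index(key)       # set_index returns a new frame
    right = updates.set_index(key)
    # Step 1: bring in keys that exist only in `updates` (here k4).
    out = left.combine_first(right)
    # Step 2: overwrite matching cells with non-NaN values from `updates`.
    out.update(right)
    # Step 3: restore the key column and the original column order.
    return out.reset_index().reindex(columns=base.columns)


combined = combine_then_update(df1, df2)
print(combined)
```

Because `update` ignores NaN in the frame it reads from, a missing value in df2 leaves the corresponding df1 value untouched.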

answered Oct 07 '22 22:10

Nimov
