join or merge with overwrite in pandas

Tags:

pandas

I want to perform a join/merge/append operation on a dataframe with datetime index.

Let's say I have df1 and I want to add df2 to it. df2 can have fewer or more columns than df1, and their indexes can overlap. For all rows where the indexes match, if df2 has the same column as df1, I want the values of df1 to be overwritten with those from df2.

How can I obtain the desired result?

asked Mar 20 '12 13:03

saroele

2 Answers

How about: df2.combine_first(df1)?

In [33]: df2
Out[33]:
                   A         B         C         D
2000-01-03  0.638998  1.277361  0.193649  0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05  0.435507 -0.025162 -1.112890  0.324111
2000-01-06 -0.210756 -1.027164  0.036664  0.884715
2000-01-07 -0.821631 -0.700394 -0.706505  1.193341
2000-01-10  1.015447 -0.909930  0.027548  0.258471
2000-01-11 -0.497239 -0.979071 -0.461560  0.447598

In [34]: df1
Out[34]:
                   A         B         C
2000-01-03  2.288863  0.188175 -0.040928
2000-01-04  0.159107 -0.666861 -0.551628
2000-01-05 -0.356838 -0.231036 -1.211446
2000-01-06 -0.866475  1.113018 -0.001483
2000-01-07  0.303269  0.021034  0.471715
2000-01-10  1.149815  0.686696 -1.230991
2000-01-11 -1.296118 -0.172950 -0.603887
2000-01-12 -1.034574 -0.523238  0.626968
2000-01-13 -0.193280  1.857499 -0.046383
2000-01-14 -1.043492 -0.820525  0.868685

In [35]: df2.comb
df2.combine        df2.combineAdd     df2.combine_first  df2.combineMult

In [35]: df2.combine_first(df1)
Out[35]:
                   A         B         C         D
2000-01-03  0.638998  1.277361  0.193649  0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05  0.435507 -0.025162 -1.112890  0.324111
2000-01-06 -0.210756 -1.027164  0.036664  0.884715
2000-01-07 -0.821631 -0.700394 -0.706505  1.193341
2000-01-10  1.015447 -0.909930  0.027548  0.258471
2000-01-11 -0.497239 -0.979071 -0.461560  0.447598
2000-01-12 -1.034574 -0.523238  0.626968       NaN
2000-01-13 -0.193280  1.857499 -0.046383       NaN
2000-01-14 -1.043492 -0.820525  0.868685       NaN

Note that it takes the values from df1 for indices that do not overlap with df2. If this doesn't do exactly what you want I would be willing to improve this function / add options to it.
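To make the overwrite behaviour easier to check by eye, here is a minimal sketch with small made-up frames (the values and dates are mine, not from the session above). It assumes a reasonably recent pandas. One thing it also shows: combine_first only prefers the non-null values of the calling frame, so a NaN in df2 at an overlapping position is filled from df1 rather than overwriting it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two frames with overlapping datetime indexes
# and partially different columns.
idx1 = pd.date_range("2000-01-03", periods=4, freq="D")  # Jan 3 .. Jan 6
idx2 = pd.date_range("2000-01-05", periods=4, freq="D")  # Jan 5 .. Jan 8

df1 = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=idx1,
)
df2 = pd.DataFrame(
    {"A": [30.0, np.nan, 50.0, 60.0], "D": [0.3, 0.4, 0.5, 0.6]},
    index=idx2,
)

# df2 takes priority; df1 fills in whatever df2 does not have
# (missing rows, missing columns, and NaN cells).
result = df2.combine_first(df1)
print(result)

# Jan 5, column A: present in both, df2 wins.
print(result.loc[pd.Timestamp("2000-01-05"), "A"])  # 30.0
# Jan 6, column A: df2 holds NaN there, so df1's value is kept.
print(result.loc[pd.Timestamp("2000-01-06"), "A"])  # 4.0
```

The result covers the union of both indexes and both sets of columns, with NaN wherever neither frame has a value.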

answered Sep 26 '22 08:09

Wes McKinney

For a merge like this, the update method of a DataFrame is useful.

Taking the examples from the documentation:

import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, 2.1, np.nan],
                    [np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]],
                   index=[1, 2])

Data before the update:

>>> df1
     0    1    2
0  NaN  3.0  5.0
1 -4.6  2.1  NaN
2  NaN  7.0  NaN
>>>
>>> df2
      0    1    2
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0

Let's update df1 with data from df2:

df1.update(df2)

Data after the update:

>>> df1
      0    1    2
0   NaN  3.0  5.0
1 -42.6  2.1 -8.2
2  -5.0  1.6  4.0

Remarks:

Note that this operation works in place: it modifies the DataFrame on which update is called.
Also note that non-NaN values in df1 are not overwritten by NaN values in df2.
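Both remarks can be checked with a short sketch. The frames below are made up for illustration, and the copy is only there so the original stays available for comparison. It also shows a point worth knowing for the question as asked: with its default arguments, update only touches rows and columns that already exist in the calling frame, so extra rows or columns in df2 are not added.

```python
import numpy as np
import pandas as pd

# Hypothetical data: df2 overlaps df1 on index 1 and 2, and also has
# an extra row (index 3) and an extra column ("C").
df1 = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=[0, 1, 2],
)
df2 = pd.DataFrame(
    {"A": [np.nan, 300.0, 400.0], "C": [7.0, 8.0, 9.0]},
    index=[1, 2, 3],
)

# Work on a copy so df1 itself stays untouched.
updated = df1.copy()
ret = updated.update(df2)

print(ret)      # None: update modifies `updated` in place
print(updated)
# Column A: index 1 keeps 2.0 (df2 has NaN there), index 2 becomes 300.0.
# Row 3 and column C from df2 do not appear in `updated`.
```

If the extra rows and columns of df2 are needed as well, df2.combine_first(df1) from the other answer covers that case.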

answered Sep 22 '22 08:09

Nicolás Ozimica
