Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keeping both DataFrame indexes on merge

I'm sure this question must have already been answered somewhere but I couldn't find an answer that suits my case.
I have 2 pandas DataFrames

a = pd.DataFrame({'A1':[1,2,3], 'A2':[2,4,6]}, index=['a','b','c'])
b = pd.DataFrame({'A1':[3,5,6], 'A2':[3,6,9]}, index=['a','c','d'])

I want to merge them in order to obtain something like

result = pd.DataFrame({
    'A1' : [3,2,5,6],
    'A2' : [3,4,6,9]
}, index=['a','b','c','d'])

Basically, I want a new df with the union of both indexes. Where indexes match, the value in each column should be updated with the one from the second df (in this case b). Where there is no match the value is taken from the starting df (in this case a).
I tried with merge(), join() and concat() but I could not manage to obtain this result.

like image 587
Mirko Avatar asked Sep 16 '25 13:09

Mirko


1 Answers

You could use pd.concat to create one dataframe (b being the first one as it is b that has a priority for it's values to be kept over a), and then drop the duplicated index:

Using your sample data:

c = pd.concat([b,a])
c[~c.index.duplicated()].sort_index()

prints:

   A1  A2
a   3   3
b   2   4
c   5   6
d   6   9
like image 136
sophocles Avatar answered Sep 18 '25 08:09

sophocles