Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge two dataframes with multi-index

I have seen several posts about this but I could not get my head around how merge, join and concat would deal with this. How can I merge two dataframes to find matching indexes?

in:

import pandas as pd
import numpy as np
row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
index_arrays = [np.array(['first', 'first', 'second', 'second']), np.array(['one','two','one','two'])]
df1 = pd.DataFrame([row_x1,row_x2,row_x3,row_x4], columns=list('ABC'), index=index_arrays)
print(df1)

out:

             A   B   C
first  one  a1  b1  c1
       two  a2  b2  c2
second one  a3  b3  c3
       two  a4  b4  c4

in:

row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']
df2 = pd.DataFrame([row_y1,row_y2], columns=list('DEF'), index=['first','second'])
print(df2)

out

         D   E   F
first   d1  e1  f1
second  d2  e2  f2

in other words, how can I merge them to achieve df3 (as follows)?

in

row_x1 = ['a1','b1','c1']
row_x2 = ['a2','b2','c2']
row_x3 = ['a3','b3','c3']
row_x4 = ['a4','b4','c4']
row_y1 = ['d1','e1','f1']
row_y2 = ['d2','e2','f2']

row_z1 = row_x1 + row_y1
row_z2 = row_x2 + row_y1
row_z3 = row_x3 + row_y2
row_z4 = row_x4 + row_y2

df3 = pd.DataFrame([row_z1,row_z2,row_z3,row_z4], columns=list('ABCDEF'), index=index_arrays)
print(df3)

out

             A   B   C   D   E   F
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2
like image 909
Andreuccio Avatar asked Oct 03 '17 19:10

Andreuccio


People also ask

What is multi index Dataframe?

In this article, we will discuss Multi-index for Pandas Dataframe and Groupby operations . Multi-index allows you to select more than one row and column in your index. It is a multi-level or hierarchical object for pandas object. Now there are various methods of multi-index that are used such as MultiIndex.

How do I combine multiple data frames into one?

The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.


1 Answers

Option 1
Use pd.DataFrame.reindex + pd.DataFrame.join
reindex has a convenient level parameter that allows you to expand on the index levels not present.

df1.join(df2.reindex(df1.index, level=0))

             A   B   C   D   E   F
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2

Option 2
You can rename your axes and join will work

df1.rename_axis(['a', 'b']).join(df2.rename_axis('a'))

             A   B   C   D   E   F
a      b                          
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2

You can follow that up with another rename_axis to get desired results

df1.rename_axis(['a', 'b']).join(df2.rename_axis('a')).rename_axis([None, None])

             A   B   C   D   E   F
first  one  a1  b1  c1  d1  e1  f1
       two  a2  b2  c2  d1  e1  f1
second one  a3  b3  c3  d2  e2  f2
       two  a4  b4  c4  d2  e2  f2

like image 113
piRSquared Avatar answered Sep 17 '22 13:09

piRSquared