Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join dataframes - one with multiindex columns and the other without

I'm trying to join two dataframes - one with multiindex columns and the other with a single column name. They have similar index.

I get the following warning: "UserWarning: merging between different levels can give an unintended result (3 levels on the left, 1 on the right)"

For example:

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
df2 = pd.DataFrame(np.random.randn(3), index=['A', 'B', 'C'],columns=['w'])
df3 = df.join(df2)

What is the best way to join these two dataframes?

like image 808
Eyal S. Avatar asked Apr 05 '17 06:04

Eyal S.


People also ask

Which are the 3 main ways of combining DataFrames together?

Joining two DataFrames can be done in multiple ways (left, right, and inner) depending on what data must be in the final DataFrame.

Can you concatenate DataFrames with different columns?

It is possible to join the different columns is using concat() method. DataFrame: It is dataframe name. axis: 0 refers to the row axis and1 refers the column axis. join: Type of join.

How do I join two DataFrames in pandas based on multiple columns?

To merge two pandas DataFrames on multiple columns use pandas. merge() method. merge() is considered more versatile and flexible and we also have the same method in DataFrame.


2 Answers

It depends on what you want! Do you want the column from df2 to be aligned with the 1st or second level of columns from df?

You have to add a level to the columns of df2

Super cheezy with pd.concat

df.join(pd.concat([df2], axis=1, keys=['a']))

Better way

df2.columns = pd.MultiIndex.from_product([['a'], df2.columns])

df.join(df2)

enter image description here

like image 190
piRSquared Avatar answered Oct 22 '22 13:10

piRSquared


I think simpliest is create MultiIndex in df2 and then use concat or join:

df2.columns = pd.MultiIndex.from_tuples([('a','w')])
print (df2)
          a
          w
A -0.562729
B -0.212032
C  0.102451
df2.columns = [['a'], df2.columns]
print (df2)
          a
          w
A -1.253881
B -0.637752
C  0.907105

df3 = pd.concat([df, df2], axis=1)

Or:

df3 = df.join(df2)

print (df3)
first        bar                 baz                 foo                 qux  \
second       one       two       one       two       one       two       one   
A      -0.269667  0.221566  1.138393  0.871762 -0.063132 -1.995682 -0.797885   
B      -0.456878  0.293350 -1.040748 -1.307871  0.002462  1.580711 -0.198943   
C      -0.691755 -0.279445 -0.809215 -0.006658  1.452484  0.516414 -0.295961   

first                    a  
second       two         w  
A       1.068843 -0.562729  
B       1.247057 -0.212032  
C      -0.345300  0.102451  
like image 30
jezrael Avatar answered Oct 22 '22 13:10

jezrael