How do you merge two Pandas dataframes with different column index levels?

Tags:

python

pandas

I want to concatenate two dataframes with the same index but different column levels. One dataframe has a hierarchical (MultiIndex) column index, the other one doesn't.

print(df1)

              A_1               A_2               A_3                .....
              Value_V  Value_y  Value_V  Value_y  Value_V  Value_y

instance200   50       0        6500     1        50       0
instance201   100      0        6400     1        50       0

The other one:

print(df2)

              PV         Estimate

instance200   2002313    1231233
instance201   2134124    1124724

The result should look like this:

             PV        Estimate   A_1               A_2               A_3                .....
                                  Value_V  Value_y  Value_V  Value_y  Value_V  Value_y

instance200  2002313   1231233    50       0        6500     1        50       0
instance201  2134124   1124724    100      0        6400     1        50       0

But merging or concatenating the frames gives me a DataFrame with a one-dimensional column index, like this:

             PV        Estimate   (A_1,Value_V) (A_1,Value_y) (A_2,Value_V) (A_2,Value_y)  .....


instance200  2002313   1231233    50             0             6500         1
instance201  2134124   1124724    100            0             6400         1 

How can I keep the hierarchical column index from df1?
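To reproduce the problem, here is a minimal sketch that rebuilds the two sample frames from the printed output above. Only A_1 to A_3 are included, since the columns hidden behind "....." aren't shown, and the variable name `naive` is just for illustration. Concatenating the frames side by side leaves df1's labels as plain tuples in a flat column index:

```python
import pandas as pd

index = ["instance200", "instance201"]

# df1: two-level column index (A_n, Value_V/Value_y), values copied from the question
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0],
     [100, 0, 6400, 1, 50, 0]],
    index=index,
    columns=pd.MultiIndex.from_product([["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]]),
)

# df2: ordinary single-level column index
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]},
    index=index,
)

# Concatenating without aligning the number of column levels:
# df1's MultiIndex labels end up as tuples inside a flat Index
naive = pd.concat([df2, df1], axis=1)
print(naive.columns.tolist())
```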

asked Mar 03 '15 01:03 by Pat Patterson



2 Answers

Perhaps use good ole assignment:

df3 = df1.copy()
df3[df2.columns] = df2

yields

                A_1             A_2             A_3               PV Estimate
            Value_V Value_y Value_V Value_y Value_V Value_y                  
instance200      50       0    6500       1      50       0  2002313  1231233
instance201     100       0    6400       1      50       0  2134124  1124724
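Here is a self-contained sketch of the assignment approach, again rebuilding only A_1 to A_3 from the question. When a plain label such as 'PV' is added to a frame with two column levels, pandas pads the missing level with an empty string, which is why the second header row is blank for PV and Estimate. The reordering at the end, which moves those columns to the front as in the question's desired output, is an optional extra step and not part of the answer above:

```python
import pandas as pd

index = ["instance200", "instance201"]
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=index,
    columns=pd.MultiIndex.from_product([["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]]),
)
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]}, index=index
)

# Copy the hierarchical frame and assign df2's columns into it.
# Rows align on the shared index; the new labels become ('PV', '') and ('Estimate', '').
df3 = df1.copy()
df3[df2.columns] = df2

# Optional: move the df2 columns to the front
front = [c for c in df3.columns if c[0] in df2.columns]
rest = [c for c in df3.columns if c[0] not in df2.columns]
df3 = df3[front + rest]
print(df3)
```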
answered Nov 15 '22 15:11 by unutbu



You could do this by making df2 have the same number of levels as df1:

In [11]: df1
Out[11]:
                A_1             A_2             A_3
            Value_V Value_y Value_V Value_y Value_V Value_y
instance200      50       0    6500       1      50       0
instance201     100       0    6400       1      50       0

In [12]: df2
Out[12]:
                  PV  Estimate
instance200  2002313   1231233
instance201  2134124   1124724

In [13]: df2.columns = pd.MultiIndex.from_arrays([df2.columns, [None] * len(df2.columns)])

In [14]: df2
Out[14]:
                  PV Estimate
                 NaN      NaN
instance200  2002313  1231233
instance201  2134124  1124724

Now you are able to do the concat without mangling the column names:

In [15]: pd.concat([df1, df2], axis=1)
Out[15]:
                A_1             A_2             A_3               PV Estimate
            Value_V Value_y Value_V Value_y Value_V Value_y      NaN      NaN
instance200      50       0    6500       1      50       0  2002313  1231233
instance201     100       0    6400       1      50       0  2134124  1124724

Note: to have the df2 columns first, use pd.concat([df2, df1], axis=1).
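The same padding idea can be wrapped in a small helper that works for any number of column levels. This is a sketch: `pad_column_levels` is a hypothetical name, and it fills the extra levels with empty strings instead of the None used above, so the lower header rows print blank rather than NaN. Passing the padded df2 first puts PV and Estimate at the front:

```python
import pandas as pd

def pad_column_levels(df, nlevels):
    """Hypothetical helper: return a copy of df whose flat column labels
    are padded with '' so the column index has `nlevels` levels."""
    padded = df.copy()
    padded.columns = pd.MultiIndex.from_tuples(
        [(col,) + ("",) * (nlevels - 1) for col in df.columns]
    )
    return padded

index = ["instance200", "instance201"]
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=index,
    columns=pd.MultiIndex.from_product([["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]]),
)
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]}, index=index
)

# Match df1's number of column levels, then concatenate side by side (df2 first)
result = pd.concat([pad_column_levels(df2, df1.columns.nlevels), df1], axis=1)
print(result)
```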


That said, I can't think of a use case for this; keeping them as separate DataFrames might actually be the easier solution!

answered Nov 15 '22 13:11 by Andy Hayden
