Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge multi-indexed with single-indexed data frames in pandas

Tags:

python

pandas

People also ask

How do I convert MultiIndex to single index in pandas?

To revert the index of the dataframe from multi-index to a single index using the Pandas inbuilt function reset_index(). Returns: (Data Frame or None) DataFrame with the new index or None if inplace=True.


You could use get_level_values:

firsts = df1.index.get_level_values('first')
df1['value2'] = df2.loc[firsts].values

Note: you are almost doing a join here (except the df1 is MultiIndex)... so there may be a neater way to describe this...

.

In an example (similar to what you have):

df1 = pd.DataFrame([['a', 'x', 0.123], ['a','x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10],['b', 20]],
                   columns=['first', 'value']).set_index(['first'])

firsts = df1.index.get_level_values('first')
df1['value2'] = df2.loc[firsts].values

In [5]: df1
Out[5]: 
              value1  value2
first second                
a     x        0.123      10
      x        0.234      10
      y        0.451      10
b     x        0.453      20

According to the documentation, as of pandas 0.14, you can simply join single-index and multiindex dataframes. It will match on the common index name. The how argument works as expected with 'inner' and 'outer', though interestingly it seems to be reversed for 'left' and 'right' (could this be a bug?).

df1 = pd.DataFrame([['a', 'x', 0.471780], ['a','y', 0.774908], ['a', 'z', 0.563634],
                    ['b', 'x', -0.353756], ['b', 'y', 0.368062], ['b', 'z', -1.721840],
                    ['c', 'x', 1], ['c', 'y', 2], ['c', 'z', 3],
                   ],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value2']).set_index(['first'])

print(df1.join(df2, how='inner'))
                value1  value2
first second                  
a     x       0.471780      10
      y       0.774908      10
      z       0.563634      10
b     x      -0.353756      20
      y       0.368062      20
      z      -1.721840      20

As the .ix syntax is a powerful shortcut to reindexing, but in this case you are actually not doing any combined rows/column reindexing, this can be done a bit more elegantly (for my humble taste buds) with just using reindexing:

Preparation from hayden:

df1 = pd.DataFrame([['a', 'x', 0.123], ['a','x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10],['b', 20]],
                   columns=['first', 'value']).set_index(['first'])

Then this looks like this in iPython:

In [4]: df1
Out[4]: 
              value1
first second        
a     x        0.123
      x        0.234
      y        0.451
b     x        0.453

In [5]: df2
Out[5]: 
       value
first       
a         10
b         20

In [7]: df2.reindex(df1.index, level=0)
Out[7]: 
              value
first second       
a     x          10
      x          10
      y          10
b     x          20

In [8]: df1['value2'] = df2.reindex(df1.index, level=0)

In [9]: df1
Out[9]: 
              value1  value2
first second                
a     x        0.123      10
      x        0.234      10
      y        0.451      10
b     x        0.453      20

The mnemotechnic for what level you have to use in the reindex method: It states for the level that you already covered in the bigger index. So, in this case df2 already had level 0 covered of the df1.index.