
pandas merge with MultiIndex, when only one level of index is to be used as key

I have a data frame called df1 with a 2-level MultiIndex (levels: '_Date' and '_ItemId'). There are multiple instances of each value of '_ItemId', like this:

                              _SomeOtherLabel
 _Date            _ItemId     
 2014-10-05       6588921     AA
                  6592520     AB 
                  6836143     BA
 2014-10-11       6588921     CA
                  6592520     CB
                  6836143     DA 

I have a second data frame called df2 with '_ItemId' used as a key (not the index). In this df, there is only one occurrence of each value of _ItemId:

                  _ItemId       _Cat
  0               6588921       6_1
  1               6592520       6_1
  2               6836143       7_1

I want to recover the values in the column '_Cat' from df2 and merge them into df1 for the appropriate values of '_ItemId'. This is almost (I think?) a standard many-to-one merge, except that the appropriate key for the left df is one of the MultiIndex levels. I tried this:

df1['_cat']=pd.merge(df1,df2,left_index=True, right_on='ItemId')  

but I get the error

   ValueError: len(right_on) must equal the number of levels in the index of "left"

which I suppose makes sense since my (left) index is actually made of two keys. How do I select the one index level that I need? Or is there a better approach to this merge?

Thanks

Charles asked Dec 02 '14 14:12




1 Answer

I can think of two ways of doing this.

Use set_index() and join():

>>> df1.join(df2.set_index('_ItemId'))
                   _SomeOtherLabel _Cat
_Date      _ItemId                     
2014-10-05 6588921              AA  6_1
           6592520              AB  6_1
           6836143              BA  7_1
2014-10-11 6588921              CA  6_1
           6592520              CB  6_1
           6836143              DA  7_1
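
For anyone who wants to try the join approach, here is a minimal self-contained sketch. The values are copied from the tables in the question, but how the frames are built is an assumption (for example, that '_Date' holds datetimes and '_ItemId' holds integers). join() should line up the single index of the right frame with the MultiIndex level of the same name in df1.

```python
import pandas as pd

# Rebuild the question's frames; the construction itself is an assumption.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2014-10-05", "2014-10-11"]), [6588921, 6592520, 6836143]],
    names=["_Date", "_ItemId"],
)
df1 = pd.DataFrame(
    {"_SomeOtherLabel": ["AA", "AB", "BA", "CA", "CB", "DA"]}, index=index
)
df2 = pd.DataFrame(
    {"_ItemId": [6588921, 6592520, 6836143], "_Cat": ["6_1", "6_1", "7_1"]}
)

# Make '_ItemId' the index of df2 so its name matches one level of df1's index,
# then join; each df1 row picks up the '_Cat' value for its '_ItemId'.
joined = df1.join(df2.set_index("_ItemId"))
print(joined)
```

Note that the result is a new frame with both columns, so it is assigned to a new name rather than to a single column of df1.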

Or use reset_index(), merge(), and then set_index() to restore the MultiIndex.
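
A rough sketch of this second route, again rebuilding the frames from the question's tables (the construction is an assumption). The how="left" and validate="many_to_one" arguments are my additions, not part of the original suggestion: the first keeps every row of df1, and the second asks pandas to raise if '_ItemId' is not unique in df2.

```python
import pandas as pd

# Rebuild the question's frames; the construction itself is an assumption.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2014-10-05", "2014-10-11"]), [6588921, 6592520, 6836143]],
    names=["_Date", "_ItemId"],
)
df1 = pd.DataFrame(
    {"_SomeOtherLabel": ["AA", "AB", "BA", "CA", "CB", "DA"]}, index=index
)
df2 = pd.DataFrame(
    {"_ItemId": [6588921, 6592520, 6836143], "_Cat": ["6_1", "6_1", "7_1"]}
)

merged = (
    df1.reset_index()                    # turn '_Date' and '_ItemId' into columns
       .merge(df2, on="_ItemId", how="left", validate="many_to_one")
       .set_index(["_Date", "_ItemId"])  # restore the original MultiIndex
)
print(merged)
```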

I think the first approach should be faster, but I'm not sure.

Roman Pekar answered Oct 24 '22 10:10
