pandas merge with MultiIndex, when only one level of index is to be used as key

Tags: python, merge, pandas, multi-index

I have a data frame called df1 with a 2-level MultiIndex (levels: '_Date' and '_ItemId'). Each value of '_ItemId' occurs multiple times, like this:

                              _SomeOtherLabel
 _Date            _ItemId     
 2014-10-05       6588921     AA
                  6592520     AB 
                  6836143     BA
 2014-10-11       6588921     CA
                  6592520     CB
                  6836143     DA

I have a second data frame called df2 with '_ItemId' used as a key (not the index). In this df, there is only one occurrence of each value of _ItemId:

                  _ItemId       _Cat
  0               6588921       6_1
  1               6592520       6_1
  2               6836143       7_1

I want to take the values in the column '_Cat' from df2 and merge them into df1 for the matching values of '_ItemId'. This is almost (I think?) a standard many-to-one merge, except that the key on the left df is one of the MultiIndex levels rather than a column. I tried this:

df1['_cat']=pd.merge(df1,df2,left_index=True, right_on='ItemId')

but I get the error

   ValueError: len(right_on) must equal the number of levels in the index of "left"

which I suppose makes sense since my (left) index is actually made of two keys. How do I select the one index level that I need? Or is there a better approach to this merge?

Thanks

251

asked Dec 02 '14 14:12

Charles

1 Answer

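I can think of two ways to do this.

Use set_index() and join(). Setting '_ItemId' as the index of df2 lets join() match it against the level of the same name in df1's MultiIndex: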

>>> df1.join(df2.set_index('_ItemId'))
                   _SomeOtherLabel _Cat
_Date      _ItemId                     
2014-10-05 6588921              AA  6_1
           6592520              AB  6_1
           6836143              BA  7_1
2014-10-11 6588921              CA  6_1
           6592520              CB  6_1
           6836143              DA  7_1

Or use reset_index() to turn the index levels of df1 into ordinary columns, merge() on '_ItemId', and then set the MultiIndex again with set_index().
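A minimal sketch of this second approach, assuming frames shaped like the ones in the question. The dates are kept as plain strings here for brevity, and how="left" is my choice so that every row of df1 is kept in its original order:

```python
import pandas as pd

# Rebuild small frames shaped like the ones in the question (dates as strings for brevity).
index = pd.MultiIndex.from_product(
    [["2014-10-05", "2014-10-11"], [6588921, 6592520, 6836143]],
    names=["_Date", "_ItemId"],
)
df1 = pd.DataFrame(
    {"_SomeOtherLabel": ["AA", "AB", "BA", "CA", "CB", "DA"]}, index=index
)
df2 = pd.DataFrame(
    {"_ItemId": [6588921, 6592520, 6836143], "_Cat": ["6_1", "6_1", "7_1"]}
)

merged = (
    df1.reset_index()                        # index levels become ordinary columns
       .merge(df2, on="_ItemId", how="left") # many-to-one merge on the shared column
       .set_index(["_Date", "_ItemId"])      # restore the original MultiIndex
)
print(merged)
```

The result should have the same shape as the join() output above: the original MultiIndex, plus a '_Cat' column next to '_SomeOtherLabel'.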

I think the first approach should be faster, but I'm not sure.
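One way to check this on your own data is to time both variants. The sketch below uses synthetic frames (the sizes, the integer '_Cat' values and the helper names via_join and via_merge are all made up for illustration), so the numbers it prints say nothing definitive about real workloads:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: sizes are arbitrary assumptions, not taken from the question.
n_dates, n_items = 200, 500
index = pd.MultiIndex.from_product(
    [pd.date_range("2014-01-01", periods=n_dates), np.arange(n_items)],
    names=["_Date", "_ItemId"],
)
df1 = pd.DataFrame({"_SomeOtherLabel": np.arange(len(index))}, index=index)
df2 = pd.DataFrame({"_ItemId": np.arange(n_items), "_Cat": np.arange(n_items) % 7})


def via_join():
    # First approach: join on the shared index level name.
    return df1.join(df2.set_index("_ItemId"))


def via_merge():
    # Second approach: flatten the index, merge on the column, rebuild the index.
    return (
        df1.reset_index()
           .merge(df2, on="_ItemId", how="left")
           .set_index(["_Date", "_ItemId"])
    )


t_join = timeit.timeit(via_join, number=20)
t_merge = timeit.timeit(via_merge, number=20)
print(f"join:  {t_join:.4f} s for 20 runs")
print(f"merge: {t_merge:.4f} s for 20 runs")
```

Timings depend on the pandas version, the dtypes and the size of the frames, so it is worth rerunning this with data close to your real case.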

133

answered Oct 24 '22 10:10

Roman Pekar

