Pandas merge (pd.merge): how to set the index and join

Tags: python, pandas

I have two pandas DataFrames, dfLeft and dfRight, both with the date as the index.

dfLeft:

            cusip    factorL
date  
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
....
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
....

dfRight:

            idc__id    factorR
date  
2012-01-03    XXXX      5.0
2012-01-03    YYYY      6.0
....
2012-01-04    XXXX      5.1
2012-01-04    YYYY      6.2

Both have a shape close to (121900, 3).

I tried the following merge:

test = pd.merge(dfLeft, dfRight, left_index=True, right_index=True, left_on='cusip', right_on='idc__id', how = 'inner')

This gave test a shape of (60643500, 6).

What is going wrong here? I want the merge to match on both the date and cusip/idc__id. Note: in this example the cusips line up, but in the real data they may not.
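A likely explanation for the (60643500, 6) shape, assuming the index arguments won out and pandas matched rows on the date alone: every left row is then paired with every right row that has the same date, so a date carrying n identifiers on each side yields n x n rows instead of n. (Recent pandas versions refuse to combine left_index with left_on at all and raise a MergeError.) The small frames below are hypothetical stand-ins that show the difference between a date-only join and a join on both keys.

```python
import pandas as pd

# Hypothetical toy data: 2 dates x 3 identifiers on each side, date as the index.
dates = pd.to_datetime(["2012-01-03"] * 3 + ["2012-01-04"] * 3)
ids = ["AAAA", "BBBB", "CCCC"] * 2

dfLeft = pd.DataFrame({"cusip": ids, "factorL": range(6)},
                      index=pd.Index(dates, name="date"))
dfRight = pd.DataFrame({"idc__id": ids, "factorR": range(6)},
                       index=pd.Index(dates, name="date"))

# Date-only join: each left row pairs with all 3 right rows on the same date,
# giving 3 x 3 rows per date.
by_date = pd.merge(dfLeft, dfRight, left_index=True, right_index=True)
print(by_date.shape)  # (18, 4)

# Join on date AND identifier: one row per matching (date, cusip) pair.
by_both = pd.merge(dfLeft.reset_index(), dfRight.reset_index(),
                   left_on=["date", "cusip"], right_on=["date", "idc__id"])
print(by_both.shape)  # (6, 5)
```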

Thanks.

Expected output (test):

            cusip    factorL    factorR
date  
2012-01-03    XXXX      4.5          5.0
2012-01-03    YYYY      6.2          6.0
....
2012-01-04    XXXX      4.7          5.1
2012-01-04    YYYY      6.1          6.2


asked Jan 15 '13 16:01

user1911092

1 Answer

Reset the indices and then merge on multiple (column-)keys:

dfLeft.reset_index(inplace=True)
dfRight.reset_index(inplace=True)
dfMerged = pd.merge(dfLeft, dfRight,
              left_on=['date', 'cusip'],
              right_on=['date', 'idc__id'],
              how='inner')
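Below is a self-contained sketch of the same reset-and-merge approach on hypothetical stand-in data. It differs from the snippet above in three assumed-optional ways: reset_index() is called without inplace, so dfLeft and dfRight keep their date index; validate="one_to_one" (a pd.merge parameter in reasonably recent pandas) raises a MergeError if either side has a duplicated (date, id) key, which would otherwise multiply rows silently; and idc__id is dropped because it just repeats cusip after the join.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames (date is the index).
idx = pd.Index(pd.to_datetime(["2012-01-03", "2012-01-03",
                               "2012-01-04", "2012-01-04"]), name="date")
dfLeft = pd.DataFrame({"cusip": ["XXXX", "YYYY", "XXXX", "YYYY"],
                       "factorL": [4.5, 6.2, 4.7, 6.1]}, index=idx)
dfRight = pd.DataFrame({"idc__id": ["XXXX", "YYYY", "XXXX", "YYYY"],
                        "factorR": [5.0, 6.0, 5.1, 6.2]}, index=idx)

# reset_index() returns new frames with 'date' as a column; the originals
# are left untouched. validate="one_to_one" fails fast on duplicate keys.
dfMerged = (pd.merge(dfLeft.reset_index(), dfRight.reset_index(),
                     left_on=["date", "cusip"],
                     right_on=["date", "idc__id"],
                     how="inner", validate="one_to_one")
            .drop(columns="idc__id"))  # same values as cusip after the join
print(dfMerged)
```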

You can then set 'date' back as the index:

dfMerged.set_index('date', inplace=True)
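If you would rather never move date out of the index, one alternative (a different technique from the answer's, and it assumes renaming idc__id to cusip on the right-hand frame is acceptable) is to give both frames the same (date, cusip) MultiIndex and join index-on-index. The data below is hypothetical and deliberately not lined up.

```python
import pandas as pd

# Hypothetical data; the right side is in a different order and has an extra id.
idx = pd.Index(pd.to_datetime(["2012-01-03", "2012-01-03", "2012-01-04"]),
               name="date")
dfLeft = pd.DataFrame({"cusip": ["XXXX", "YYYY", "XXXX"],
                       "factorL": [4.5, 6.2, 4.7]}, index=idx)
dfRight = pd.DataFrame({"idc__id": ["YYYY", "XXXX", "ZZZZ"],
                        "factorR": [6.0, 5.0, 9.9]}, index=idx)

# Append the identifier to the existing date index on both sides,
# so each row is keyed by (date, cusip).
left = dfLeft.set_index("cusip", append=True)
right = (dfRight.rename(columns={"idc__id": "cusip"})
                .set_index("cusip", append=True))

# Inner join on the shared MultiIndex, then move cusip back to a column
# so that date alone remains the index, as in the expected output.
test = left.join(right, how="inner").reset_index("cusip")
print(test)
```

Only (2012-01-03, XXXX) and (2012-01-03, YYYY) exist on both sides here, so the result has two rows, each carrying its own factorL and factorR regardless of the original row order.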

Here's an example:

raw1 = '''
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
'''

raw2 = '''
2012-01-03    XYXX      45.
2012-01-03    YYYY      62.
2012-01-04    XXXX      -47.
2012-01-05    YYYY      61.
'''

import pandas as pd
from io import StringIO  # Python 3 (Python 2 used: from StringIO import StringIO)

# sep=r'\s+' splits on runs of whitespace; the leading blank line of each
# raw string is skipped because skip_blank_lines=True by default.
df1 = pd.read_csv(StringIO(raw1), header=None, sep=r'\s+', parse_dates=[0])
df2 = pd.read_csv(StringIO(raw2), header=None, sep=r'\s+', parse_dates=[0])

df1.columns = ['date', 'cusip', 'factorL']
df2.columns = ['date', 'idc__id', 'factorR']

print(pd.merge(df1, df2,
               left_on=['date', 'cusip'],
               right_on=['date', 'idc__id'],
               how='inner'))

which gives

        date cusip  factorL idc__id  factorR
0 2012-01-03  YYYY      6.2    YYYY     62.0
1 2012-01-04  XXXX      4.7    XXXX    -47.0
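Since the question notes that the identifiers may not line up in the real data, it can help to see which rows failed to match before settling on an inner join. A sketch using the same rows as the example (with the right-hand value column named factorR): how="outer" keeps unmatched rows from both sides, and indicator=True adds a _merge column whose values are "both", "left_only" or "right_only".

```python
import pandas as pd

# The example's rows, built directly instead of parsed from text.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-03", "2012-01-03", "2012-01-04", "2012-01-04"]),
    "cusip": ["XXXX", "YYYY", "XXXX", "YYYY"],
    "factorL": [4.5, 6.2, 4.7, 6.1],
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-03", "2012-01-03", "2012-01-04", "2012-01-05"]),
    "idc__id": ["XYXX", "YYYY", "XXXX", "YYYY"],
    "factorR": [45.0, 62.0, -47.0, 61.0],
})

# Outer join keeps everything; _merge records where each row came from.
audit = pd.merge(df1, df2,
                 left_on=["date", "cusip"], right_on=["date", "idc__id"],
                 how="outer", indicator=True)
print(audit["_merge"].value_counts())

# Rows present on only one side point at mismatched ids or dates.
print(audit[audit["_merge"] != "both"])
```

Here two rows match, the misspelled XYXX and the 2012-01-05 row appear as right_only, and the two left rows they should have paired with appear as left_only.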


answered Oct 07 '22 15:10

tzelleke
