
pandas merge on index not working

I have two dataframes (Series actually) generated by a groupby operation:

bw

l1
Consumer Discretionary         0.118718
Consumer Staples               0.089850
Energy                         0.109988
Financials                     0.159418
Health Care                    0.115060
Industrials                    0.109078
Information Technology         0.200392
Materials                      0.035509
Telecommunications Services    0.030796
Utilities                      0.031190
dtype: float64

and pw

l1
Consumer Discretionary         0.148655
Consumer Staples               0.067873
Energy                         0.063899
Financials                     0.095689
Health Care                    0.116015
Industrials                    0.181346
Information Technology         0.117715
Materials                      0.043155
Telecommunications Services    0.009550
Utilities                      0.156103
dtype: float64

When I try and merge them using

pd.merge(bw,pw,left_index=True,right_index=True)

I get an error

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-174-739bb362e06d>", line 1, in <module>
    pd.merge(pw,attr,left_index=True,right_index=True)
  File "/usr/lib/python2.7/dist-packages/pandas/tools/merge.py", line 39, in merge
    return op.get_result()
  File "/usr/lib/python2.7/dist-packages/pandas/tools/merge.py", line 185, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/usr/lib/python2.7/dist-packages/pandas/tools/merge.py", line 251, in _get_join_info
    left_ax = self.left._data.axes[self.axis]
IndexError: list index out of range

but when I do

bw = bw.reset_index()
pw = pw.reset_index()
mrg = pd.merge(pw,bw,on="l1")

It works. However, it makes my code much less readable over multiple iterations of joins, so I'd like to know what I'm doing wrong and how I can get the first version, merging on indexes, to work.

Thanks

Tahnoon Pasha asked Dec 03 '14 21:12


1 Answer

Convert each Series to a DataFrame first; then the merge on index works:

merged = pd.merge(pd.DataFrame(bw),pd.DataFrame(pw),left_index=True,right_index=True)
print(merged)

The result:

                                 0_x       0_y
l1                                             
Consumer Discretionary       0.118718  0.148655
Consumer Staples             0.089850  0.067873
Energy                       0.109988  0.063899
Financials                   0.159418  0.095689
Health Care                  0.115060  0.116015
Industrials                  0.109078  0.181346
Information Technology       0.200392  0.117715
Materials                    0.035509  0.043155
Telecommunications Services  0.030796  0.009550
Utilities                    0.031190  0.156103
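
The 0_x / 0_y column names come from both Series being unnamed. As a minimal sketch (using a three-row subset of the question's data, and the column names "bw" and "pw" chosen here purely for illustration), Series.to_frame(name) gives each side a readable column before merging:

```python
import pandas as pd

# Small subset of the question's data; the index is named "l1" as in the post.
idx = pd.Index(["Energy", "Financials", "Utilities"], name="l1")
bw = pd.Series([0.109988, 0.159418, 0.031190], index=idx)
pw = pd.Series([0.063899, 0.095689, 0.156103], index=idx)

# to_frame(name) turns each Series into a one-column DataFrame with that column name.
merged = pd.merge(
    bw.to_frame("bw"),
    pw.to_frame("pw"),
    left_index=True,
    right_index=True,
)
print(merged)
```

Because the two columns already have distinct names, no _x / _y suffixes are added.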

Alternatively, if bw and pw have the same index in the same order (and so the same number of items), you can pair the values up by position:

c = list(zip(bw.tolist(), pw.tolist()))  # list() so this also works on Python 3, where zip is lazy
merged = pd.DataFrame(c, index=bw.index)

This should give the same values, although the columns are labeled 0 and 1 rather than 0_x and 0_y.
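
If you would rather not rely on the two Series being in the same order, pd.concat with axis=1 aligns on the index instead of on position. This is a sketch of an alternative technique, not what the answer above does; the data is a subset of the question's, pw is deliberately shuffled, and the keys "bw" and "pw" are illustrative names:

```python
import pandas as pd

idx = pd.Index(["Energy", "Financials", "Utilities"], name="l1")
bw = pd.Series([0.109988, 0.159418, 0.031190], index=idx)

# pw is built in a different row order on purpose, to show index alignment.
pw = pd.Series(
    [0.156103, 0.063899, 0.095689],
    index=pd.Index(["Utilities", "Energy", "Financials"], name="l1"),
)

# axis=1 places the Series side by side, matching rows by index label;
# keys supplies the resulting column names.
combined = pd.concat([bw, pw], axis=1, keys=["bw", "pw"])
print(combined)
```

Unlike the zip approach, rows are matched by label, so a different ordering in pw does not mix up the values.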

When you call reset_index() on a Series, it returns a DataFrame, with the index moved into a column. That is why the merge works after that step.
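
A quick way to see this for yourself, sketched with a three-row subset of the question's data:

```python
import pandas as pd

idx = pd.Index(["Energy", "Financials", "Utilities"], name="l1")
bw = pd.Series([0.109988, 0.159418, 0.031190], index=idx)

as_frame = bw.reset_index()

print(type(bw).__name__)        # Series
print(type(as_frame).__name__)  # DataFrame
# The old index becomes the "l1" column; the unnamed values get the column label 0.
print(list(as_frame.columns))
```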

Robbie Liu answered Oct 20 '22 21:10