Pandas merge and retain the index

A similar question was asked in "How to keep index when using pandas merge", but the approach given there does not work with a MultiIndex, e.g.:

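import numpy as np
import pandas as pd
from pandas import DataFrame

a = DataFrame(np.array([1,2,3,4,1,2,3,3]).reshape((4,2)), columns=['col1','to_merge_on'], index=['a','b','a','b'])
# named idx rather than id, to avoid shadowing the builtin
idx = pd.MultiIndex.from_arrays([[1,1,2,2],['a','b','a','b']], names=['id1','id2'])
a.index = idx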

In [207]: a
Out[207]: 
         col1  to_merge_on
id1 id2                   
1   a       1            2
    b       3            4
2   a       1            2
    b       3            3

b = DataFrame(data={"col2": [1,2,3], "to_merge_on": [1,3,5]})

In [209]: b
Out[209]: 
   col2  to_merge_on
0     1            1
1     2            3
2     3            5

The approach from that question (reset the index, merge, then call set_index('index')) fails:

In [208]: a.reset_index().merge(b, how="left").set_index('index')
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2054, in set_index
    level = frame[col]
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1458, in __getitem__
    return self._get_item_cache(key)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 294, in _get_item_cache
    values = self._data.get(item)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 625, in get
    _, block = self._find_block(item)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 715, in _find_block
    self._check_have(item)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 722, in _check_have
    raise KeyError('no item named %s' % str(item))
KeyError: 'no item named index'
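
The KeyError appears to come from the column names that reset_index() produces: for a MultiIndex with named levels, the new columns take the level names, so no column called 'index' exists afterwards. A minimal sketch, assuming a recent pandas version (the variable names flat and single are only for illustration):

```python
import numpy as np
import pandas as pd

# Rebuild the left frame from the question.
idx = pd.MultiIndex.from_arrays([[1, 1, 2, 2], ["a", "b", "a", "b"]],
                                names=["id1", "id2"])
a = pd.DataFrame(np.array([1, 2, 3, 4, 1, 2, 3, 3]).reshape((4, 2)),
                 columns=["col1", "to_merge_on"], index=idx)

# With named levels, reset_index() creates columns 'id1' and 'id2'.
flat = a.reset_index()
print(list(flat.columns))  # ['id1', 'id2', 'col1', 'to_merge_on']

# Only an unnamed single-level index is turned into a column called 'index'.
single = pd.DataFrame({"x": [1, 2]}).reset_index()
print(list(single.columns))  # ['index', 'x']
```

So set_index('index') only works in the single, unnamed index case; with a MultiIndex the level names have to be passed instead.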

How can one make the merge while preserving the MultiIndex in the left dataframe?

dmvianna asked Nov 27 '12 01:11

1 Answer

Provisional solution: reset the index into columns, merge, then rebuild the MultiIndex from the id1 and id2 columns:

In [255]: a = a.reset_index()

In [256]: a
Out[256]: 
   id1 id2  col1  to_merge_on
0    1   a     1            2
1    1   b     3            4
2    2   a     1            2
3    2   b     3            3

In [271]: c = pd.merge(a, b, how="left")

In [272]: c
Out[272]: 
   id1 id2  col1  to_merge_on  col2
0    1   a     1            2   NaN
1    2   a     1            2   NaN
2    2   b     3            3     2
3    1   b     3            4   NaN

In [273]: c = c.set_index(['id1','id2'])

In [274]: c
Out[274]: 
         col1  to_merge_on  col2
id1 id2                         
1   a       1            2   NaN
2   a       1            2   NaN
    b       3            3     2
1   b       3            4   NaN
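
The same steps can be wrapped so that the level names are not hard-coded. This is only a sketch, assuming a recent pandas version and that every index level has a name; merge_keep_index is a hypothetical helper, not a pandas function. It also shows DataFrame.join with on=, which keeps the calling frame's index without the reset/set round trip.

```python
import numpy as np
import pandas as pd


def merge_keep_index(left, right, **merge_kwargs):
    """Hypothetical helper: merge `right` into `left`, restoring left's index."""
    names = list(left.index.names)
    # Assumption: all levels are named; unnamed levels would come back from
    # reset_index() as 'index' or 'level_0', which this sketch does not handle.
    if any(name is None for name in names):
        raise ValueError("every index level needs a name")
    merged = left.reset_index().merge(right, **merge_kwargs)
    return merged.set_index(names)


# Frames from the question.
idx = pd.MultiIndex.from_arrays([[1, 1, 2, 2], ["a", "b", "a", "b"]],
                                names=["id1", "id2"])
a = pd.DataFrame(np.array([1, 2, 3, 4, 1, 2, 3, 3]).reshape((4, 2)),
                 columns=["col1", "to_merge_on"], index=idx)
b = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})

# Option 1: reset_index -> merge -> set_index on the original level names.
merged = merge_keep_index(a, b, how="left")

# Option 2: join on a column of `a` against the index of `b`;
# the result keeps a's MultiIndex as is.
joined = a.join(b.set_index("to_merge_on"), on="to_merge_on")

print(merged)
print(joined)
```

Both results carry the (id1, id2) MultiIndex, and only the (2, 'b') row, whose to_merge_on is 3, gets a non-missing col2.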

dmvianna answered Sep 28 '22 06:09