A similar question was asked in "How to keep index when using pandas merge", but the approach given there does not work with a MultiIndex. For example:
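import numpy as np
import pandas as pd
a = pd.DataFrame(np.array([1, 2, 3, 4, 1, 2, 3, 3]).reshape((4, 2)), columns=['col1', 'to_merge_on'], index=['a', 'b', 'a', 'b'])
# named idx rather than id, to avoid shadowing the built-in id()
idx = pd.MultiIndex.from_arrays([[1, 1, 2, 2], ['a', 'b', 'a', 'b']], names=['id1', 'id2'])
a.index = idx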
In [207]: a
Out[207]:
         col1  to_merge_on
id1 id2
1   a       1            2
    b       3            4
2   a       1            2
    b       3            3
b = pd.DataFrame(data={'col2': [1, 2, 3], 'to_merge_on': [1, 3, 5]})
In [209]: b
Out[209]:
   col2  to_merge_on
0     1            1
1     2            3
2     3            5
Applying the approach from that question raises a KeyError:
In [208]: a.reset_index().merge(b, how="left").set_index('index')
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2054, in set_index
    level = frame[col]
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1458, in __getitem__
    return self._get_item_cache(key)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 294, in _get_item_cache
    values = self._data.get(item)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 625, in get
    _, block = self._find_block(item)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 715, in _find_block
    self._check_have(item)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 722, in _check_have
    raise KeyError('no item named %s' % str(item))
KeyError: 'no item named index'
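The likely cause is that `reset_index()` on a MultiIndex with named levels creates one column per level (`id1` and `id2` here), so there is no column called `index` to set afterwards. A minimal check, rebuilding the same frame as above:

```python
import numpy as np
import pandas as pd

# Same data as in the question, with the MultiIndex passed in directly.
a = pd.DataFrame(
    np.array([1, 2, 3, 4, 1, 2, 3, 3]).reshape((4, 2)),
    columns=["col1", "to_merge_on"],
    index=pd.MultiIndex.from_arrays(
        [[1, 1, 2, 2], ["a", "b", "a", "b"]], names=["id1", "id2"]
    ),
)

# Each named index level becomes its own column; none of them is 'index'.
flat_columns = list(a.reset_index().columns)
print(flat_columns)  # ['id1', 'id2', 'col1', 'to_merge_on']
```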
How can one make the merge while preserving the MultiIndex in the left dataframe?
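Provisional solution: reset the index so that its levels become ordinary columns, merge, and then set the MultiIndex again from those columns: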
In [255]: a = a.reset_index()
In [256]: a
Out[256]:
   id1 id2  col1  to_merge_on
0    1   a     1            2
1    1   b     3            4
2    2   a     1            2
3    2   b     3            3
In [271]: c = pd.merge(a, b, how="left")
In [272]: c
Out[272]:
   id1 id2  col1  to_merge_on  col2
0    1   a     1            2   NaN
1    2   a     1            2   NaN
2    2   b     3            3     2
3    1   b     3            4   NaN
In [273]: c = c.set_index(['id1','id2'])
In [274]: c
Out[274]:
         col1  to_merge_on  col2
id1 id2
1   a       1            2   NaN
2   a       1            2   NaN
    b       3            3     2
1   b       3            4   NaN
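The three steps above could be wrapped in a small helper. This is only a sketch: `merge_keep_multiindex` is a hypothetical name, not a pandas function, and it assumes that every index level of the left frame is named and that those names do not collide with column names in either frame.

```python
import pandas as pd


def merge_keep_multiindex(left, right, **merge_kwargs):
    """Hypothetical helper: merge on columns, then restore left's index levels."""
    level_names = list(left.index.names)
    if any(name is None for name in level_names):
        raise ValueError("all index levels of `left` must be named")
    # Move the index levels into columns so they survive the merge.
    merged = left.reset_index().merge(right, **merge_kwargs)
    # Rebuild the MultiIndex from the carried-along level columns.
    return merged.set_index(level_names)


left = pd.DataFrame(
    {"col1": [1, 3, 1, 3], "to_merge_on": [2, 4, 2, 3]},
    index=pd.MultiIndex.from_arrays(
        [[1, 1, 2, 2], ["a", "b", "a", "b"]], names=["id1", "id2"]
    ),
)
right = pd.DataFrame({"col2": [1, 2, 3], "to_merge_on": [1, 3, 5]})

result = merge_keep_multiindex(left, right, how="left", on="to_merge_on")
print(result)
```

Note that if `right` contains duplicate merge keys, a left merge repeats the matching left rows, so the restored MultiIndex would then contain duplicate entries.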