I wanted to ask a question about merging MultiIndex DataFrames in pandas. Here is a hypothetical scenario:
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])
Then either
s1.merge(s2, how='left', left_index=True, right_index=True)
or
s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])
will result in an error.
Do I have to call reset_index() on either s1/s2 to make this work?
We can use either pandas.merge() or DataFrame.merge() to merge DataFrames. Merging DataFrames is similar to a SQL join and supports several join types: inner, left, right, outer, and cross.
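For reference, the reset_index() route asked about in the question can be sketched roughly as below. This is only a minimal sketch: the seeded generator `rng` and the names `left`, `right`, `merged`, and `result` are introduced here for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

# Assumption: a seeded generator, used only so the example is reproducible.
rng = np.random.default_rng(0)
s1 = pd.DataFrame(rng.standard_normal(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(rng.standard_normal(8), index=index2, columns=['s2'])

# Move the index levels into ordinary columns so they can serve as merge keys.
left = s1.reset_index()
right = s2.reset_index()

merged = left.merge(right, how='left',
                    left_on=['first', 'second'],
                    right_on=['third', 'fourth'])

# Drop the duplicated key columns and restore the original MultiIndex.
result = merged.drop(columns=['third', 'fourth']).set_index(['first', 'second'])
print(result)
```

It works, but it costs two reset_index() calls plus some cleanup afterwards, which is why the shorter options in the answers are usually preferable.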
It seems you need to combine the two: use the index on one side and the index level names on the other.
s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])
                s1        s2
bar one   0.765385 -0.365508
    two   1.462860  0.751862
baz one   0.304163  0.761663
    two  -0.816658 -1.810634
foo one   1.891434  1.450081
    two   0.571294  1.116862
qux one   1.056516 -0.052927
    two  -0.574916 -1.197596
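A self-contained version of the same idea, in case anyone wants to run it directly. This is a sketch assuming a reasonably recent pandas, where `right_on` may name index levels; the seeded generator `rng` and the variable `merged` are additions for illustration only.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

# Assumption: a seeded generator, used only so the example is reproducible.
rng = np.random.default_rng(1)
s1 = pd.DataFrame(rng.standard_normal(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(rng.standard_normal(8), index=index2, columns=['s2'])

# Use the whole index of s1 as the left key and the index *level names*
# of s2 as the right key.
merged = s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
print(merged)
```

The level names carried by the resulting index may differ between pandas versions, so it is safer not to rely on them and to rename the index explicitly if the names matter.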
Besides using the index level names as pointed out by @ALollz, you can simply use loc, which aligns on the index automatically (alignment is on the index values, so the differing level names do not matter):
s1.loc[:, 's2'] = s2 # Or explicitly, s2['s2']
                    s1        s2
first second
bar   one    -0.111384 -2.341803
      two    -1.226569  1.308240
baz   one     1.880835  0.697946
      two    -0.008979 -0.247896
foo   one     0.103864 -1.039990
      two     0.836931  0.000811
qux   one    -0.859005 -1.199615
      two    -0.321341 -1.098691
A more general form would be:
s1.loc[:, s2.columns] = s2
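To show that the assignment really aligns on the index rather than on row position, here is a small sketch that shuffles s2 first. The names `rng`, `s2_shuffled`, and `out` are hypothetical, and the explicit Series form `s2['s2']` is used on the right-hand side.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

# Assumption: a seeded generator, used only so the example is reproducible.
rng = np.random.default_rng(2)
s1 = pd.DataFrame(rng.standard_normal(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(rng.standard_normal(8), index=index2, columns=['s2'])

# Shuffle the rows of s2 so that a purely positional copy would be wrong.
s2_shuffled = s2.sample(frac=1, random_state=0)

# Work on a copy so s1 itself stays untouched.
out = s1.copy()

# The Series on the right is aligned to out's index by its values;
# the level names ('third', 'fourth') play no part in the alignment.
out.loc[:, 's2'] = s2_shuffled['s2']
print(out)
```

Note that this behaves like a left join on the index: rows of s1 with no match in s2 would get NaN, and rows that exist only in s2 are dropped silently.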