Executing the following merge
import pandas as pd
s = pd.Series(range(5, 10), index=range(10, 15), name='score')
df = pd.DataFrame({'id': (11, 13), 'value': ('a', 'b')})
pd.merge(s, df, 'left', left_index=True, right_on='id')
results in this data frame:
score id value
NaN 5 10 NaN
0.0 6 11 a
NaN 7 12 NaN
1.0 8 13 b
NaN 9 14 NaN
Why does Pandas take the index from the right data frame as the index for the result, instead of the index from the left series, even though I specified both a left merge and left_index=True
? The documentation says
left: use only keys from left frame
which I interpreted differently from the result I am actually getting. What I expected was the following data frame.
score id value
10 5 10 NaN
11 6 11 a
12 7 12 NaN
13 8 13 b
14 9 14 NaN
I am using Python 3.7.5 with Pandas 0.25.3.
Here's what happens:
[0, 1]
. float
To set the index, just assign to it:
s2 = pd.merge(s, df, how='left', left_index=True, right_on='id')
s2.index = s.index
score id value
10 5 10 NaN
11 6 11 a
12 7 12 NaN
13 8 13 b
14 9 14 NaN
You can also merge on s
(just because I dislike calling pd.merge
directly):
(s.to_frame()
.merge(df, how='left', left_index=True, right_on='id')
.set_axis(s.index, axis=0, inplace=False))
score id value
10 5 10 NaN
11 6 11 a
12 7 12 NaN
13 8 13 b
14 9 14 NaN
You can do this with reset_index:
df = pd.merge(s,df, 'left', left_index=True, right_on='id').reset_index(drop=True).set_index('id').rename_axis(index=None)
df.insert(1, 'id', df.index)
score id value
10 5 10 NaN
11 6 11 a
12 7 12 NaN
13 8 13 b
14 9 14 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With