Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Setting the index after merging with pandas?

Executing the following merge

import pandas as pd
s = pd.Series(range(5, 10), index=range(10, 15), name='score')
df = pd.DataFrame({'id': (11, 13), 'value': ('a', 'b')})
pd.merge(s, df, 'left', left_index=True, right_on='id')

results in this data frame:

     score  id value
NaN      5  10   NaN
0.0      6  11     a
NaN      7  12   NaN
1.0      8  13     b
NaN      9  14   NaN

Why does Pandas take the index from the right data frame as the index for the result, instead of the index from the left series, even though I specified both a left merge and left_index=True? The documentation says

left: use only keys from left frame

which I interpreted differently from the result I am actually getting. What I expected was the following data frame.

    score  id value
10      5  10   NaN
11      6  11     a
12      7  12   NaN
13      8  13     b
14      9  14   NaN

I am using Python 3.7.5 with Pandas 0.25.3.

like image 581
Hendrikto Avatar asked Dec 11 '19 20:12

Hendrikto


2 Answers

Here's what happens:

  1. the output index is the intersection of the index/column merge keys [0, 1].
  2. missing keys are replaced with NaN
  3. NaNs result in the index type being upcasted to float

To set the index, just assign to it:

s2 = pd.merge(s, df, how='left', left_index=True, right_on='id')
s2.index = s.index

    score  id value
10      5  10   NaN
11      6  11     a
12      7  12   NaN
13      8  13     b
14      9  14   NaN

You can also merge on s (just because I dislike calling pd.merge directly):

(s.to_frame()
  .merge(df, how='left', left_index=True, right_on='id')
  .set_axis(s.index, axis=0, inplace=False))

    score  id value
10      5  10   NaN
11      6  11     a
12      7  12   NaN
13      8  13     b
14      9  14   NaN
like image 52
cs95 Avatar answered Oct 19 '22 12:10

cs95


You can do this with reset_index:

df = pd.merge(s,df, 'left', left_index=True, right_on='id').reset_index(drop=True).set_index('id').rename_axis(index=None)
df.insert(1, 'id', df.index)

    score  id value
10      5  10   NaN
11      6  11     a
12      7  12   NaN
13      8  13     b
14      9  14   NaN
like image 2
oppressionslayer Avatar answered Oct 19 '22 11:10

oppressionslayer