Number of rows changes even after `pandas.merge` with `left` option

Question

I am merging two data frames using pandas.merge. Even after specifying how = left option, I found the number of rows of merged data frame is larger than the original. Why does this happen?

panel = pd.read_csv(file1, encoding ='cp932')
before_len = len(panel)

prof_2000 = pd.read_csv(file2, encoding ='cp932').drop_duplicates()

temp_2000 = pd.merge(panel, prof_2000, left_on='Candidate_u', right_on="name2", how="left")

after_len =  len(temp_2000)
print(before_len, after_len)
> 12661 13915

Thanos · Accepted Answer

This sounds like having more than one rows in right under 'name2' that match the key you have set for the left. Using option 'how='left' with pandas.DataFrame.merge() only means that:

left: use only keys from left frame

However, the actual number of rows in the result object is not necessarily going to be the same as the number of rows in the left object.

Example:

In [359]: df_1
Out[359]: 
   A    B
0  a  AAA
1  b  BBA
2  c  CCF

and then another DF that looks like this (notice that there are more than one entry for your desired key on the left):

In [360]: df_3
Out[360]: 
  key  value
0   a      1
1   a      2
2   b      3
3   a      4

If I merge these two on left.A, here's what happens:

In [361]: df_1.merge(df_3, how='left', left_on='A', right_on='key')
Out[361]: 
   A    B  key  value
0  a  AAA    a    1.0
1  a  AAA    a    2.0
2  a  AAA    a    4.0
3  b  BBA    b    3.0
4  c  CCF  NaN    NaN

This happened even though I merged with how='left' as you can see above, there were simply more than one rows to merge and as shown here the result pd.DataFrame has in fact more rows than the pd.DataFrame on the left.

I hope this helps!

mirekphd · Answer

The problem of doubling of rows after each merge() (of any type, 'both' or 'left') is usually caused by duplicates in any of the keys, so we need to drop them first:

left_df.drop_duplicates(subset=left_key, inplace=True)
right_df.drop_duplicates(subset=right_key, inplace=True)

Number of rows changes even after `pandas.merge` with `left` option

Tags:

python

pandas

user51966

2 Answers

Thanos

mirekphd

Recent Activity

Donate For Us

Number of rows changes even after `pandas.merge` with `left` option

Tags:

python

pandas

user51966

2 Answers

Thanos

mirekphd

Related questions

Recent Activity

Donate For Us