
How does df.combine() work?

Tags: python, pandas

df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 0]})
df1.combine(df2, take_smaller, fill_value=-5)

The above code yields a result that contains 4.0 in column B. Where does the 4.0 come from?

Zzzqn asked Dec 22 '22 16:12



1 Answer

From the example in the docs:

take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2

combine() calls this function once per column. It returns the column from df1 if its sum is less than the sum of the corresponding column in df2; otherwise it returns the column from df2.

So when you do:

df1.combine(df2, take_smaller)

   A    B
0  0  3.0
1  0  0.0

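This works as expected: sum() skips the NaN in column B of df1, so the comparison is 4 < 3, which is false, and the column from df2 is returned.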

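However, when you pass fill_value=-5, the NaN in column B of df1 is replaced with -5 before the function is called, so the sum of that column becomes smaller: (-5 + 4) = -1 < (3 + 0) = 3. Hence the column from df1 is returned, with the values -5 and 4. The 4.0 is simply the original value from df1; it is shown as a float because the column originally held a NaN.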

fill_value : scalar value, default None
    The value to fill NaNs with prior to passing any column to the merge func.
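To check the arithmetic yourself, here is a minimal sketch that reproduces both calls and prints the column sums that take_smaller ends up comparing. The variable names (sum_b1_raw, sum_b1_filled, sum_b2, no_fill, with_fill) are only for illustration, and fillna(-5) is used here to mimic what fill_value does to column B before the function runs.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 0]})

# Same function as in the docs example: keep the column with the smaller sum.
take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2

# Sums for column B as take_smaller sees them.
sum_b1_raw = df1['B'].sum()                # NaN is skipped by sum()
sum_b1_filled = df1['B'].fillna(-5).sum()  # NaN replaced by -5 first
sum_b2 = df2['B'].sum()

print(sum_b1_raw, sum_b1_filled, sum_b2)

# Without fill_value, column B comes from df2.
no_fill = df1.combine(df2, take_smaller)
print(no_fill)

# With fill_value=-5, column B comes from df1 (after filling).
with_fill = df1.combine(df2, take_smaller, fill_value=-5)
print(with_fill)
```

Column A is taken from df1 in both calls, since 0 + 0 < 1 + 1 regardless of fill_value.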

anky answered Jan 07 '23 12:01
