Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas sum of two columns - dealing with nan-values correctly

When summing two pandas columns, I want to ignore nan-values when one of the two columns is a float. However when nan appears in both columns, I want to keep nan in the output (instead of 0.0).

Initial dataframe:

Surf1     Surf2
0         0
NaN       8
8         15
NaN       NaN
16        14
15        7

Desired output:

Surf1     Surf2     Sum
0         0         0
NaN       8         8
8         15        23
NaN       NaN       NaN
16        14        30
15        7         22

Tried code: -> the code below ignores nan-values but when taking the sum of two nan-values, it gives 0.0 in the output where I want to keep it as NaN in that particular case to keep these empty values separate from values that are actually 0 after summing.

import pandas as pd
import numpy as np

data = pd.DataFrame({"Surf1": [10,np.nan,8,np.nan,16,15], "Surf2": [22,8,15,np.nan,14,7]})
print(data)

data.loc[:,'Sum'] = data.loc[:,['Surf1','Surf2']].sum(axis=1)
print(data)
like image 858
Matthi9000 Avatar asked Dec 03 '22 17:12

Matthi9000


1 Answers

From the documentation pandas.DataFrame.sum

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([]).sum() # min_count=0 is the default 0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

Change your code to

data.loc[:,'Sum'] = data.loc[:,['Surf1','Surf2']].sum(axis=1, min_count=1)

output

   Surf1  Surf2
0   10.0   22.0
1    NaN    8.0
2    8.0   15.0
3    NaN    NaN
4   16.0   14.0
5   15.0    7.0
   Surf1  Surf2   Sum
0   10.0   22.0  32.0
1    NaN    8.0   8.0
2    8.0   15.0  23.0
3    NaN    NaN   NaN
4   16.0   14.0  30.0
5   15.0    7.0  22.0
like image 66
SubOptimal Avatar answered Dec 06 '22 07:12

SubOptimal