I'm trying to sum across columns of a Pandas dataframe, and when I have NaNs in every column I'm getting sum = zero; I'd expected sum = NaN based on the docs. Here's what I've got:
In [136]: df = pd.DataFrame()
In [137]: df['a'] = [1,2,np.nan,3]
In [138]: df['b'] = [4,5,np.nan,6]
In [139]: df
Out[139]:
     a    b
0    1    4
1    2    5
2  NaN  NaN
3    3    6

In [140]: df['total'] = df.sum(axis=1)
In [141]: df
Out[141]:
     a    b  total
0    1    4      5
1    2    5      7
2  NaN  NaN      0
3    3    6      9
The pandas.DataFrame.sum docs say "If an entire row/column is NA, the result will be NA", so I don't understand why "total" = 0 and not NaN for index 2. What am I missing?
DataFrame.sum(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
min_count: int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
New in version 0.22.0: Added with the default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.
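To make the quoted default concrete, here is a minimal sketch, assuming pandas 0.22 or later. The variable names `all_na` and `empty` are mine, not from the docs.

```python
import numpy as np
import pandas as pd

# An all-NA Series and an empty Series (names chosen for illustration).
all_na = pd.Series([np.nan, np.nan])
empty = pd.Series([], dtype="float64")

# With the default min_count=0, NaNs are skipped and nothing is left to add,
# so the sum is the additive identity and the product is the multiplicative one.
all_na_sum = all_na.sum()
all_na_prod = all_na.prod()
empty_sum = empty.sum()
empty_prod = empty.prod()

print(all_na_sum, all_na_prod)  # 0.0 1.0
print(empty_sum, empty_prod)    # 0.0 1.0
```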
Quoting from the latest pandas docs: min_count defaults to 0, so the sum of an all-NA Series is 0. If you pass min_count=1, the sum of an all-NA row or column is NaN instead.
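Applied to the DataFrame from the question, that might look like the sketch below. It assumes pandas 0.22 or later, and the names `total_default` and `total_strict` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({"a": [1, 2, np.nan, 3], "b": [4, 5, np.nan, 6]})

# Default behaviour (min_count=0): the all-NaN row sums to 0.
total_default = df.sum(axis=1)

# Require at least one non-NA value per row: the all-NaN row becomes NaN.
total_strict = df.sum(axis=1, min_count=1)

print(total_default.tolist())  # [5.0, 7.0, 0.0, 9.0]
print(total_strict.tolist())   # [5.0, 7.0, nan, 9.0]
```

Unlike skipna=False, min_count=1 still skips NaNs in rows that have at least one valid value, so a row like [1, NaN] sums to 1 rather than NaN.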
Great link provided by Jeff.
Here is an example:
df1 = pd.DataFrame()
df1['a'] = [1,2,np.nan,3]
df1['b'] = [np.nan,2,np.nan,3]
df1
Out[4]:
     a    b
0  1.0  NaN
1  2.0  2.0
2  NaN  NaN
3  3.0  3.0

df1.sum(axis=1, skipna=False)
Out[6]:
0    NaN
1    4.0
2    NaN
3    6.0
dtype: float64

df1.sum(axis=1, skipna=True)
Out[7]:
0    1.0
1    4.0
2    0.0
3    6.0
dtype: float64