Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summing rows in grouped pandas dataframe and return NaN

Example

import pandas as pd
import numpy as np
d = {'l':  ['left', 'right', 'left', 'right', 'left', 'right'],
     'r': ['right', 'left', 'right', 'left', 'right', 'left'],
     'v': [-1, 1, -1, 1, -1, np.nan]}
df = pd.DataFrame(d)

Problem

When a grouped dataframe contains a value of np.NaN I want the grouped sum to be NaN as is given by the skipna=False flag for pd.Series.sum and also pd.DataFrame.sum however, this

In [235]: df.v.sum(skipna=False)
Out[235]: nan

However, this behavior is not reflected in the pandas.DataFrame.groupby object

In [237]: df.groupby('l')['v'].sum()['right']
Out[237]: 2.0

and cannot be forced by applying the np.sum method directly

In [238]: df.groupby('l')['v'].apply(np.sum)['right']
Out[238]: 2.0

Workaround

I can workaround this by doing

check_cols = ['v']
df['flag'] = df[check_cols].isnull().any(axis=1)
df.groupby('l')['v', 'flag'].apply(np.sum).apply(
    lambda x: x if not x.flag else np.nan,
    axis=1
)

but this is ugly. Is there a better method?

like image 694
Alexander McFarlane Avatar asked Mar 13 '17 17:03

Alexander McFarlane


People also ask

Does pandas sum ignore NaN?

sum() Method to Find the Sum Ignoring NaN Values. Use the default value of the skipna parameter i.e. skipna=True to find the sum of DataFrame along the specified axis, ignoring NaN values. If you set skipna=True , you'll get NaN values of sums if the DataFrame has NaN values.

How do I get the sum of specific rows in pandas?

To sum only specific rows, use the loc() method. Mention the beginning and end row index using the : operator. Using loc(), you can also set the columns to be included. We can display the result in a new column.


2 Answers

I think it's inherent to pandas. A workaround can be :

df.groupby('l')['v'].apply(array).apply(sum)

to mimic the numpy way,

or

df.groupby('l')['v'].apply(pd.Series.sum,skipna=False) # for series, or
df.groupby('l')['v'].apply(pd.DataFrame.sum,skipna=False) # for dataframes.

to call the good function.

like image 172
B. M. Avatar answered Oct 05 '22 13:10

B. M.


I'm not sure where this falls on the ugliness scale, but it works:

>>> series_sum = pd.core.series.Series.sum
>>> df.groupby('l')['v'].agg(series_sum, skipna=False)
l
left     -3
right   NaN
Name: v, dtype: float64

I just dug up the sum method you used when you took df.v.sum, which supports the skipna option:

>>> help(df.v.sum)
Help on method sum in module pandas.core.generic:

sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs) method 
of pandas.core.series.Series instance
like image 30
alexis Avatar answered Oct 05 '22 14:10

alexis