Pandas groupby mean() not ignoring NaNs

Question

If I calculate the mean of a groupby object and one of the groups contains NaN(s), the NaNs are ignored. Even applying np.mean returns just the mean of the valid numbers. I would expect NaN to be returned as soon as one NaN is in the group. Here is a simplified example of the behaviour:

>>> import pandas as pd
>>> import numpy as np
>>> c = pd.DataFrame({'a':[1,np.nan,2,3],'b':[1,2,1,2]})
>>> c.groupby('b').mean()
     a
b     
1  1.5
2  3.0
>>> c.groupby('b').agg(np.mean)
     a
b     
1  1.5
2  3.0

I want to receive the following result:

     a
b     
1  1.5
2  NaN

I am aware that I can replace NaNs beforehand and that I can probably write my own aggregation function that returns NaN as soon as a NaN is in the group. This function wouldn't be optimized, though.

Do you know of an argument to achieve the desired behaviour with the optimized functions?

Btw, I think the desired behaviour was implemented in a previous version of pandas.

Mayank Porwal · Accepted Answer

By default, pandas skips NaN values. You can make it propagate NaN by passing skipna=False to the Series mean() inside the aggregation:

In [215]: c.groupby('b').agg({'a': lambda x: x.mean(skipna=False)})
Out[215]: 
     a
b     
1  1.5
2  NaN
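
The lambda above runs in Python once per group, which is what the question hoped to avoid. A minimal sketch of an alternative that stays with built-in groupby operations: compute the default (NaN-skipping) mean, then blank out every group that contains a NaN. The variable names `means`, `has_nan`, and `result` are illustrative only, and this is one possible approach rather than an official pandas option.

```python
import numpy as np
import pandas as pd

c = pd.DataFrame({'a': [1, np.nan, 2, 3], 'b': [1, 2, 1, 2]})

# Default groupby mean: NaN values are skipped within each group.
means = c.groupby('b')['a'].mean()

# Boolean Series indexed by group key: True where the group has any NaN.
has_nan = c['a'].isna().groupby(c['b']).any()

# Replace the mean with NaN for every group flagged above.
result = means.mask(has_nan)
print(result)
```

Both `means` and `has_nan` are indexed by the values of `b`, so `mask` aligns them group by group.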

Dmitriy Work · Answer

There is `mean(skipna=False)`, but it's not working

GroupBy aggregation methods (min, max, mean, median, etc.) have the skipna parameter, which is meant for this exact task, but it seems that currently (May 2020) there is a bug (issue opened in March 2020) that prevents it from working correctly.

Quick workaround

Complete working example based on comments by @Serge Ballesta and @RoelAdriaans:

>>> import pandas as pd
>>> import numpy as np
>>> c = pd.DataFrame({'a':[1,np.nan,2,3],'b':[1,2,1,2]})
>>> c.fillna(np.inf).groupby('b').mean().replace(np.inf, np.nan)

     a
b     
1  1.5
2  NaN

For additional information and updates follow the link above.
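
One caveat with the sentinel trick: it assumes the data contains no genuine infinite values, because those would also come back as NaN. A hedged sketch of a wrapper that checks this assumption first; the helper name `group_mean_propagate_nan` is hypothetical and not part of pandas, and it assumes all non-key columns are numeric.

```python
import numpy as np
import pandas as pd

def group_mean_propagate_nan(df, key):
    """Hypothetical helper: group mean that returns NaN for groups with NaN.

    Uses the fillna(np.inf) sentinel workaround, so it refuses input that
    already contains infinite values.
    """
    values = df.drop(columns=key).to_numpy(dtype=float)
    if np.isinf(values).any():
        raise ValueError("data already contains inf; sentinel trick is unsafe")
    # A group containing the inf sentinel has an infinite mean,
    # which is then mapped back to NaN.
    return df.fillna(np.inf).groupby(key).mean().replace(np.inf, np.nan)

c = pd.DataFrame({'a': [1, np.nan, 2, 3], 'b': [1, 2, 1, 2]})
out = group_mean_propagate_nan(c, 'b')
print(out)
```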
