Pandas aggregation ignoring NaN's

Tags: python, pandas, nan, aggregate, numpy

I am aggregating my pandas DataFrame, data. Specifically, I want the average and the sum of amount for each (origin, type) group. For averaging and summing I tried the pandas methods below:

import numpy as np
import pandas as pd
result = data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum, pd.Series.mean]}).reset_index()

My issue is that the amount column includes NaNs, which causes the result of the above code to contain a lot of NaN averages and sums.

I know both pd.Series.sum and pd.Series.mean have skipna=True by default, so why am I still getting NaNs here?

I also tried this, which obviously did not work:

data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum(skipna=True), pd.Series.mean(skipna=True)]}).reset_index()

EDIT: Upon @Korem's suggestion, I also tried to use a partial as below:

from functools import partial
s_na_mean = partial(pd.Series.mean, skipna = True)
data.groupby(groupbyvars).agg({'amount': [ np.nansum, s_na_mean ]}).reset_index()

but I get this error:

error: 'functools.partial' object has no attribute '__name__'


asked Oct 01 '14 16:10

Zhubarb

2 Answers

Use numpy's nansum and nanmean:

from numpy import nansum
from numpy import nanmean
data.groupby(groupbyvars).agg({'amount': [ nansum, nanmean]}).reset_index()
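To make the call above concrete, here is a minimal sketch on made-up data. The sample frame and the contents of groupbyvars are assumptions for illustration only, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one NaN in the ('A', 'x') group.
data = pd.DataFrame({
    'origin': ['A', 'A', 'B', 'B'],
    'type': ['x', 'x', 'y', 'y'],
    'amount': [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ['origin', 'type']  # assumed grouping columns

# nansum and nanmean ignore the NaN inside each group.
result = data.groupby(groupbyvars).agg(
    {'amount': [np.nansum, np.nanmean]}
).reset_index()
print(result)
```

The ('A', 'x') group holds 1.0 and NaN, so both its sum and its mean come out as 1.0 rather than NaN.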

As a workaround for older versions of NumPy, and also a way to fix your last attempt:

When you write pd.Series.sum(skipna=True), you call the method immediately instead of passing a function to agg. To preset the keyword argument while still passing a callable, define a partial. So if you don't have nanmean, define s_na_mean and use that:

from functools import partial
s_na_mean = partial(pd.Series.mean, skipna = True)
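The question reports that a bare partial fails in agg with "'functools.partial' object has no attribute '__name__'". One possible way around that, sketched here and not taken from the original answer, is to give the partial a __name__ yourself so that agg can label the output column. The sample data, the s_na_sum helper, and the contents of groupbyvars are assumptions for illustration.

```python
from functools import partial

import numpy as np
import pandas as pd

# Calling the method on the class with no Series raises right away,
# so nothing callable is ever handed to agg.
call_error = None
try:
    pd.Series.sum(skipna=True)
except TypeError as exc:
    call_error = exc

# Partials preset skipna=True but remain callables.
s_na_sum = partial(pd.Series.sum, skipna=True)    # hypothetical helper
s_na_mean = partial(pd.Series.mean, skipna=True)

# Partials have no __name__ by default; set one to use as the column label.
s_na_sum.__name__ = 's_na_sum'
s_na_mean.__name__ = 's_na_mean'

# Hypothetical sample data.
data = pd.DataFrame({
    'origin': ['A', 'A', 'B', 'B'],
    'type': ['x', 'x', 'y', 'y'],
    'amount': [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ['origin', 'type']  # assumed grouping columns

result = data.groupby(groupbyvars).agg(
    {'amount': [s_na_sum, s_na_mean]}
).reset_index()
print(result)
```

A small named wrapper function (def or lambda assigned a name) would serve the same purpose; the partial is kept here only to stay close to the answer.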


answered Sep 22 '22 12:09

Korem

This may be too late for you, but it might be useful for others.

Try apply with a custom function:

import numpy as np
import pandas as pd

def nan_agg(x):
    # Keep only the non-NaN amounts. Python's "not" cannot negate a
    # boolean Series (it raises ValueError), so use notnull() instead.
    amount = x.loc[x['amount'].notnull(), 'amount']

    res = {}
    res['nansum'] = amount.sum()
    res['nanmean'] = amount.mean()

    return pd.Series(res, index=['nansum', 'nanmean'])

result = data.groupby(groupbyvars).apply(nan_agg).reset_index()
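For comparison, here is a minimal sketch that is not from the answer above: in recent pandas versions the built-in 'sum' and 'mean' group aggregations already skip NaN, and a group can still show a NaN mean when every one of its amounts is NaN. The sample data and the contents of groupbyvars are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the ('C', 'z') group contains only NaN.
data = pd.DataFrame({
    'origin': ['A', 'A', 'B', 'B', 'C'],
    'type': ['x', 'x', 'y', 'y', 'z'],
    'amount': [1.0, np.nan, 3.0, 5.0, np.nan],
})
groupbyvars = ['origin', 'type']  # assumed grouping columns

# String aggregation names use pandas' own NaN-skipping reductions.
result = data.groupby(groupbyvars)['amount'].agg(['sum', 'mean']).reset_index()
print(result)
```

In this sketch the all-NaN group gets a sum of 0.0 and a mean of NaN, since there is nothing to average. If NaNs remain in an aggregated result, checking for groups with no valid values is one thing worth ruling out.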

answered Sep 22 '22 12:09

Miros
