I have a DataFrame with various columns and would like to compute the mean value of each group, under the condition that the group has a minimum number of valid members. I tried the following using groupby, filter, and mean. It seems to work, but I am wondering whether there is a more efficient solution?
import pandas as pd
import numpy as np
df = pd.DataFrame({'id': ['one', 'one', 'two', 'three', 'two',
                          'two', 'two', 'one', 'three', 'one'],
                   'idprop': [1., 1., 2., 3., 2.,  # property corresponding to id
                              2., 2., 1., 3., 1.],
                   'x': np.random.randn(10),
                   'y': np.random.randn(10)})
# set a couple of x values to nan
s = df['x'].values
s[s < -0.6] = np.nan
df['x'] = s
g = df.groupby('id', sort=False)
# filter out small group(s) with fewer than 3 valid values in x
# result is a new dataframe
dff = g.filter(lambda d: d['x'].count() >= 3)
# this means we must group again to obtain the mean value of each filtered group
result = dff.groupby('id').mean()
print(result)
print(type(result))
There is a related question, "How to get multiple conditional operations after a Pandas groupby?", which, however, only "filters" by row values, not by the number of group elements. Translated to my code, that approach would be:
res2 = g.agg({'x': lambda d: df.loc[d.index, 'x'][d >= -0.6].sum()})
As a side question: is there a more efficient way to set values below or above a given threshold to NaN? My brain got twisted when I tried to do this with loc.
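For what it's worth, a boolean mask seems to do this in one step without the detour through .values; a minimal sketch, reusing the -0.6 threshold from above:

df.loc[df['x'] < -0.6, 'x'] = np.nan   # assign NaN wherever the mask is True
# equivalently, without writing through loc:
df['x'] = df['x'].mask(df['x'] < -0.6)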
You can achieve this using the groupby apply function:
def mean_cond(dfg):
    # keep the group only if it has at least 3 valid (non-NaN) x values
    if dfg['x'].count() >= 3:
        return dfg.mean(numeric_only=True)  # numeric_only skips the string 'id' column
    return None

print(df.groupby('id').apply(mean_cond).dropna())
The advantage here is that the grouping is performed only once, which might be more efficient than running a second groupby after the filter. The only problem, perhaps, is that groups which don't meet the criterion show up as NaN rows in the resulting table. That is easily solved by appending the dropna call at the end.
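If you want to avoid apply altogether, another option is a single grouping pass followed by boolean indexing on the per-group counts. This is only a sketch (not benchmarked, and numeric_only=True assumes a reasonably recent pandas):

g = df.groupby('id', sort=False)
counts = g['x'].count()                            # valid (non-NaN) x values per group
result = g.mean(numeric_only=True)[counts >= 3]    # keep only sufficiently large groups

Since counts shares the group-key index with the means table, the boolean Series aligns by 'id' and simply drops the undersized groups.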