Pandas Groupby apply function to count values greater than zero
I am using groupby and agg in the following manner:
df.groupby('group')['a'].agg({'mean' : np.mean, 'std' : np.std})
and I would also like to count the values above zero in the same column ['a'].
The following line does the count as I want,
sum(x > 0 for x in df['a'])
but I can't get it to work when applying it to groupby.
Following an example for applying a pandas calculation to a groupby, I tried:
df.groupby('group')['a'].apply(sum(x > 0 for x in df['a']))
but I get an error message: AttributeError: 'numpy.int32' object has no attribute 'module'
Can anybody please suggest how this might be done?
Use NumPy's count_nonzero() function on the boolean Series produced by the comparison. It returns the count of True values, i.e. the number of values greater than the given limit in the selected column.
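A minimal sketch of the count_nonzero() approach, both on the whole column and per group. The sample data is hypothetical; only the column names 'group' and 'a' come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "group": ["x", "x", "x", "y", "y"],
    "a": [1.0, -2.0, 3.0, 0.0, 5.0],
})

# Count of values above zero in the whole column.
total_pos = np.count_nonzero(df["a"] > 0)

# The same count, computed separately for each group.
per_group_pos = df.groupby("group")["a"].agg(lambda s: np.count_nonzero(s > 0))
print(total_pos)
print(per_group_pos)
```

Note that zero itself is not counted, because the comparison is strictly greater than.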
x > x.mean() gives True if the element is larger than the mean and False otherwise; sum then counts the number of True values.
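The same pattern works per group if apply is given a callable instead of an already computed number, which is likely what went wrong in the question's attempt. A rough sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "group": ["x", "x", "x", "y", "y"],
    "a": [1.0, -2.0, 3.0, 0.0, 5.0],
})

# apply() receives each group's Series as x; the lambda is called once per group.
above_mean = df.groupby("group")["a"].apply(lambda x: (x > x.mean()).sum())

# Same idea with a fixed threshold of zero, as asked in the question.
above_zero = df.groupby("group")["a"].apply(lambda x: (x > 0).sum())
print(above_mean)
print(above_zero)
```

In both calls the comparison yields a boolean Series, and summing it treats True as 1 and False as 0.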
Use count() by column name: use pandas DataFrame.groupby() to group the rows by a column, then use the count() method to get the count for each group, ignoring None and NaN values.
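For comparison, here is a small sketch of count() on a grouped column. It counts non-missing entries, not values above zero, so it does not answer the question on its own. The data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in group "x".
df = pd.DataFrame({
    "group": ["x", "x", "y", "y"],
    "a": [1.0, np.nan, 2.0, 3.0],
})

# count() skips NaN/None; size() counts every row in the group.
non_null = df.groupby("group")["a"].count()
all_rows = df.groupby("group")["a"].size()
print(non_null)
print(all_rows)
```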
In SQL, the GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". It is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result set by one or more columns.
Answer from the comments:
.agg({'pos':lambda ts: (ts > 0).sum()}) # – behzad.nouri Mar 31 at 0:00
This is my contribution to the backlog of unanswered questions :) Credits to behzad.nouri
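Update 2020: In recent pandas versions (1.0 and later), passing a dict to rename the outputs of a column aggregation is no longer supported, so you need to use named aggregation instead: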
.agg(pos=lambda ts: (ts > 0).sum())
otherwise it will result in the following error:
SpecificationError: nested renamer is not supported
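Putting it together, a minimal sketch that computes the mean, the standard deviation and the count of positive values in one agg call using named aggregation. The sample data is hypothetical, and the output names mean, std and pos are just labels chosen to match the question and the answer above.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "group": ["x", "x", "x", "y", "y"],
    "a": [1.0, -2.0, 3.0, 0.0, 5.0],
})

# Each keyword becomes an output column; the value is the aggregation to run.
result = df.groupby("group")["a"].agg(
    mean="mean",
    std="std",
    pos=lambda ts: (ts > 0).sum(),  # count of values strictly above zero
)
print(result)
```

The result is a DataFrame indexed by group with one column per keyword.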